doc-analysis/XFUND
XFUND: A Multilingual Form Understanding Benchmark
This project provides a collection of human-labeled business forms in seven languages (Chinese, Japanese, Spanish, French, Italian, German, Portuguese). It is used to train and test artificial intelligence systems that can automatically extract key information, like names and addresses, from scanned or digital forms. It helps machine learning engineers and researchers build document processing systems for global operations.
217 stars. No commits in the last 6 months.
Use this if you are developing or evaluating machine learning models for automated data extraction from multilingual business forms.
Not ideal if you need a solution to process English-only documents or if you are looking for an out-of-the-box document processing application.
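To make the "extract key information" use case concrete, here is a minimal sketch of pairing form fields with their values from one annotated form. The record layout (`document`, `id`, `label`, `text`, `linking`) is an assumption modeled on FUNSD-style annotations and is not described on this page, so check it against the downloaded files. The sample form and the `extract_pairs` helper are hypothetical.

```python
from typing import Dict, List, Tuple


def extract_pairs(form: Dict) -> List[Tuple[str, str]]:
    """Return (question_text, answer_text) pairs from one annotated form.

    Assumed layout: form["document"] is a list of entities, each with an
    "id", a "label" ("question", "answer", "header" or "other"), a "text",
    and a "linking" list of [from_id, to_id] pairs.
    """
    entities = {e["id"]: e for e in form["document"]}
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for entity in form["document"]:
        for src, dst in entity.get("linking", []):
            # A link may be recorded on both of its endpoints; count it once.
            if (src, dst) in seen:
                continue
            seen.add((src, dst))
            q, a = entities.get(src), entities.get(dst)
            if q and a and q["label"] == "question" and a["label"] == "answer":
                pairs.append((q["text"], a["text"]))
    return pairs


# Hypothetical, hand-written sample in the assumed layout.
sample_form = {
    "document": [
        {"id": 0, "label": "question", "text": "Nome", "linking": [[0, 1]]},
        {"id": 1, "label": "answer", "text": "Maria Silva", "linking": [[0, 1]]},
        {"id": 2, "label": "other", "text": "Página 1", "linking": []},
    ]
}

pairs = extract_pairs(sample_form)
```

A model trained on such data would predict the labels and links itself; the sketch only shows how the human-provided annotations can be read back as field/value pairs.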
Stars: 217
Forks: 21
Language: not listed
License: not listed
Category: not listed
Last pushed: Jul 15, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/doc-analysis/XFUND"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
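The same request can be made from a script. The sketch below uses only the Python standard library and reuses the URL from the curl example. The names given to the three path segments (`domain`, `owner`, `repo`) are a guess at their meaning, and the JSON response format is assumed rather than documented on this page. No API-key handling is shown because the page does not say how a key is passed.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(domain: str, owner: str, repo: str) -> str:
    """Build the endpoint URL from its three path segments.

    The segment names are hypothetical labels; the page only shows the
    finished URL, not what each component is called.
    """
    parts = [quote(p, safe="") for p in (domain, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(url: str, timeout: float = 10.0):
    """GET the URL and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = build_quality_url("nlp", "doc-analysis", "XFUND")
# To actually call the service (this counts against the daily request limit):
#     data = fetch_quality(url)
```

Building the URL separately from fetching it keeps the network call explicit, which helps when staying under the unauthenticated daily limit.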
Higher-rated alternatives
luheng/deep_srl
Code and pre-trained model for: Deep Semantic Role Labeling: What Works and What's Next
sileod/tasksource
Datasets collection and preprocessings framework for NLP extreme multitask learning
loomchild/maligna
Bilingual sentence aligner
CK-Explorer/DuoSubs
Semantic subtitle aligner and merger for bilingual subtitle syncing.
coastalcph/lex-glue
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English