doc-analysis/XFUND

XFUND: A Multilingual Form Understanding Benchmark

32 / 100

Emerging

This project provides a collection of human-labeled business forms in seven languages (Chinese, Japanese, Spanish, French, Italian, German, Portuguese). It is used to train and test artificial intelligence systems that can automatically extract key information, like names and addresses, from scanned or digital forms. It helps machine learning engineers and researchers build document processing systems for global operations.

217 stars. No commits in the last 6 months.

Use this if you are developing or evaluating machine learning models for automated data extraction from multilingual business forms.

Not ideal if you need a solution to process English-only documents or if you are looking for an out-of-the-box document processing application.

document-processing data-extraction multilingual-forms AI-training-data optical-character-recognition

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 14 / 25

Stars

217

Forks

Language

—

License

—

Higher-rated alternatives

luheng/deep_srl

Code and pre-trained model for: Deep Semantic Role Labeling: What Works and What's Next

sileod/tasksource

Datasets collection and preprocessings framework for NLP extreme multitask learning

loomchild/maligna

Bilingual sentence aligner

CK-Explorer/DuoSubs

Semantic subtitle aligner and merger for bilingual subtitle syncing.

coastalcph/lex-glue

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English
