worldbank/wb-nlp-tools

Natural language processing tools developed by the World Bank's DECAT unit. A suite of text preprocessing and cleaning algorithms for NLP analysis and modeling.

Score: 37 / 100 (Emerging)

This suite of tools helps researchers and analysts efficiently prepare large volumes of text from documents like PDFs for natural language processing. It takes raw text or PDFs, cleans them by correcting spelling, expanding acronyms, and identifying key phrases, and outputs high-quality, structured text ready for analysis or modeling. This is ideal for economists, social scientists, or policy researchers working with extensive textual data.
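The cleaning steps described above can be illustrated with a minimal, standard-library sketch. This is not the wb-nlp-tools API: the function names, the `ACRONYMS` map, and the bigram-count threshold are all hypothetical stand-ins chosen for illustration. Spelling correction is left out because it needs a dictionary resource.

```python
import re
from collections import Counter

# Hypothetical acronym map (an assumption for this sketch); a real pipeline
# would likely derive expansions from the corpus itself.
ACRONYMS = {
    "GDP": "gross domestic product",
    "NLP": "natural language processing",
}


def normalize_whitespace(text: str) -> str:
    """Collapse whitespace runs and line breaks left over from PDF extraction."""
    return re.sub(r"\s+", " ", text).strip()


def expand_acronyms(text: str, mapping: dict) -> str:
    """Replace whole-word acronyms with their expansions."""
    if not mapping:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


def frequent_bigrams(text: str, min_count: int = 2) -> list:
    """Return adjacent word pairs seen at least min_count times.

    Raw counting is a simple stand-in for statistical phrase detection.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(zip(tokens, tokens[1:]))
    return sorted(" ".join(pair) for pair, n in counts.items() if n >= min_count)


def clean(text: str) -> str:
    """Run the cleaning steps in order: whitespace first, then acronyms."""
    return expand_acronyms(normalize_whitespace(text), ACRONYMS)
```

Calling `clean()` on raw extracted text and then `frequent_bigrams()` on the result gives a rough picture of the "clean, then detect phrases" flow; a production pipeline would add tokenization, language detection, and a proper spell checker.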

No commits in the last 6 months.

Use this if you need to transform messy, real-world documents into clean, consistent text for tasks like topic modeling, sentiment analysis, or information extraction.

Not ideal if you primarily work with structured data, require only basic text search, or need highly specialized linguistic analysis not covered by standard cleaning and phrase detection.

policy-research social-science-research economic-analysis development-studies document-analysis
Stale (6m), No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 16 / 25


Stars: 10
Forks: 7
Language: Python
License: MIT
Last pushed: Jun 11, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/worldbank/wb-nlp-tools"

Open to everyone at 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
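For scripted access, the same request can be sketched in Python with only the standard library. The helper names are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown above, and the response is assumed to be JSON; its fields are not documented here, so the sketch returns the parsed body without interpreting it.

```python
import json
import urllib.request

# Base path taken from the curl example; the segment layout after it
# (category/owner/repo) is an assumption inferred from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and parse the response body, assuming the endpoint returns JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("nlp", "worldbank", "wb-nlp-tools")` issues the same GET request as the curl command. Keeping URL construction separate from the network call makes the helper easy to check without spending any of the daily request quota.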