uscensusbureau/SABLE

Scraping Assisted by Learning

Emerging

SABLE helps researchers, economists, or data analysts automatically find and extract specific data from PDF documents scattered across many government websites. It takes a list of URLs and, using machine learning, identifies relevant PDFs, extracts their text, and then pulls out the exact data you need, like tax revenue figures. The output is organized data ready for analysis.
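The filter-then-extract flow described above can be illustrated with a minimal sketch. This is not SABLE's own code or API: the keyword list stands in for a trained classifier, the names `is_relevant`, `extract_amounts`, and `process` are hypothetical, and the input is assumed to be PDF text that has already been downloaded and converted to plain text.

```python
import re

# Hypothetical keyword list standing in for a trained relevance classifier.
KEYWORDS = ("tax", "revenue", "collections")

# Matches dollar amounts such as $500 or $1,234,567.50.
AMOUNT = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)")


def is_relevant(text, min_hits=2):
    """Crude stand-in for a learned model: count keyword occurrences."""
    lowered = text.lower()
    hits = sum(lowered.count(keyword) for keyword in KEYWORDS)
    return hits >= min_hits


def extract_amounts(text):
    """Return the dollar amounts found in the text as floats."""
    return [float(match.replace(",", "")) for match in AMOUNT.findall(text)]


def process(documents):
    """Build one output row per amount found in each relevant document.

    `documents` maps a source URL to the text already extracted from its PDF
    (an assumption of this sketch; downloading and PDF parsing are omitted).
    """
    rows = []
    for url, text in documents.items():
        if is_relevant(text):
            for amount in extract_amounts(text):
                rows.append({"url": url, "amount": amount})
    return rows


if __name__ == "__main__":
    sample = {
        "https://example.gov/a.pdf": "Sales tax revenue was $1,234,567.50 in May.",
        "https://example.gov/b.pdf": "Parks committee agenda, budget $500.",
    }
    for row in process(sample):
        print(row)
```

A real pipeline would replace the keyword count with a classifier trained on labeled documents; the sketch only shows how discovery, filtering, and extraction fit together.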

No commits in the last 6 months.

Use this if you regularly collect specific numerical or textual data from a large number of government PDFs online and want to automate the process of discovery and extraction.

Not ideal if your data sources are primarily structured databases, private documents, or if you only need to extract data from a few PDFs manually.

data-collection government-data economic-research public-policy tax-revenue-reporting

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 7 / 25

Maturity 16 / 25

Community 14 / 25

Language: Python

License: not listed

Higher-rated alternatives

flairNLP/fundus

A very simple news crawler with a funny name

fhamborg/news-please

news-please - an integrated web crawler and information extractor for news that just works

affjljoo3581/canrevan

A library for collecting large volumes of Naver news articles.

FreeDiscovery/FreeDiscovery

Web Service for E-Discovery Analytics

tirthajyoti/Web-Database-Analytics

Web scraping and related analytics using Python tools
