uscensusbureau/SABLE

Scraping Assisted by Learning

39 / 100 Emerging

SABLE helps researchers, economists, or data analysts automatically find and extract specific data from PDF documents scattered across many government websites. It takes a list of URLs and, using machine learning, identifies relevant PDFs, extracts their text, and then pulls out the exact data you need, like tax revenue figures. The output is organized data ready for analysis.
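The flow described above (candidate URLs in, relevant PDFs flagged, target values pulled from extracted text) can be illustrated with a rough sketch. This is not SABLE's code: a keyword check stands in for the machine-learning classifier, a regular expression stands in for the extraction step, and the names `looks_relevant`, `extract_amounts`, and `KEYWORDS` are hypothetical. Text extraction from the PDF itself is left out and the input is assumed to be plain text already.

```python
import re

# Hypothetical stand-in for a learned relevance model: simple keyword matching.
KEYWORDS = ("tax", "revenue")

# Matches dollar amounts such as "$1,234,567.50" and captures the numeric part.
AMOUNT = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)")


def looks_relevant(url: str) -> bool:
    """Flag a URL as a candidate if it points at a PDF and mentions a keyword."""
    lowered = url.lower()
    return lowered.endswith(".pdf") and any(k in lowered for k in KEYWORDS)


def extract_amounts(text: str) -> list[float]:
    """Return every dollar amount found in already-extracted PDF text."""
    return [float(m.replace(",", "")) for m in AMOUNT.findall(text)]


if __name__ == "__main__":
    # Example URLs are made up for illustration.
    urls = [
        "https://example.gov/reports/tax-revenue-2020.pdf",
        "https://example.gov/about.html",
    ]
    print([u for u in urls if looks_relevant(u)])
    print(extract_amounts("Total sales tax revenue: $1,234,567.50 in May"))
```

A real pipeline would replace both stand-ins with trained models and add a PDF-to-text step between them; the sketch only shows how the stages hand data to one another.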

No commits in the last 6 months.

Use this if you regularly collect specific numerical or textual data from a large number of government PDFs online and want to automate the process of discovery and extraction.

Not ideal if your data sources are primarily structured databases, private documents, or if you only need to extract data from a few PDFs manually.

data-collection government-data economic-research public-policy tax-revenue-reporting
Stale 6m | No Package | No Dependents
Maintenance 2 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 14 / 25

Stars 36
Forks 6
Language Python
License
Last pushed Sep 15, 2025
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/uscensusbureau/SABLE"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
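
The same request can be made from Python. The sketch below uses only the standard library and rebuilds the URL shown in the curl command. It assumes that the `nlp` path segment is a category name and that the endpoint returns JSON; neither is confirmed by this page. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, category: str = "nlp") -> str:
    """Build the endpoint URL; 'category' is assumed to be the 'nlp' path segment."""
    return f"{BASE_URL}/{quote(category)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(owner: str, repo: str, category: str = "nlp", timeout: float = 10.0):
    """Fetch the quality record, assuming the response body is JSON."""
    with urlopen(quality_url(owner, repo, category), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("uscensusbureau", "SABLE"))
```

Keeping URL construction separate from the network call makes it easy to check the address without spending one of the 100 daily requests.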