uscensusbureau/SABLE
Scraping Assisted by Learning
SABLE helps researchers, economists, or data analysts automatically find and extract specific data from PDF documents scattered across many government websites. It takes a list of URLs and, using machine learning, identifies relevant PDFs, extracts their text, and then pulls out the exact data you need, like tax revenue figures. The output is organized data ready for analysis.
No commits in the last 6 months.
Use this if you regularly collect specific numerical or textual data from a large number of government PDFs online and want to automate the process of discovery and extraction.
Not ideal if your data sources are primarily structured databases, private documents, or if you only need to extract data from a few PDFs manually.
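The workflow described above (find candidate PDFs, keep the relevant ones, then pull out specific figures) can be illustrated with a minimal sketch. This is not SABLE's code or API: every name here (`is_pdf_link`, `relevance_score`, `extract_amounts`, `run_pipeline`, the `KEYWORDS` weights, and the `example.gov` URLs) is hypothetical. A simple keyword score stands in for the machine-learning classifier, and PDF text is assumed to have been extracted already.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword weights; a toy stand-in for a trained relevance classifier.
KEYWORDS = {"tax": 2.0, "revenue": 2.0, "collections": 1.0, "fiscal": 1.0}

# Matches dollar amounts such as $1,234,567 or $890000.50.
AMOUNT_RE = re.compile(r"\$\s?(\d+(?:,\d{3})*(?:\.\d+)?)")


@dataclass
class Extraction:
    url: str
    label: str
    amount: float


def is_pdf_link(url: str) -> bool:
    """Discovery filter: keep only links whose path ends in .pdf."""
    return url.lower().split("?", 1)[0].endswith(".pdf")


def relevance_score(text: str) -> float:
    """Sum keyword weights over the words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(KEYWORDS.get(word, 0.0) for word in words)


def extract_amounts(url: str, text: str) -> list[Extraction]:
    """Turn each 'label: $amount' line into a structured row."""
    rows = []
    for line in text.splitlines():
        match = AMOUNT_RE.search(line)
        if match:
            label = line[: match.start()].strip(" :.\t")
            amount = float(match.group(1).replace(",", ""))
            rows.append(Extraction(url, label, amount))
    return rows


def run_pipeline(documents: dict[str, str], threshold: float = 3.0) -> list[Extraction]:
    """documents maps URL -> already-extracted text (PDF parsing is out of scope)."""
    rows = []
    for url, text in documents.items():
        if is_pdf_link(url) and relevance_score(text) >= threshold:
            rows.extend(extract_amounts(url, text))
    return rows


if __name__ == "__main__":
    docs = {
        "https://example.gov/reports/q1.pdf":
            "Quarterly tax revenue report\nSales tax: $1,234,567\nIncome tax: $890,000.50",
        "https://example.gov/about.html": "About the department",
        "https://example.gov/minutes.pdf": "Meeting minutes and agenda",
    }
    for row in run_pipeline(docs):
        print(row)
```

Only the first document passes both the PDF filter and the relevance threshold, so the output contains two rows, one per dollar figure. In a real pipeline the keyword score would be replaced by a trained classifier and the text would come from a PDF-to-text tool.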
Stars: 36
Forks: 6
Language: Python
License: not specified
Category:
Last pushed: Sep 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/uscensusbureau/SABLE"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
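The same request can be made from Python using only the standard library. This is a sketch that mirrors the curl command above; the helper names `build_quality_url` and `fetch_quality` are hypothetical, and treating the `nlp` path segment as a category is an assumption read off the URL shape. The response format is not documented on this page, so the body is returned as plain text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; 'category' (e.g. 'nlp') is an assumed meaning."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Perform the GET request and return the raw response body as text."""
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


if __name__ == "__main__":
    # Printing the URL needs no network access.
    print(build_quality_url("nlp", "uscensusbureau", "SABLE"))
    # body = fetch_quality("nlp", "uscensusbureau", "SABLE")  # performs a network request
```

The keyless tier is limited to 100 requests/day, so it is worth caching responses locally rather than re-fetching the same repository repeatedly.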
Higher-rated alternatives
flairNLP/fundus
A very simple news crawler with a funny name
fhamborg/news-please
news-please - an integrated web crawler and information extractor for news that just works
affjljoo3581/canrevan
A library for collecting large volumes of Naver news articles.
FreeDiscovery/FreeDiscovery
Web Service for E-Discovery Analytics
tirthajyoti/Web-Database-Analytics
Web scraping and related analytics using Python tools