DataHenHQ/till
DataHen Till is a companion tool for your existing web scraper that makes it scalable, maintainable, and harder to block, with minimal code changes to your scraper. It integrates with any scraper in 5 minutes.
This tool helps web scrapers overcome common problems that arise when gathering large amounts of data from websites. It works alongside your existing web scraping script and automatically adds features that make it more reliable and less likely to be blocked. The result is a robust data collection process that yields reliable data for analysis. Typical users are data analysts, market researchers, and business intelligence professionals who rely on web data.
815 stars. No commits in the last 6 months.
Use this if you are collecting data from websites at scale and frequently encounter issues with your scraper being blocked, failing midway, or becoming difficult to maintain.
Not ideal if you only need to scrape a small amount of data occasionally, as you may not need the advanced scaling and anti-blocking features.
Stars: 815
Forks: 23
Language: Go
License: Apache-2.0
Category: (not specified)
Last pushed: Dec 05, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/DataHenHQ/till"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
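The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The helper names `perception_url` and `fetch_perception` are hypothetical, and the code assumes the endpoint returns a JSON body, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Endpoint prefix copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/perception"


def perception_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for an owner/repo pair (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_perception(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the endpoint and decode the body.

    Assumption: the response body is JSON. The field names are not
    documented on this page, so the decoded value is returned unchanged.
    """
    with urllib.request.urlopen(perception_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_perception() to make the request.
    print(perception_url("DataHenHQ", "till"))
```

Calling `fetch_perception("DataHenHQ", "till")` performs the actual request. Each call counts against the daily request limit, so it is worth caching results when polling many repositories.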
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
plabayo/rama
Modular service framework to move and transform network packets.
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.