my8100/scrapyd-cluster-on-heroku
Set up a free, scalable Scrapyd cluster for distributed web crawling with just a few clicks. DEMO
This project helps data analysts and researchers set up a robust, scalable system for gathering information from websites. You provide a list of websites you want to collect data from, and the system automatically distributes the work, extracts the information you need, and stores it for your analysis. This is ideal for anyone who needs to collect large amounts of public data from multiple websites efficiently and reliably.
123 stars. No commits in the last 6 months.
Use this if you need to perform large-scale, automated web scraping across many sites and want a free, easily scalable setup.
Not ideal if your web scraping needs are small-scale or if you require persistent storage directly on the scraping servers themselves without an external database.
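As a sketch of how work is handed to such a cluster: Scrapyd nodes expose an HTTP API, and a job is started with a form-encoded POST to the standard `schedule.json` endpoint. The app URL, project name, and spider name below are placeholders for illustration, not values from this repo.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Scrapyd node deployed on Heroku; substitute your own app URL.
SCRAPYD_URL = "https://your-scrapyd-app.herokuapp.com"

# Scrapyd's schedule.json endpoint takes a form-encoded POST with the
# project and spider to run; extra fields become spider arguments.
payload = urlencode({"project": "myproject", "spider": "myspider"}).encode()
req = Request(f"{SCRAPYD_URL}/schedule.json", data=payload, method="POST")

# from urllib.request import urlopen
# print(urlopen(req).read())  # uncomment to actually schedule the job
print(req.full_url)
```

In a cluster, the same request can be sent to whichever node has free capacity, which is how the work gets distributed across machines.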
Stars
123
Forks
81
Language
Python
License
GPL-3.0
Category
Last pushed
Apr 04, 2020
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/my8100/scrapyd-cluster-on-heroku"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
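The same endpoint shown in the curl command can be called from Python. The sketch below builds the request with the standard library; the response format is not documented on this page, so the fetch line is left commented out rather than assuming a schema.

```python
from urllib.request import Request

# Endpoint from this page; the free tier allows 100 requests/day without a key.
URL = ("https://pt-edge.onrender.com/api/v1/quality/perception/"
       "my8100/scrapyd-cluster-on-heroku")

req = Request(URL, headers={"Accept": "application/json"})

# import json
# from urllib.request import urlopen
# data = json.load(urlopen(req))  # uncomment to fetch the live data
print(req.full_url)
```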
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
plabayo/rama
Modular service framework to move and transform network packets
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.