tokenmill/crawling-framework
Easily crawl news portals or blog sites using Storm Crawler.
This framework helps you gather information from news websites and blogs. You specify which sites to monitor, set up rules to extract article titles, content, and publication names, and it automatically collects the data. It's designed for data analysts, researchers, or marketers who need to track trends or gather competitive intelligence from online sources.
No commits in the last 6 months.
Use this if you need to systematically collect articles and blog posts from many different websites for analysis.
Not ideal if you only need to scrape a few pages occasionally or if you don't have experience with Java and Elasticsearch setup.
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/tokenmill/crawling-framework"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
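The curl request above can also be issued from a script. The following is a minimal Python sketch using only the standard library. It makes two assumptions the page does not confirm: that the endpoint follows the same owner/repo path pattern for other projects, and that the response body is JSON. The helper names `build_url` and `fetch_quality` are hypothetical and chosen for illustration only.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/perception"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for an owner/repo pair.

    Assumption: other projects use the same .../<owner>/<repo> pattern.
    Each path segment is percent-encoded so stray slashes cannot alter the path.
    """
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Request the record without an API key and decode the body.

    Assumption: the body is JSON; the page does not show the response shape.
    """
    request = urllib.request.Request(
        build_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("tokenmill", "crawling-framework")` sends the same request as the curl command. Since keyless access is limited to 100 requests/day, a script that polls many projects may want to cache results locally.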
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
plabayo/rama
Modular service framework to move and transform network packets.
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.