tokenmill/crawling-framework

Easily crawl news portals or blog sites using Storm Crawler.

/ 100

Emerging

This framework helps you gather information from news websites and blogs. You specify which sites to monitor, set up rules to extract article titles, content, and publication names, and it automatically collects the data. It's designed for data analysts, researchers, or marketers who need to track trends or gather competitive intelligence from online sources.

No commits in the last 6 months.

Use this if you need to systematically collect articles and blog posts from many different websites for analysis.

Not ideal if you only need to scrape a few pages occasionally, or if you have no experience setting up Java and Elasticsearch.

news-monitoring market-research competitive-intelligence content-aggregation media-analysis

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 15 / 25

Language Java

License —

Featured in

Giving AI Agents Eyes: Browser Automation in 2026

Higher-rated alternatives

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Altimis/Scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...

lexiforest/curl_cffi

Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...

plabayo/rama

modular service framework to move and transform network packets

scrapinghub/spidermon

Scrapy Extension for monitoring spiders execution.
