tokenmill/crawling-framework

Easily crawl news portals or blog sites using StormCrawler.

37 / 100 (Emerging)

This framework helps you gather information from news websites and blogs. You specify which sites to monitor, set up rules to extract article titles, content, and publication names, and it automatically collects the data. It's designed for data analysts, researchers, or marketers who need to track trends or gather competitive intelligence from online sources.

No commits in the last 6 months.

Use this if you need to systematically collect articles and blog posts from many different websites for analysis.

Not ideal if you only need to scrape a few pages occasionally, or if you have no experience setting up Java and Elasticsearch.

news-monitoring market-research competitive-intelligence content-aggregation media-analysis
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 15 / 25


Stars: 21
Forks: 4
Language: Java
License:
Category: scraper
Last pushed: Nov 15, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/perception/tokenmill/crawling-framework"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
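
The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The helper names build_url and fetch_quality are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page; only the URL itself comes from the curl command above.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/perception"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for an owner/repo pair (hypothetical helper).

    Each path segment is percent-encoded so unusual characters stay valid.
    """
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for a repository (hypothetical helper).

    Assumption: the response body is JSON. The page does not document
    the response schema, so the decoded value is returned unchanged.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the URL only; call fetch_quality(...) to perform the request.
    print(build_url("tokenmill", "crawling-framework"))
```

With the keyless limit of 100 requests/day, it is worth caching the result of fetch_quality locally rather than calling it on every run.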