kkrugler/flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Score: 43 / 100 (Emerging)
This tool continuously fetches web pages, processing them as they are discovered rather than in large batches. It starts from a set of seed URLs and systematically follows new links found on fetched pages, emitting a steady stream of crawled content. This is useful for data professionals, researchers, or analysts who need to collect and analyze large volumes of up-to-date information from the web.

No commits in the last 6 months.

Use this if you need to continuously monitor and collect fresh data from websites at scale, from thousands to billions of pages, without repeated restarts.

Not ideal if you only need to perform a one-off, small-scale scrape of a few specific web pages.

Tags: web data extraction, market intelligence, content discovery, search engine indexing, competitive monitoring
Badges: Stale (6m), No Package, No Dependents
Score breakdown:
Maintenance: 0 / 25
Adoption: 8 / 25
Maturity: 16 / 25
Community: 19 / 25


Stars: 52
Forks: 18
Language: Java
License: Apache-2.0
Category: scraper
Last pushed: Apr 08, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/perception/kkrugler/flink-crawler"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
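The same endpoint can be called from a script. A minimal Python sketch, assuming the endpoint returns a JSON body (only the URL appears above; the response schema and any error behavior are not documented here):

```python
# Fetch the quality card for a repository from the pt-edge API.
# The response schema is an assumption; only the endpoint URL is known.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/perception"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """GET the endpoint and decode the JSON body (network required).

    Subject to the 100 requests/day unauthenticated limit noted above.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)


# Example (requires network access):
# data = fetch_quality("kkrugler", "flink-crawler")
# print(json.dumps(data, indent=2))
```

The fetch call is left commented out so the snippet imports cleanly offline; uncomment it to hit the live endpoint.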