kkrugler/flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Emerging

This tool continuously fetches web pages, processing them as they're discovered rather than in large batches. It takes a starting set of web links and systematically explores new links found on those pages, outputting a steady stream of crawled content. This is ideal for data professionals, researchers, or analysts who need to collect and analyze large volumes of up-to-date information from the web.

No commits in the last 6 months.

Use this if you need to continuously monitor and collect fresh data from websites at scale, from thousands to billions of pages, without repeated restarts.

Not ideal if you only need to perform a one-off, small-scale scrape of a few specific web pages.

web data extraction, market intelligence, content discovery, search engine indexing, competitive monitoring

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 19 / 25

Language: Java

License: Apache-2.0

Featured in

Giving AI Agents Eyes: Browser Automation in 2026

Higher-rated alternatives

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Altimis/Scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...

lexiforest/curl_cffi

Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...

plabayo/rama

modular service framework to move and transform network packets

scrapinghub/spidermon

Scrapy Extension for monitoring spiders execution.
