istresearch/scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.

/ 100

Established

This project helps developers collect large amounts of data from websites by distributing scraping jobs across many machines. It accepts URLs along with instructions about what to extract, and outputs both the raw HTML it scraped and the extracted data. It is aimed at developers building web scrapers that must operate at very large scale.
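The workflow described above can be illustrated with a minimal, single-process sketch. Crawl requests (a URL plus extraction instructions) are serialized to JSON and placed on a shared queue. Several workers pull from that queue, and each one emits the raw HTML together with the extracted data. This is only an illustration under stated assumptions. Python's `queue.Queue` and threads stand in for the Redis queues and Kafka topics that the project uses across machines. `CrawlRequest`, its `extract` field, `fake_fetch`, `extract_fields`, and `run_cluster` are hypothetical names, not Scrapy Cluster's API. The `appid`/`crawlid` fields are illustrative. No network requests are made.

```python
import json
import queue
import threading
from dataclasses import asdict, dataclass, field


@dataclass
class CrawlRequest:
    # Hypothetical request shape: a URL plus instructions on what to extract.
    url: str
    appid: str
    crawlid: str
    extract: list = field(default_factory=list)  # hypothetical: names of fields to pull out


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP download so the sketch runs offline.
    return f"<html><title>{url}</title></html>"


def extract_fields(html: str, wanted: list) -> dict:
    # Toy extractor: the only supported instruction is "title".
    data = {}
    start_tag, end_tag = "<title>", "</title>"
    if "title" in wanted and start_tag in html and end_tag in html:
        start = html.index(start_tag) + len(start_tag)
        data["title"] = html[start:html.index(end_tag)]
    return data


def worker(requests: queue.Queue, results: queue.Queue) -> None:
    # Each thread plays the role of one crawler machine pulling from the shared queue.
    while True:
        raw = requests.get()
        if raw is None:  # sentinel: no more work for this worker
            break
        req = CrawlRequest(**json.loads(raw))
        html = fake_fetch(req.url)
        results.put({
            "appid": req.appid,
            "crawlid": req.crawlid,
            "url": req.url,
            "body": html,                               # raw HTML
            "data": extract_fields(html, req.extract),  # extracted data
        })


def run_cluster(jobs: list, num_workers: int = 3) -> list:
    requests: queue.Queue = queue.Queue()  # stands in for the shared Redis queue
    results: queue.Queue = queue.Queue()   # stands in for an output Kafka topic
    threads = [threading.Thread(target=worker, args=(requests, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        requests.put(json.dumps(asdict(job)))  # requests travel as JSON messages
    for _ in threads:
        requests.put(None)  # one sentinel per worker, queued after all real jobs
    for t in threads:
        t.join()
    return [results.get() for _ in range(results.qsize())]


if __name__ == "__main__":
    jobs = [CrawlRequest(f"http://example.com/{i}", "testapp", "abc123", ["title"])
            for i in range(5)]
    for item in sorted(run_cluster(jobs), key=lambda r: r["url"]):
        print(item["url"], item["data"])
```

Workers finish in no fixed order, so the usage example sorts results by URL before printing them. In this sketch, adding crawler machines roughly corresponds to raising `num_workers`.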

1,230 stars. No commits in the last 6 months.

Use this if you need to build highly scalable, resilient web scraping systems that can handle many concurrent jobs and dynamic crawling.

Not ideal if you're looking for a simple, out-of-the-box web scraping tool for small, one-off data extraction tasks.

web-scraping data-extraction distributed-systems developer-tooling

Stale (6 months), No package, No dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 25 / 25


Stars: 1,230

Forks: 322

Language: Python

License: MIT

Featured in

Giving AI Agents Eyes: Browser Automation in 2026

Related tools

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Altimis/Scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...

lexiforest/curl_cffi

Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...

plabayo/rama

modular service framework to move and transform network packets

scrapinghub/spidermon

Scrapy Extension for monitoring spiders execution.
