leopardslab/CrawlerX
CrawlerX - Extensible, Distributed, Scalable Crawler System: a web platform for crawling URLs over various protocols in a distributed way.
CrawlerX is a web-based platform that helps you gather information from many websites automatically and continuously. You tell it which web pages to look at and what data points to collect, and it provides you with structured data for analysis. This is ideal for data analysts, market researchers, or anyone needing to collect large datasets from the web without an official API.
No commits in the last 6 months.
Use this if you need to systematically extract data from numerous websites on an ongoing or scheduled basis for research, competitive analysis, or building a dataset.
Not ideal if you only need to extract data from a single website once, or if you prefer to write custom scripts for each data extraction task.
Stars: 25
Forks: 19
Language: SCSS
License: Apache-2.0
Category:
Last pushed: Feb 14, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/leopardslab/CrawlerX"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
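The endpoint above can also be called from code. A minimal Python sketch, assuming the service returns JSON (the response field names are not specified by the source, so only the URL construction and fetch are shown):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/perception"


def perception_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"


def fetch_perception(owner: str, repo: str) -> dict:
    """Fetch quality data for a repository (assumes a JSON response body)."""
    with urllib.request.urlopen(perception_url(owner, repo)) as resp:
        return json.load(resp)


# Example (performs a live request, subject to the 100 requests/day limit):
# data = fetch_perception("leopardslab", "CrawlerX")
```

With a free API key, the same helper could attach it as a header or query parameter, depending on how the service expects it (not documented here).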
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
plabayo/rama
Modular service framework to move and transform network packets.
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.