brendonboshell/supercrawler

A web crawler. Supercrawler automatically crawls websites. Define custom handlers to parse content. Obeys robots.txt, rate limits and concurrency limits.

Score: 55 / 100 (Established)

This tool helps developers automate the process of systematically browsing websites and extracting specific information. You provide a starting web address and define what content to look for (like links, images, or text), and the crawler will navigate the site according to rules like robots.txt, gathering the specified data. It's for developers who need to collect large amounts of publicly available web data for analysis, research, or integration into other applications.
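A minimal usage sketch of that workflow, based on the project's README; exact option names, handler signatures, and built-in handlers are assumptions and may vary between versions:

```javascript
// Sketch only: requires `npm install supercrawler`; API details assumed from the README.
const supercrawler = require("supercrawler");

// Rate limit and concurrency are enforced by the crawler itself.
const crawler = new supercrawler.Crawler({
  interval: 1000,              // ms between requests (rate limit)
  concurrentRequestsLimit: 5   // max parallel requests
});

// Built-in handler (assumed name): extract links from HTML and queue them,
// staying on the listed hostnames.
crawler.addHandler("text/html", supercrawler.handlers.htmlLinkParser({
  hostnames: ["example.com"]
}));

// Custom handler: called for each downloaded HTML page.
crawler.addHandler("text/html", (context) => {
  console.log("Fetched", context.url);
});

// Seed the queue and start; robots.txt rules are checked before each fetch.
crawler.getUrlList()
  .insertIfNotExists(new supercrawler.Url("https://example.com/"))
  .then(() => crawler.start());
```

The handler list is the extension point: each handler is matched by content type and receives the downloaded page, so parsing logic stays separate from scheduling, politeness, and queue management.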

382 stars. No commits in the last 6 months. Available on npm.

Use this if you need to programmatically explore a website, respecting site rules, to extract specific content or links.

Not ideal if you need a simple tool for occasional, manual data extraction without writing code.

web-scraping data-collection web-automation content-extraction developer-tool
Stale: 6m
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 25 / 25
Community: 20 / 25


Stars: 382
Forks: 63
Language: JavaScript
License: Apache-2.0
Category: scraper
Last pushed: Dec 30, 2022
Commits (30d): 0
Dependencies: 10

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/perception/brendonboshell/supercrawler"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.