apify/crawlee
Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs, and download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless modes, with proxy rotation.
This library helps web developers build robust tools to collect data from websites. It allows you to gather various types of content, from structured data like product details to entire files such as images, PDFs, or HTML pages. Developers create automated scripts that take website URLs as input and produce organized datasets or downloaded files as output.
22,542 stars. Used by 2 other packages. Actively maintained with 43 commits in the last 30 days. Available on npm.
Use this if you are a developer tasked with reliably extracting large amounts of data from the web for applications like AI training, content aggregation, or competitive analysis.
Not ideal if you need a no-code solution or a simple browser extension for occasional, manual data extraction.
Stars: 22,542
Forks: 1,288
Language: TypeScript
License: Apache-2.0
Category:
Last pushed: Mar 28, 2026
Commits (30d): 43
Dependencies: 14
Reverse dependents: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/apify/crawlee"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
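For scripted access, the curl call above could be wrapped roughly as follows. This is a minimal sketch, not an official client: the helper names `buildQualityUrl` and `fetchQuality` are hypothetical, only the base URL and path come from the curl example, and the response is returned as raw text because its schema is not described here. It assumes a runtime with a global `fetch` (for example Node.js 18 or later).

```typescript
// Base URL and path taken from the curl example; everything else is illustrative.
const API_BASE = "https://pt-edge.onrender.com/api/v1/quality/perception";

/** Builds the endpoint URL for an "owner/repo" slug such as "apify/crawlee". */
function buildQualityUrl(slug: string): string {
  const parts = slug.split("/");
  if (parts.length !== 2 || parts[0] === "" || parts[1] === "") {
    throw new Error(`Expected "owner/repo", got "${slug}"`);
  }
  // Encode each segment so unusual characters cannot alter the path.
  const owner = encodeURIComponent(parts[0]);
  const repo = encodeURIComponent(parts[1]);
  return `${API_BASE}/${owner}/${repo}`;
}

/**
 * Fetches the record for a repository and returns the raw response body.
 * The body is left unparsed because the response format is an unknown here.
 */
async function fetchQuality(slug: string): Promise<string> {
  const response = await fetch(buildQualityUrl(slug));
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status} ${response.statusText}`);
  }
  return response.text();
}
```

Calling `fetchQuality("apify/crawlee")` issues the same request as the curl command. With the keyless limit of 100 requests per day, a script that polls many repositories would likely want to cache results locally rather than re-fetch on every run.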
Related tools
seleniumbase/SeleniumBase
APIs for browser automation, testing, and bypassing bot-detection.
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers....
intoli/user-agents
A JavaScript library for generating random user agents with data that's updated daily.
Kaliiiiiiiiii-Vinyzu/patchright
Undetected version of the Playwright testing and automation library.
orangecoding/fredy
❤️ Fredy - [F]ind [R]eal [E]state [D]amn Eas[y] - Fredy keeps searching for new apartments,...