get-set-fetch/scraper
Node.js web scraper with a command-line interface, Docker container, Terraform module, and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, jsdom.
This tool automatically collects information from websites, whether product details, public data, or other content. You give it a starting URL and specify which pieces of information to extract, such as headlines, prices, or links; it then delivers a structured dataset, often in a format like CSV. It suits researchers, marketers, and data analysts who need to gather large amounts of publicly available web data.
113 stars. No commits in the last 6 months. Available on npm.
Use this if you need to systematically collect data from many web pages and store it in a structured format for analysis or further use.
Not ideal if you only need to extract data from a handful of pages or prefer a simple browser extension for occasional data grabs.
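The workflow described above (point the scraper at a start URL, name the fields to extract, get a structured CSV back) can be sketched in isolation. The record shape and `toCsv` helper below are hypothetical illustrations of the output step only, not part of the scraper's actual API:

```typescript
// Hypothetical shape of one scraped record: the fields you asked the
// scraper to extract from each page (names are illustrative).
type Row = { title: string; price: string; link: string };

// Example records as a scrape run might produce them.
const records: Row[] = [
  { title: "Widget A", price: "9.99", link: "https://example.com/a" },
  { title: 'Widget "B"', price: "19.99", link: "https://example.com/b" },
];

// Minimal CSV export: quote every field and double embedded quotes,
// which is the standard CSV escaping rule.
function toCsv(rows: Row[]): string {
  const esc = (v: string) => `"${v.replace(/"/g, '""')}"`;
  const header = ["title", "price", "link"].map(esc).join(",");
  const lines = rows.map(r => [r.title, r.price, r.link].map(esc).join(","));
  return [header, ...lines].join("\n");
}

console.log(toCsv(records));
```

The real tool handles crawling, queuing, and storage for you; this sketch only shows why a flat list of uniform records maps cleanly onto CSV for downstream analysis.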
Stars: 113
Forks: 18
Language: TypeScript
License: MIT
Category:
Last pushed: Mar 13, 2023
Commits (30d): 0
Dependencies: 11
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/get-set-fetch/scraper"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
seleniumbase/SeleniumBase
APIs for browser automation, testing, and bypassing bot-detection.
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers....
intoli/user-agents
A JavaScript library for generating random user agents with data that's updated daily.
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In...
Kaliiiiiiiiii-Vinyzu/patchright
Undetected version of the Playwright testing and automation library.