get-set-fetch/scraper

Node.js web scraper. Includes a command-line interface, a Docker container, a Terraform module, and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, jsdom.

Score: 51 / 100 (Established)

This tool helps you automatically collect information from websites, whether it's product details, public data, or other content. You provide it with a starting web address and tell it what specific pieces of information you want to extract, like headlines, prices, or links. It then delivers a structured dataset, often in a format like CSV. This is ideal for researchers, marketers, or data analysts who need to gather large amounts of publicly available web data.
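The workflow described above (a starting address, a list of fields to extract, and a structured result such as CSV) can be illustrated with a minimal sketch of the final step. This does not use the get-set-fetch/scraper API; the `ScrapedRow` type, the `csvCell` and `toCsv` helpers, and the sample data are all hypothetical names invented for illustration.

```typescript
// Hypothetical shape of one scraped row: the page URL plus the extracted fields.
// This is an assumption for illustration, not the library's own data model.
type ScrapedRow = { url: string; fields: Record<string, string> };

// Quote a CSV cell when it contains a comma, double quote, or line break,
// doubling any embedded double quotes (RFC 4180 style).
function csvCell(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Build CSV text with a header row: "url" followed by the requested field names.
// Fields missing from a row become empty cells.
function toCsv(rows: ScrapedRow[], fieldNames: string[]): string {
  const header = ["url", ...fieldNames].map(csvCell).join(",");
  const lines = rows.map((row) =>
    [row.url, ...fieldNames.map((name) => row.fields[name] ?? "")]
      .map(csvCell)
      .join(",")
  );
  return [header, ...lines].join("\n");
}

// Made-up sample data standing in for real scrape results.
const sample: ScrapedRow[] = [
  { url: "https://example.com/a", fields: { headline: "Hello, world", price: "9.99" } },
  { url: "https://example.com/b", fields: { headline: "Second item" } },
];

const csv = toCsv(sample, ["headline", "price"]);
console.log(csv);
```

Keeping the field list explicit means every row has the same columns even when a page lacks one of the requested values, which is usually what downstream analysis tools expect.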

113 stars. No commits in the last 6 months. Available on npm.

Use this if you need to systematically collect data from many web pages and store it in a structured format for analysis or further use.

Not ideal if you only need to extract data from a handful of pages or prefer a simple browser extension for occasional data grabs.

web-data-collection market-research content-aggregation competitive-intelligence data-acquisition
Stale 6m
Maintenance 0 / 25
Adoption 9 / 25
Maturity 25 / 25
Community 17 / 25

Stars: 113
Forks: 18
Language: TypeScript
License: MIT
Last pushed: Mar 13, 2023
Commits (30d): 0
Dependencies: 11

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/perception/get-set-fetch/scraper"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
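The same request can be made from TypeScript. The sketch below assumes a runtime with a global `fetch` (for example Node.js 18 or later). The response format is not described on this page, so the body is returned as raw text rather than parsed. The names `API_BASE`, `qualityUrl`, and `fetchQuality` are hypothetical, and only the endpoint itself comes from the curl command.

```typescript
// Base endpoint taken from the curl command; everything else here is illustrative.
const API_BASE = "https://pt-edge.onrender.com/api/v1/quality/perception";

// Build the request URL for a repository given as "owner/name".
function qualityUrl(repo: string): string {
  const parts = repo.split("/");
  if (parts.length !== 2 || !parts[0] || !parts[1]) {
    throw new Error(`expected "owner/name", got "${repo}"`);
  }
  return `${API_BASE}/${encodeURIComponent(parts[0])}/${encodeURIComponent(parts[1])}`;
}

// Fetch the raw response body. The response schema is not documented here,
// so the text is returned unparsed for the caller to inspect.
async function fetchQuality(repo: string): Promise<string> {
  const res = await fetch(qualityUrl(repo));
  if (!res.ok) {
    throw new Error(`request failed with HTTP ${res.status}`);
  }
  return res.text();
}

// Show the URL that would be requested, without making a network call.
console.log(qualityUrl("get-set-fetch/scraper"));
```

Calling `fetchQuality("get-set-fetch/scraper")` performs the actual request and counts against the daily limit.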