cldellow/datasette-scraper

Add website scraping abilities to Datasette

36 / 100 (Emerging)

This tool helps non-technical users gather information from websites by defining what pages to visit and what data to extract. You provide a starting point (like a URL) and rules for what information you're interested in, and it gives you a structured database of that scraped data. It's ideal for analysts, researchers, or marketers who need to collect and organize publicly available web content.
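The "starting point plus extraction rules in, structured database out" idea can be sketched in a few lines of standard-library Python. This is only a generic illustration, not datasette-scraper's own API or configuration format: the function names `extract_fields` and `store_row`, the `scraped` table, and the use of regular expressions as "rules" are all hypothetical choices made for this sketch, and the page is a hard-coded string so the example runs offline.

```python
import re
import sqlite3

def extract_fields(html, rules):
    """Apply each named regex rule to the page; keep the first capture group or None."""
    row = {}
    for name, pattern in rules.items():
        match = re.search(pattern, html, flags=re.IGNORECASE | re.DOTALL)
        row[name] = match.group(1).strip() if match else None
    return row

def store_row(conn, url, row):
    """Write one (url, field, value) record per extracted field."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped (url TEXT, field TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO scraped (url, field, value) VALUES (?, ?, ?)",
        [(url, field, value) for field, value in row.items()],
    )
    conn.commit()

# Hypothetical input: a page that would normally be fetched from the seed URL.
SAMPLE_HTML = (
    "<html><head><title> Example Page </title></head>"
    "<body><h1>Hello</h1></body></html>"
)
# Hypothetical rules: field name -> regex with one capture group.
RULES = {
    "title": r"<title>(.*?)</title>",
    "heading": r"<h1[^>]*>(.*?)</h1>",
}

conn = sqlite3.connect(":memory:")
row = extract_fields(SAMPLE_HTML, RULES)
store_row(conn, "https://example.com/", row)
rows = conn.execute("SELECT field, value FROM scraped ORDER BY field").fetchall()
```

Regular expressions are a crude way to read HTML and are used here only to keep the sketch dependency-free; a real scraper would more likely use an HTML parser with selector-based rules.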

No commits in the last 6 months. Available on PyPI.

Use this if you need to systematically collect data from a moderate number of public web pages (up to about 100,000) and want to store it in an easily accessible database.

Not ideal if you're trying to scrape websites that actively block automated bots or require complex authentication, or if you need to extract data from millions of pages.

web-scraping data-collection market-research content-analysis competitive-intelligence
Stale 6m
Maintenance 0 / 25
Adoption 8 / 25
Maturity 25 / 25
Community 3 / 25


Stars: 66
Forks: 1
Language: Python
License: Apache-2.0
Category: scraper
Last pushed: Mar 04, 2023
Commits (30d): 0
Dependencies: 7

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/perception/cldellow/datasette-scraper"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
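The same request can be made from Python with only the standard library. The sketch below assumes two things the page does not state: that the endpoint returns JSON, and that other repositories follow the same `owner/repo` path pattern as the curl example. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/perception"

def quality_url(owner, repo):
    """Build the endpoint URL for one repository (assumed owner/repo pattern)."""
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{API_BASE}/{owner_part}/{repo_part}"

def fetch_quality(owner, repo, timeout=10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("cldellow", "datasette-scraper")` issues the same GET request as the curl command; with no key, the 100 requests/day limit applies.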