stummjr/scrapy-fieldstats

A Scrapy extension to log items coverage when the spider shuts down

Score: 46 / 100 (Emerging)

When collecting data from websites, it's common to find that some information is missing for certain items. This tool helps web scrapers understand the completeness of their collected datasets by showing what percentage of scraped items contain specific fields, such as 'price' or 'author'. It provides a clear summary upon job completion, allowing for quick quality checks of the scraped output.
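The coverage figure described above can be illustrated with a short sketch. This is not the extension's own code: `field_coverage` and the sample items are hypothetical, and treating `None`, empty strings, and empty containers as "missing" is an assumption made for this example.

```python
from collections import Counter


def field_coverage(items):
    """Return {field: percentage of items in which the field is populated}.

    Hypothetical helper for illustration; "populated" here means the value
    is not None, an empty string, or an empty list/dict (an assumption).
    """
    total = len(items)
    if total == 0:
        return {}

    counts = Counter()
    for item in items:
        for field, value in item.items():
            # Register the field even if it is empty, so 0% fields are reported.
            counts[field] += 0
            if value not in (None, "", [], {}):
                counts[field] += 1

    return {field: round(100.0 * n / total, 1) for field, n in counts.items()}


# Made-up sample items, standing in for a spider's scraped output.
items = [
    {"title": "A", "price": 10.0, "author": "X"},
    {"title": "B", "price": None, "author": "Y"},
    {"title": "C", "author": ""},
    {"title": "D", "price": 5.0},
]

coverage = field_coverage(items)
for field, pct in sorted(coverage.items()):
    print(f"{field}: {pct}%")
```

A field that falls well below 100% in a summary like this usually points to a selector that fails on some pages, which is the kind of quick check the aggregate report is meant to support.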

No commits in the last 6 months. Available on PyPI.

Use this if you need to quickly assess the quality and completeness of data collected by your web scraping jobs.

Not ideal if you need a highly detailed, item-by-item data validation report rather than an aggregate field coverage summary.

Tags: web-scraping, data-quality-check, data-extraction, web-data-collection, dataset-analysis
Status: Stale (6m), No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 25 / 25
Community 15 / 25


Stars: 19
Forks: 4
Language: Python
License: MIT
Category: scraper
Last pushed: Apr 11, 2020
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/perception/stummjr/scrapy-fieldstats"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
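The same request can be made from Python using only the standard library. This is a minimal sketch: the helper names `build_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns a JSON body, which the page does not state.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/perception"


def build_url(owner, repo):
    """Build the endpoint URL for a given owner/repo pair."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner, repo, timeout=10):
    """Fetch the quality record for a repository.

    Assumption: the response body is JSON; its fields are not documented
    here, so the parsed object is returned as-is.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("stummjr", "scrapy-fieldstats")
# print(data)
```

Given the 100 requests/day limit for keyless access, caching the response locally is worth considering if the call runs in a loop or on a schedule.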