tomcardoso/intro-to-scraping
An introduction to web and document scraping
This project teaches researchers, analysts, and anyone who needs to gather information efficiently how to automate data collection from websites and from documents such as PDFs. You'll learn to turn unstructured web pages or offline files into organized, usable datasets without manual entry. It's designed for people who need to build their own data sources for analysis.
No commits in the last 6 months.
Use this if you spend countless hours manually extracting data from websites or documents and want to automate this tedious process.
Not ideal if you already have access to well-structured databases or APIs that provide all the data you need directly.
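As a rough illustration of the kind of transformation the tutorial covers, here is a minimal sketch that parses an HTML table into rows using only the Python standard library. The HTML and column names are invented for this example, not taken from the course material; in a real scrape the HTML would be fetched from a live page.

```python
from html.parser import HTMLParser

# Hypothetical input: in a real scrape this would be downloaded from a
# web page (e.g. with urllib or requests); here it is hardcoded.
HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Lisbon</td><td>545923</td></tr>
  <tr><td>Porto</td><td>231800</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collects the text of <td>/<th> cells into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []          # start a new row
        elif tag in ("td", "th"):
            self._in_cell = True    # start collecting cell text

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

parser = TableParser()
parser.feed(HTML)
print(parser.rows)
# → [['City', 'Population'], ['Lisbon', '545923'], ['Porto', '231800']]
```

Dedicated libraries such as BeautifulSoup or Scrapy make this far less verbose, but the idea is the same: walk the markup, keep the cells you care about, and emit structured rows.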
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/tomcardoso/intro-to-scraping"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
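The same endpoint can be queried from Python. This sketch only builds the request URL for an owner/repo pair; the URL pattern comes from the curl command above, while the helper function name is an illustrative assumption.

```python
from urllib.parse import quote

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/perception"

def perception_url(owner: str, repo: str) -> str:
    """Build the perception API URL for a given GitHub owner/repo pair."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

url = perception_url("tomcardoso", "intro-to-scraping")
print(url)
# → https://pt-edge.onrender.com/api/v1/quality/perception/tomcardoso/intro-to-scraping
# To fetch it, pass the URL to urllib.request.urlopen or requests.get;
# no key is needed up to 100 requests/day, per the note above.
```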
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
plabayo/rama
Modular service framework to move and transform network packets.
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.