clemfromspace/scrapy-puppeteer
Scrapy + Puppeteer
This project helps web scraping developers extract data from websites that rely on JavaScript to render their content. It acts as a bridge between Scrapy and a browser, so your spider can process content that loads dynamically after the initial page request. You provide a URL, and it returns the fully rendered HTML, ready for data extraction, along with an optional screenshot.
110 stars. No commits in the last 6 months. Available on PyPI.
Use this if you are a web scraping developer who cannot extract the data you need from JavaScript-heavy websites using Scrapy alone.
Not ideal if you are new to Python web scraping or unfamiliar with Scrapy and its ecosystem.
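The "URL in, rendered HTML and optional screenshot out" contract described above can be sketched in plain Python. This is only an illustration of the data flow, not the project's actual API: `RenderedPage`, `fetch_rendered`, and `fake_render` are hypothetical names, and the renderer is a stand-in where a real setup would drive a browser.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical renderer signature: (url, want_screenshot) -> (html, screenshot bytes or None).
Renderer = Callable[[str, bool], Tuple[str, Optional[bytes]]]


@dataclass
class RenderedPage:
    """Result handed back to the scraper (hypothetical container)."""
    url: str
    html: str
    screenshot: Optional[bytes] = None


def fetch_rendered(url: str, render: Renderer, screenshot: bool = False) -> RenderedPage:
    """Ask the renderer for the post-JavaScript HTML of `url`.

    The screenshot is kept only when the caller asked for it.
    """
    html, shot = render(url, screenshot)
    return RenderedPage(url=url, html=html, screenshot=shot if screenshot else None)


def fake_render(url: str, want_screenshot: bool) -> Tuple[str, Optional[bytes]]:
    """Stand-in for a browser: returns canned HTML instead of loading the page."""
    html = f"<html><body><h1>rendered {url}</h1></body></html>"
    return html, (b"PNG-bytes" if want_screenshot else None)


page = fetch_rendered("https://example.com", fake_render, screenshot=True)
print(len(page.html), page.screenshot is not None)
```

Passing the renderer in as a callable keeps the extraction code independent of whichever browser backend produces the HTML.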
Stars: 110
Forks: 30
Language: Python
License: MIT
Category:
Last pushed: Jun 11, 2021
Commits (30d): 0
Dependencies: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/clemfromspace/scrapy-puppeteer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
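The same request can be made from Python with the standard library. This is a minimal sketch: `quality_url` and `fetch_quality` are hypothetical helper names, and because the response format is not described here, the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Endpoint prefix taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/perception"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository, percent-encoding each path part."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text (assumes a UTF-8 encoded response)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# No network call happens here; only the URL is built and printed.
print(quality_url("clemfromspace", "scrapy-puppeteer"))
```

Calling `fetch_quality("clemfromspace", "scrapy-puppeteer")` performs the actual request and counts against the daily limit mentioned above.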
Related tools
seleniumbase/SeleniumBase
APIs for browser automation, testing, and bypassing bot-detection.
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers....
intoli/user-agents
A JavaScript library for generating random user agents with data that's updated daily.
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In...
Kaliiiiiiiiii-Vinyzu/patchright
Undetected version of the Playwright testing and automation library.