omar-elmaria/python_scrapy_airflow_pipeline
This repo contains a full-fledged Python script that scrapes a JavaScript-rendered website, cleans the data, and pushes the results to a cloud-based database. The workflow is orchestrated with Airflow so that it runs automatically.
This project helps e-commerce analysts or competitive intelligence specialists automatically gather detailed product and pricing data from competitor websites, even those with anti-bot measures. It takes a website URL as input and outputs a structured table containing product names, categories, prices, discounts, delivery times, and other key details directly into a cloud database. This enables users to track competitor strategies and market trends without manual effort.
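The scrape, clean, and load flow described above can be sketched roughly as follows. This is not the repository's code: the field names, the sample records, and the `products` table are hypothetical, an in-memory `sqlite3` database stands in for the cloud database, and the scraping and Airflow scheduling steps are left out.

```python
import re
import sqlite3

# Hypothetical raw records, shaped like what a spider might yield.
RAW_ITEMS = [
    {"name": "  Espresso Beans 1kg ", "category": "Coffee",
     "price": "€ 18.50", "discount": "-10%", "delivery_days": "2"},
    {"name": "Milk Frother", "category": "Accessories",
     "price": "€ 24.00", "discount": "", "delivery_days": "5"},
    {"name": "", "category": "Coffee",
     "price": "n/a", "discount": "", "delivery_days": ""},
]


def to_number(text):
    """Return the first decimal number found in text as a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


def clean(item):
    """Normalize one raw record; return None if it lacks a name or a price."""
    name = (item.get("name") or "").strip()
    price = to_number(item.get("price"))
    if not name or price is None:
        return None
    discount = to_number(item.get("discount")) or 0.0  # percent, 0 if absent
    days = to_number(item.get("delivery_days"))
    return (name, item.get("category"), price, discount,
            int(days) if days is not None else None)


def load(rows, conn):
    """Insert cleaned rows into a (hypothetical) products table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "name TEXT, category TEXT, price REAL, "
        "discount_pct REAL, delivery_days INTEGER)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(raw_items, conn):
    """Clean every record, drop the unusable ones, and load the rest."""
    rows = [row for row in map(clean, raw_items) if row is not None]
    load(rows, conn)
    return len(rows)
```

Calling `run_pipeline(RAW_ITEMS, sqlite3.connect(":memory:"))` loads the two usable records and skips the one without a name or price. In a scheduled setup, each of these functions would typically become its own task so that a failed load can be retried without scraping again.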
No commits in the last 6 months.
Use this if you need to regularly collect comprehensive product and pricing information from JavaScript-rendered e-commerce websites for competitive analysis or market research.
Not ideal if you only need to scrape a website once manually, or if you require a simple, code-free scraping solution for basic data extraction.
Stars: 14
Forks: —
Language: Python
License: —
Category:
Last pushed: Oct 02, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/omar-elmaria/python_scrapy_airflow_pipeline"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
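For scripted access, the curl call above might be mirrored in Python along these lines. This is only a sketch: the `owner/repo` URL pattern is inferred from the single example shown, the response is assumed (not confirmed) to be JSON, and the function names are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example; the owner/repo suffix is an
# assumption generalized from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/perception"


def build_url(owner, repo):
    """Build the listing URL for a given repository."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_listing(owner, repo, timeout=10):
    """GET the listing and decode it, assuming the body is JSON."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the stated limit of 100 keyless requests per day, a caller polling many repositories would want to cache responses rather than refetch them.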
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
plabayo/rama
modular service framework to move and transform network packets
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.