omar-elmaria/python_scrapy_airflow_pipeline

This repo contains a Python script that scrapes a JavaScript-rendered website, cleans the data, and pushes the results to a cloud-based database. The workflow is orchestrated with Airflow so that it runs automatically.

13 / 100 (Experimental)

This project helps e-commerce analysts or competitive intelligence specialists automatically gather detailed product and pricing data from competitor websites, even those with anti-bot measures. It takes a website URL as input and outputs a structured table containing product names, categories, prices, discounts, delivery times, and other key details directly into a cloud database. This enables users to track competitor strategies and market trends without manual effort.
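As a rough illustration of the "clean the data" step described above, the sketch below turns one raw scraped record into a structured row with a numeric price and a computed discount. It is not taken from the repository: the `ProductRow` fields, the `clean_row` and `parse_price` helpers, and the raw dictionary keys are all hypothetical names chosen for this example, and the price parsing assumes simple strings without thousands separators.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductRow:
    # Hypothetical columns; the repository's actual schema is not shown here.
    name: str
    category: str
    price: float
    original_price: Optional[float]
    discount_pct: Optional[float]


def parse_price(text: str) -> float:
    """Extract the first number from a raw price string such as '€ 7,50'.

    Assumes no thousands separators (e.g. '1,299.00' is not handled).
    """
    match = re.search(r"\d+(?:[.,]\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return float(match.group().replace(",", "."))


def clean_row(raw: dict) -> ProductRow:
    """Normalize one scraped record (hypothetical keys) into a ProductRow."""
    price = parse_price(raw["price"])
    original = parse_price(raw["original_price"]) if raw.get("original_price") else None
    discount = None
    if original is not None and original > price:
        # Discount as a percentage of the original price, one decimal place.
        discount = round((original - price) / original * 100, 1)
    return ProductRow(
        name=raw["name"].strip(),
        category=raw["category"].strip(),
        price=price,
        original_price=original,
        discount_pct=discount,
    )
```

A row built this way could then be appended to a table and written to whichever cloud database the pipeline targets; that step is left out because the listing does not say which database is used.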

No commits in the last 6 months.

Use this if you need to regularly collect comprehensive product and pricing information from JavaScript-rendered e-commerce websites for competitive analysis or market research.

Not ideal if you only need to scrape a website once manually, or if you require a simple, code-free scraping solution for basic data extraction.

e-commerce-analytics competitor-intelligence market-research pricing-strategy product-assortment
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 0 / 25


Stars: 14
Forks: (not shown)
Language: Python
License: None
Category: scraper
Last pushed: Oct 02, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/perception/omar-elmaria/python_scrapy_airflow_pipeline"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
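The same request can be made from Python using only the standard library. This is a minimal sketch: the endpoint path is copied from the curl command above, while the helper names `build_url` and `fetch_quality` are made up for this example, and the assumption that the response body is JSON is not confirmed by this page.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/perception"


def build_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL, escaping each path segment."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the quality data for one repository.

    Assumes the API returns a JSON document; adjust if it does not.
    """
    with urlopen(build_url(owner, repo), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("omar-elmaria", "python_scrapy_airflow_pipeline")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, it is worth caching results when checking many repositories.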