CaterinaBi/aicore-web-scraping-pipeline
Web scraping pipeline I worked on as part of my 'AI and data engineering' training at AiCore.
This project helps real estate professionals, investors, and individuals gather detailed property data from websites such as Rightmove. It navigates through pages of publicly available property listings, handles cookie prompts, and extracts key information such as price, address, property type, description, and floorplan image URLs. The output is structured data, in the form of JSON files and locally saved images, which can be used for market analysis or research.
No commits in the last 6 months.
Use this if you need to systematically collect structured property data from real estate websites for analysis or record-keeping.
Not ideal if you require real-time, high-volume data streams or need to interact with websites that employ advanced anti-bot measures beyond basic cookie handling.
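The extraction step described above (pulling price, address, property type, description, and floorplan image URLs out of a listing page and writing them as JSON) might look roughly like the sketch below. It is not the repository's code: it uses only the Python standard library, and the sample HTML, the CSS class names, and the names `ListingParser` and `parse_listing` are all hypothetical, chosen purely for illustration.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup: these class names are invented for illustration
# and do not correspond to any real site's HTML.
SAMPLE_HTML = """
<div class="listing">
  <span class="price">£350,000</span>
  <span class="address">1 Example Street, London</span>
  <span class="property-type">Flat</span>
  <p class="description">Two-bedroom flat near the park.</p>
  <img class="floorplan" src="https://example.com/floorplan-1.png">
</div>
"""

# Maps a (hypothetical) CSS class to the key used in the output record.
TEXT_FIELDS = {
    "price": "price",
    "address": "address",
    "property-type": "property_type",
    "description": "description",
}


class ListingParser(HTMLParser):
    """Collects text fields and floorplan image URLs from one listing."""

    def __init__(self):
        super().__init__()
        self.record = {"floorplan_urls": []}
        self._current = None  # output key of the text field being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class", "")
        if tag == "img" and css_class == "floorplan" and attrs.get("src"):
            self.record["floorplan_urls"].append(attrs["src"])
        elif css_class in TEXT_FIELDS:
            self._current = TEXT_FIELDS[css_class]

    def handle_data(self, data):
        # Store the text only while inside one of the known fields.
        if self._current and data.strip():
            self.record[self._current] = data.strip()

    def handle_endtag(self, tag):
        self._current = None


def parse_listing(html):
    """Return a dict of listing fields extracted from an HTML string."""
    parser = ListingParser()
    parser.feed(html)
    return parser.record


record = parse_listing(SAMPLE_HTML)
# ensure_ascii=False keeps the pound sign readable in the JSON output.
print(json.dumps(record, indent=2, ensure_ascii=False))
```

A real pipeline would fetch each page first and would need selectors matched to the target site's actual markup; the sketch covers only the step from HTML to a structured record.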
Stars: 7
Forks: 2
Language: Python
License: MIT
Category:
Last pushed: Nov 17, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/CaterinaBi/aicore-web-scraping-pipeline"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
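For scripted access, the same request can be made from Python. The sketch below makes some assumptions: that the endpoint follows an owner/repository path pattern (inferred from the single URL shown above), and that the response body is UTF-8 text. The response format is not documented here, so the body is returned as a raw string rather than parsed. The function names `build_url` and `fetch_quality` are hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the owner/repo pattern is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/perception"


def build_url(owner, repo):
    """Build the endpoint URL for one repository, escaping each path segment."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the raw response body as text (UTF-8 assumed, format not parsed)."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (performs a network request, so it is left commented out):
# print(fetch_quality("CaterinaBi", "aicore-web-scraping-pipeline"))
```

With the stated limit of 100 keyless requests per day, it makes sense to cache responses locally when checking many repositories.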
Higher-rated alternatives
seleniumbase/SeleniumBase
APIs for browser automation, testing, and bypassing bot-detection.
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers....
intoli/user-agents
A JavaScript library for generating random user agents with data that's updated daily.
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In...
Kaliiiiiiiiii-Vinyzu/patchright
Undetected version of the Playwright testing and automation library.