GeminidSystems/GoogleNewsScraper
A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can collect page data through the last results page without triggering a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper).
This tool helps researchers, marketers, or analysts gather news article data from Google News without being blocked by CAPTCHAs. You provide a search term and optional date range, and it delivers structured data including article titles, descriptions, sources, URLs, and publication dates. This is ideal for anyone needing to collect large volumes of current or historical news information for analysis.
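The structured output described above can be pictured with a small sketch. This is not the package's actual API: the `Article` record and the `filter_by_date` helper are hypothetical names chosen for illustration, with fields that simply mirror the ones listed (title, description, source, URL, publication date). The sample rows are placeholders, not scraped data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Article:
    """Hypothetical record; fields mirror the listing, not the package's types."""
    title: str
    description: str
    source: str
    url: str
    published: date


def filter_by_date(
    articles: Iterable[Article],
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[Article]:
    """Keep articles published within [start, end], both bounds inclusive.

    A bound left as None is treated as open, which models an optional date range.
    """
    return [
        a for a in articles
        if (start is None or a.published >= start)
        and (end is None or a.published <= end)
    ]


# Placeholder rows standing in for scraped results.
sample = [
    Article("Example headline A", "Placeholder description", "Example Source",
            "https://example.com/a", date(2022, 1, 15)),
    Article("Example headline B", "Placeholder description", "Example Source",
            "https://example.com/b", date(2022, 3, 1)),
]

in_range = filter_by_date(sample, start=date(2022, 1, 1), end=date(2022, 1, 31))
for article in in_range:
    print(article.published.isoformat(), article.title, article.url)
```

Keeping each result as a typed record makes later steps such as deduplicating by URL or exporting to CSV straightforward, whatever shape the scraper itself returns.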
No commits in the last 6 months.
Use this if you need to systematically collect news article data from Google News for research, trend analysis, or content monitoring without encountering technical roadblocks.
Not ideal if you only need a few articles manually, or if you require real-time alerts rather than bulk data collection.
Stars: 11
Forks: 5
Language: Python
License: MIT
Category: (none listed)
Last pushed: Feb 28, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/GeminidSystems/GoogleNewsScraper"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
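The same request can be made from Python with only the standard library. This is a minimal sketch: the URL pattern is taken from the curl command above, while the function names `perception_url` and `fetch_perception` are illustrative, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/perception"


def perception_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a given owner/repo pair."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_perception(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the listing data for a repository.

    Assumption: the response body is JSON. Adjust the parsing if it is not.
    """
    with urlopen(perception_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = perception_url("GeminidSystems", "GoogleNewsScraper")
print(url)
# To perform the request (counts against the daily quota):
# data = fetch_perception("GeminidSystems", "GoogleNewsScraper")
```

The network call is left commented out so that running the snippet does not spend one of the 100 unauthenticated requests per day.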
Higher-rated alternatives
seleniumbase/SeleniumBase
APIs for browser automation, testing, and bypassing bot-detection.
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers....
intoli/user-agents
A JavaScript library for generating random user agents with data that's updated daily.
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In...
Kaliiiiiiiiii-Vinyzu/patchright
Undetected version of the Playwright testing and automation library.