arleigh418/Web-Crawler
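PTT, Yahoo Taiwan stock and financial news, Anue (鉅亨網), Taiwan Plastic Products Trade Association (台灣塑膠製品同業公會), PChome, a classical Chinese poetry site (讀古詩詞網), the Jin10 quote wall, and the TWSE daily report on net buying and selling by the three major institutional investors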
This tool helps financial analysts, traders, and researchers collect data from Taiwanese financial news sites, stock exchanges, and e-commerce platforms. Given URLs from sites such as Yahoo Finance Taiwan, TWSE, and Jin10, it extracts structured information (stock news, institutional investor trading data, product listings) that can then be used for market analysis or research. It is meant for people who need to collect data from these specific sources systematically rather than by manual copying.
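To make "extracts structured information" concrete, here is a minimal sketch of turning an HTML table into a list of records using only the Python standard library. It is not taken from the repository: `TableRowParser`, `parse_table`, and the sample markup are all hypothetical, and a real page from TWSE or Yahoo would need its own selectors and encoding handling.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (for example whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table(html):
    """Return a list of dicts keyed by the cells of the first (header) row."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample markup, not copied from any real site.
SAMPLE_HTML = """
<table>
  <tr><th>investor</th><th>buy</th><th>sell</th></tr>
  <tr><td>Foreign investors</td><td>1,200</td><td>900</td></tr>
  <tr><td>Dealers</td><td>300</td><td>450</td></tr>
</table>
"""

records = parse_table(SAMPLE_HTML)
```

Keying each row by the header cells means a changed column order on the source page shows up as different keys rather than silently shifted values, which is one reason a sketch like this pairs headers with cells instead of relying on fixed positions.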
No commits in the last 6 months.
Use this if you need to extract specific financial news, stock market data, or product information from Taiwanese websites like Yahoo Finance, TWSE, and PChome.
Not ideal if you need to crawl sites that are not listed, if the listed sites change their layout often, or if you need real-time extraction that adapts to site updates.
Stars: 19
Forks: 5
Language: Python
License: none listed
Category:
Last pushed: Nov 22, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/arleigh418/Web-Crawler"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
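The same request can be made from Python. The sketch below uses only the standard library and the URL from the curl command above. The function names are hypothetical, and it assumes the endpoint returns a JSON body, which this page does not confirm. How a key is supplied is not described here, so the sketch makes the keyless request only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/perception"


def perception_url(owner, repo):
    """Build the endpoint URL, percent-encoding the owner and repo parts."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_perception(owner, repo, timeout=10):
    """GET the endpoint and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(perception_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_perception("arleigh418", "Web-Crawler")
```

With the keyless limit of 100 requests/day, it is worth caching responses locally rather than calling `fetch_perception` in a loop.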
Higher-rated alternatives
seleniumbase/SeleniumBase
APIs for browser automation, testing, and bypassing bot-detection.
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers....
intoli/user-agents
A JavaScript library for generating random user agents with data that's updated daily.
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In...
Kaliiiiiiiiii-Vinyzu/patchright
Undetected version of the Playwright testing and automation library.