scrapehero/selectorlib
A library that reads a YAML file of XPath or CSS selectors and uses them to extract data from HTML pages
This tool helps developers automate the process of extracting specific information from web pages. You provide it with a web page's HTML content and a YAML file that defines what data you want to pull out (like titles or links) using CSS selectors or XPath. The output is structured data, such as a dictionary, containing the extracted information. This is ideal for developers building web scraping solutions or data collection tools.
Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Use this if you are a developer who needs a structured and configurable way to define and extract data from HTML content within your Python applications.
Not ideal if you are not a developer and need a visual, no-code web scraping tool, or if you require advanced features like CAPTCHA solving, JavaScript rendering, or proxy management.
Stars: 74
Forks: 12
Language: HTML
License: MIT
Category:
Last pushed: Jan 30, 2023
Commits (30d): 0
Dependencies: 3
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/scrapehero/selectorlib"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Related tools
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
plabayo/rama
Modular service framework to move and transform network packets
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.