hrbrmstr/reapr
Reap Information from Websites
This project helps anyone who needs to extract specific content and detailed metadata from websites without dealing with low-level HTTP and parsing details. Given a URL, it returns the page's text content, its title, server information, and a breakdown of the HTML elements in use and their attributes. It is aimed at data analysts, researchers, and marketers who regularly pull information from web pages for analysis.
No commits in the last 6 months.
Use this if you need more from a website than its basic text, such as HTML tag counts, server details, or the exact attributes of page elements.
Not ideal if you only need a page's plain text and have no use for its structure or metadata.
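To make that kind of breakdown concrete, here is a minimal Python sketch built on the standard library's `html.parser`. It is not reapr's API: `PageReaper` and `reap()` are hypothetical names, and the sketch assumes you already have the page's HTML and its response headers (for example from `urllib.request.urlopen`). It collects the title, the text outside `<script>`/`<style>`, a count per tag, the attributes of every element occurrence, and the `Server` header.

```python
from __future__ import annotations

from collections import Counter
from html.parser import HTMLParser


class PageReaper(HTMLParser):
    """Hypothetical collector for title, visible text, tag counts and attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.tag_counts: Counter[str] = Counter()
        self.attributes: dict[str, list[dict[str, str | None]]] = {}
        self.title = ""
        self.text_parts: list[str] = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # Count every element and keep each occurrence's attributes in order.
        self.tag_counts[tag] += 1
        self.attributes.setdefault(tag, []).append(dict(attrs))
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            # Keep only non-blank text that is outside <script>/<style>.
            self.text_parts.append(data.strip())


def reap(html: str, headers: dict[str, str]) -> dict:
    """Summarize one page from its HTML and its HTTP response headers."""
    parser = PageReaper()
    parser.feed(html)
    parser.close()
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "title": parser.title.strip(),
        "server": lowered.get("server"),
        "text": " ".join(parser.text_parts),
        "tag_counts": dict(parser.tag_counts),
        "attributes": parser.attributes,
    }


if __name__ == "__main__":
    page = (
        "<html><head><title>Demo</title><meta charset='utf-8'>"
        "<script>var x = 1;</script></head>"
        "<body><p class='lead'>Hello</p><a href='/a'>A</a><a href='/b'>B</a></body></html>"
    )
    info = reap(page, {"Server": "nginx"})
    print(info["title"])            # Demo
    print(info["server"])           # nginx
    print(info["tag_counts"]["a"])  # 2
    print(info["attributes"]["a"])  # [{'href': '/a'}, {'href': '/b'}]
    print(info["text"])             # Hello A B
```

Keeping parsing separate from fetching makes the summary easy to test offline; a real tool would also handle redirects, encodings, and malformed markup.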
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/hrbrmstr/reapr"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
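The same unauthenticated request can be made from Python with only the standard library. This is a sketch under assumptions: `perception_url` and `fetch_perception` are hypothetical helper names, the response is assumed to be JSON (its shape is not documented here), and the keyed 1,000/day tier is left out because this page does not say how a key is sent.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/perception"


def perception_url(owner: str, repo: str) -> str:
    # Quote each path segment so unusual characters cannot alter the path.
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_perception(owner: str, repo: str, timeout: float = 10.0):
    # No key is sent, matching the curl example (100 requests/day tier).
    with urlopen(perception_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)  # assumes a JSON response body
```

Calling `fetch_perception("hrbrmstr", "reapr")` requests the same URL as the curl command above.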
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
plabayo/rama
modular service framework to move and transform network packets
scrapinghub/spidermon
Scrapy extension for monitoring spider execution.