hrbrmstr/htmlunit
🕸🧰☕️ Tools to Scrape Dynamic Web Content via the 'HtmlUnit' Java Library
This R package helps users extract information from websites that standard scraping methods handle poorly, such as pages that build their content with JavaScript or other interactive elements. Given a web address (URL), it returns the page as a browser would render it, so that structured data such as tables or text can be pulled out. Digital marketers, researchers, and anyone else who needs to gather public information from dynamic websites may find it useful.
No commits in the last 6 months.
Use this if you need to reliably pull data from websites that use JavaScript, AJAX, or require form submissions and link clicks to reveal their content.
Not ideal if you primarily need to scrape static HTML content from simple websites without dynamic elements or complex interactions.
Stars
36
Forks
6
Language
R
License
Apache-2.0
Category
Last pushed
Apr 12, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/hrbrmstr/htmlunit"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
plabayo/rama
modular service framework to move and transform network packets
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.