peterbencze/serritor
Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to render data.
Serritor targets software developers who need to collect data from websites that rely on JavaScript for content. It takes a starting URL and a set of crawl rules, loads each page in a Selenium-driven browser, and hands back the rendered content, so you can extract information that crawlers without JavaScript support would miss. It is meant for developers building custom web scraping or data collection tools.
No commits in the last 6 months.
Use this if you are a Java developer building a web crawler and need to interact with modern websites that extensively use JavaScript to display their content.
Not ideal if you are looking for a ready-to-use, no-code web scraping tool or if you are not comfortable with Java programming and Selenium.
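The "starting URL plus rules" model described above can be illustrated with a minimal sketch that uses only the JDK. This is not Serritor's API: `CrawlSketch`, `crawl`, `fetchLinks`, and `rule` are hypothetical names, and the in-memory link map stands in for pages that a real crawler would render in a browser before collecting links.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical illustration only; none of these names come from Serritor.
class CrawlSketch {

    /**
     * Breadth-first crawl starting at startUrl.
     * fetchLinks stands in for "render the page and collect its links";
     * rule decides which discovered links are followed.
     */
    public static List<String> crawl(String startUrl,
                                     Function<String, List<String>> fetchLinks,
                                     Predicate<String> rule,
                                     int maxPages) {
        Deque<String> frontier = new ArrayDeque<>(); // URLs waiting to be visited
        Set<String> seen = new HashSet<>();          // URLs already queued once
        List<String> visited = new ArrayList<>();    // visit order, returned to the caller

        frontier.add(startUrl);
        seen.add(startUrl);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);
            for (String link : fetchLinks.apply(url)) {
                // Follow a link only if the rule accepts it and it is new.
                if (rule.test(link) && seen.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Made-up link graph standing in for a small website.
        Map<String, List<String>> site = Map.of(
            "https://example.com/", List.of("https://example.com/a", "https://other.test/x"),
            "https://example.com/a", List.of("https://example.com/", "https://example.com/b"),
            "https://example.com/b", List.of());

        List<String> pages = crawl(
            "https://example.com/",
            url -> site.getOrDefault(url, List.of()),
            url -> url.startsWith("https://example.com/"), // rule: stay on one host
            10);

        pages.forEach(System.out::println);
    }
}
```

In a browser-backed crawler the `fetchLinks` step is where the cost lies, since each page has to be loaded and its scripts run before any links or data can be read. That is why a page cap such as `maxPages` and a restrictive rule are worth setting explicitly.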
Stars: 32
Forks: 14
Language: Java
License: Apache-2.0
Category:
Last pushed: Jul 07, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/peterbencze/serritor"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
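For Java developers, the same request can be made with the JDK's built-in `java.net.http.HttpClient` (Java 11+). This is a minimal sketch: the class and method names are hypothetical, the `owner/repo` path pattern is an assumption generalized from the single URL shown above, and because the response format is not documented here the body is printed as raw text rather than parsed.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper; only the base URL is taken from the curl command above.
class QualityApiSketch {

    static final String BASE = "https://pt-edge.onrender.com/api/v1/quality/perception/";

    /** Builds the endpoint URI, assuming the path is always owner/repo. */
    public static URI endpointFor(String owner, String repo) {
        return URI.create(BASE + owner + "/" + repo);
    }

    /** Sends a GET request and returns the response body as text. */
    public static String fetch(String owner, String repo)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(endpointFor(owner, repo))
            .GET()
            .build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            // Covers rate limiting and any other non-success status.
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetch("peterbencze", "serritor"));
    }
}
```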
Higher-rated alternatives
seleniumbase/SeleniumBase
APIs for browser automation, testing, and bypassing bot-detection.
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers....
intoli/user-agents
A JavaScript library for generating random user agents with data that's updated daily.
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In...
Kaliiiiiiiiii-Vinyzu/patchright
Undetected version of the Playwright testing and automation library.