code4craft/webmagic
A scalable web crawler framework for Java.
This framework helps developers quickly build custom web crawlers that collect specific data from websites. You define which information you need and which pages it comes from, and the framework provides the tools to download pages, manage URLs, extract content, and save the results. It is designed for software developers who need to automate data collection from the web.
11,699 stars.
Use this if you are a Java developer needing to build a custom, scalable web crawler to systematically extract specific data from multiple web pages.
Not ideal if you need a no-code solution for web scraping or if your project isn't built with Java.
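The four stages named in the description (download, URL management, extraction, saving) can be sketched without any library. The example below is not WebMagic's API: `MiniCrawler`, `crawl`, and the in-memory `site` map are hypothetical names chosen for illustration, and the "download" step is just a map lookup so the sketch runs offline. It assumes pages are plain HTML strings with `<title>` tags and `href="..."` links.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, library-free sketch of a crawl loop; not part of WebMagic.
public class MiniCrawler {
    private static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    /** Crawls the in-memory site from startUrl and returns url -> extracted title. */
    public static Map<String, String> crawl(Map<String, String> site, String startUrl) {
        Queue<String> frontier = new ArrayDeque<>();          // URL management: pages still to visit
        Set<String> seen = new LinkedHashSet<>();             // URL management: de-duplication
        Map<String, String> results = new LinkedHashMap<>();  // stands in for "save the data"

        frontier.add(startUrl);
        seen.add(startUrl);
        while (!frontier.isEmpty()) {
            String url = frontier.remove();
            String html = site.get(url);                      // download step (a map lookup here)
            if (html == null) {
                continue;                                     // page not available: skip it
            }

            Matcher title = TITLE.matcher(html);              // extraction step
            if (title.find()) {
                results.put(url, title.group(1));
            }

            Matcher link = LINK.matcher(html);                // discover follow-up URLs
            while (link.find()) {
                String next = link.group(1);
                if (seen.add(next)) {                         // enqueue only URLs not seen before
                    frontier.add(next);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, String> site = new LinkedHashMap<>();
        site.put("/a", "<title>Page A</title><a href=\"/b\">b</a><a href=\"/a\">self</a>");
        site.put("/b", "<title>Page B</title><a href=\"/c\">missing</a>");
        System.out.println(crawl(site, "/a"));                // prints {/a=Page A, /b=Page B}
    }
}
```

A real crawler would swap the map lookup for an HTTP client, add politeness delays and retries, and write results to durable storage. A framework of this kind typically packages those pieces so that you only supply the extraction logic.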
Stars: 11,699
Forks: 4,149
Language: Java
License: Apache-2.0
Category:
Last pushed: Dec 20, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/code4craft/webmagic"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Related tools
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
plabayo/rama
Modular service framework to move and transform network packets.
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.