touero/ctenopharyngodon-idella

Uses MapReduce's Java interface to crawl data on Chinese universities in a distributed fashion, and serves as an introduction to the basics of HDFS.

Experimental

This project helps you gather publicly available data about Chinese universities, such as information often found on websites like '掌上高考' (Gaokao.cn). It takes URLs of university pages and extracts structured data, which can then be used for analysis or database population. This tool is for data engineers or researchers who need to collect large datasets from the web, specifically about educational institutions in China.

134 stars. No commits in the last 6 months.

Use this if you need to systematically collect and store comprehensive data from multiple Chinese university websites, especially those that use JavaScript to load content.

Not ideal if you're looking to crawl data from websites that are not Chinese universities, or if you don't have experience setting up and managing a distributed computing environment like Hadoop.

education-data web-scraping data-collection academic-research china-universities

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 2 / 25

Stars 134

Language Java

License Apache-2.0

Higher-rated alternatives

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Altimis/Scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...

lexiforest/curl_cffi

Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...

plabayo/rama

modular service framework to move and transform network packets

scrapinghub/spidermon

Scrapy Extension for monitoring spiders execution.
