touero/ctenopharyngodon-idella

Uses MapReduce's Java interface for distributed crawling of Chinese university data, and serves as a way to learn the basics of HDFS.

Score: 28 / 100 (Experimental)

This project helps you gather publicly available data about Chinese universities, such as information often found on websites like '掌上高考' (Gaokao.cn). It takes URLs of university pages and extracts structured data, which can then be used for analysis or database population. This tool is for data engineers or researchers who need to collect large datasets from the web, specifically about educational institutions in China.
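
The project is built on MapReduce, so the URL-to-record flow can be pictured as a map step followed by a reduce step. The following is a minimal plain-Java sketch of that data flow only, not the project's code: it does not use Hadoop and does not fetch any pages. The class name `CrawlFlowSketch`, the `example.com` URLs, and the idea of using the last path segment as a school ID are all hypothetical.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class CrawlFlowSketch {

    // Map step: turn one input URL into a (key, value) pair.
    // Assumption: the last path segment identifies the school.
    static Map.Entry<String, String> map(String url) {
        String path = URI.create(url).getPath();
        String schoolId = path.substring(path.lastIndexOf('/') + 1);
        return Map.entry(schoolId, url);
    }

    // Reduce step: gather every value that shares the same key.
    static Map<String, List<String>> reduce(List<Map.Entry<String, String>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                Map.Entry::getKey,
                TreeMap::new,
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        // Hypothetical input URLs; a real job would read these from HDFS.
        List<String> urls = List.of(
                "https://example.com/school/102",
                "https://example.com/school/102",
                "https://example.com/school/31");

        List<Map.Entry<String, String>> pairs =
                urls.stream().map(CrawlFlowSketch::map).collect(Collectors.toList());

        System.out.println(reduce(pairs));
    }
}
```

In a real Hadoop job the map step would also fetch and parse the page, and the framework would handle the grouping between the two steps across machines.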

134 stars. No commits in the last 6 months.

Use this if you need to systematically collect and store comprehensive data from multiple Chinese university websites, especially those that use JavaScript to load content.

Not ideal if you're looking to crawl data from websites that are not Chinese universities, or if you don't have experience setting up and managing a distributed computing environment like Hadoop.

education-data web-scraping data-collection academic-research china-universities
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 2 / 25

Stars: 134
Forks: 1
Language: Java
License: Apache-2.0
Category: scraper
Last pushed: Oct 16, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/perception/touero/ctenopharyngodon-idella"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
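
The same request can be made from Java, the project's own language. This is a minimal sketch using the JDK's built-in `java.net.http` client (Java 11+). The class and method names are hypothetical, and because the response format is not described here, the body is printed as raw text rather than parsed.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class QualityApiSketch {

    // Base path taken from the curl command; owner and repo are appended.
    static final String BASE =
            "https://pt-edge.onrender.com/api/v1/quality/perception/";

    // Build the endpoint URI for a given repository.
    static URI endpoint(String owner, String repo) {
        return URI.create(BASE + owner + "/" + repo);
    }

    // Build a plain GET request; no API key is attached (keyless tier).
    static HttpRequest request(String owner, String repo) {
        return HttpRequest.newBuilder(endpoint(owner, repo)).GET().build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                request("touero", "ctenopharyngodon-idella"),
                HttpResponse.BodyHandlers.ofString());

        // Print the status code and the unparsed response body.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Keeping the URI construction in its own method makes it easy to check the endpoint without touching the network, which helps when staying under the daily request limit.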