CYang828/datasetstation
Quickly download Chinese datasets, process them, and run data analysis and visual analysis: a one-stop solution for data problems.
This tool helps data scientists and NLP practitioners quickly access and prepare Chinese-language datasets for machine learning projects. You give it the name of a dataset, and it returns cleaned, pre-processed text data ready for tasks such as sentiment analysis or text classification. It is designed for anyone working with Chinese text who needs efficient data loading and basic transformation.
No commits in the last 6 months. Available on PyPI.
Use this if you frequently work with Chinese text data for machine learning and want to streamline the process of finding, downloading, and preparing datasets.
Not ideal if your primary need is for highly specialized data cleaning beyond basic transformations or if you only work with English datasets.
Stars: 69
Forks: 5
Language: Python
License: Apache-2.0
Category:
Last pushed: Nov 15, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/CYang828/datasetstation"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
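The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred from the single URL shown above, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request

# Base path copied from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout is an assumption from one example)."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes the response body is JSON; if it is not, json.loads raises
    a ValueError and the caller should inspect the raw response instead.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, counts against the daily limit):
# data = fetch_quality("ml-frameworks", "CYang828", "datasetstation")
# print(data)
```

Because the keyless tier is limited to 100 requests per day, it may be worth caching the result locally rather than calling `fetch_quality` repeatedly.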
Higher-rated alternatives
open-edge-platform/datumaro
Dataset Management Framework, a Python library and a CLI tool to build, analyze and manage...
explosion/ml-datasets
🌊 Machine learning dataset loaders for testing and example scripts
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with...
tensorflow/datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
mlcommons/croissant
Croissant is a high-level format for machine learning datasets that brings together four rich layers.