carbonz0/alpaca-chinese-dataset
Alpaca Chinese instruction fine-tuning dataset
This project provides a dataset for anyone looking to train or fine-tune language models to better understand and respond in Chinese. It collects common instructions and examples, translated into or generated in Chinese, into a structured set of instruction-response pairs. It is aimed at AI researchers, language model developers, and data scientists working on Chinese natural language processing.
397 stars. No commits in the last 6 months.
Use this if you need a high-quality, pre-structured dataset of Chinese instructions and corresponding outputs to teach an AI model how to follow commands in Chinese.
Not ideal if you are looking for a dataset of general Chinese text for tasks like sentiment analysis or machine translation without an instruction-following component.
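Instruction-following datasets in the Alpaca lineage conventionally store each example as a JSON object with `instruction`, `input`, and `output` fields. A minimal sketch of that record shape, assuming this repo follows the standard Alpaca schema (the exact field names in the repo may differ):

```python
import json

# A typical Alpaca-style record: an instruction (with an optional input)
# paired with the desired model response. Field names follow the common
# Alpaca convention and are an assumption here.
record = {
    "instruction": "把下面的句子翻译成英文。",  # "Translate the sentence below into English."
    "input": "今天天气很好。",                  # "The weather is nice today."
    "output": "The weather is nice today.",
}

# Such datasets are usually distributed as a JSON array of records.
serialized = json.dumps([record], ensure_ascii=False)
loaded = json.loads(serialized)
print(loaded[0]["output"])
```

During fine-tuning, the `instruction` and `input` fields are typically concatenated into a prompt template and the model is trained to produce `output`.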
Stars: 397
Forks: 24
Language: —
License: —
Category:
Last pushed: Mar 26, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/carbonz0/alpaca-chinese-dataset"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
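The same endpoint can be called from code. A minimal sketch in Python, assuming the URL pattern `/api/v1/quality/llm-tools/{owner}/{repo}` inferred from the example above (the response schema is not documented here, so it is parsed generically as JSON):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def repo_stats_url(owner: str, repo: str) -> str:
    # Endpoint pattern inferred from the curl example above; treat as an assumption.
    return f"{BASE}/{owner}/{repo}"

def fetch_stats(owner: str, repo: str, timeout: float = 10.0) -> dict:
    # Anonymous access is limited to 100 requests/day; a free key raises
    # this to 1,000/day per the note above.
    with urllib.request.urlopen(repo_stats_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)

print(repo_stats_url("carbonz0", "alpaca-chinese-dataset"))
```

Swap in `requests` or any HTTP client you prefer; nothing here depends on the standard library specifically.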
Higher-rated alternatives
axolotl-ai-cloud/axolotl
Go ahead and axolotl questions
google/paxml
Pax is a Jax-based machine learning framework for training large scale models. Pax allows for...
JosefAlbers/PVM
Phi-3.5 for Mac: Locally-run Vision and Language Models for Apple Silicon
iamarunbrahma/finetuned-qlora-falcon7b-medical
Finetuning of Falcon-7B LLM using QLoRA on Mental Health Conversational Dataset
h2oai/h2o-wizardlm
Open-Source Implementation of WizardLM to turn documents into Q:A pairs for LLM fine-tuning