raunak-agarwal/instruction-datasets

Datasets for Instruction Tuning of Large Language Models

28 / 100 (Experimental)

This is a curated collection of datasets for fine-tuning Large Language Models (LLMs) to follow instructions more accurately. It covers textual and multimodal data, from conversational exchanges to task-specific prompts in multiple languages. Models trained on this data are intended to be better at understanding and executing complex instructions. The resource is aimed at AI researchers and machine learning engineers who build and improve LLMs for various applications.

261 stars. No commits in the last 6 months.

Use this if you are a machine learning engineer or researcher looking for high-quality, pre-processed datasets to train or fine-tune large language models to better understand and follow user instructions.

Not ideal if you are an end user who simply wants to use an existing language model, or if you need datasets for traditional machine learning tasks outside of LLM instruction tuning.

LLM training, NLP research, AI model fine-tuning, conversational AI, machine learning engineering
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 10 / 25

How are scores calculated?
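
The formula is not spelled out in this listing, but the four category scores shown above (each out of 25) add up to the overall 28 / 100. A minimal sketch of that arithmetic, assuming the overall score is a plain sum of the categories (the variable names are illustrative, not taken from the site):

```python
# Category scores as listed on this page, each out of 25.
category_scores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 8,
    "Community": 10,
}
MAX_PER_CATEGORY = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(category_scores.values())
maximum = MAX_PER_CATEGORY * len(category_scores)

print(f"{total} / {maximum}")  # prints "28 / 100"
```

Under this assumption, the Maintenance score of 0 accounts for most of the gap to the other categories, which matches the "Stale 6m" badge.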

Stars: 261
Forks: 13
Language: not listed
License: not listed
Last pushed: Nov 30, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/raunak-agarwal/instruction-datasets"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
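
The same request can be made from Python using only the standard library. This is a sketch under assumptions: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the path layout (category, then owner, then repository) is inferred from the curl URL above, and the response is assumed to be JSON, which this page does not confirm. How an API key is supplied is not described here, so the sketch makes keyless requests only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base of the endpoint shown in the curl example.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the segment order is inferred from the curl example."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record. Assumes a JSON response body (not confirmed)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to make a real request.
    print(build_quality_url("llm-tools", "raunak-agarwal", "instruction-datasets"))
```

Since keyless access is limited to 100 requests/day, caching the result locally is sensible if the data is needed repeatedly.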