Instruction Tuning Datasets LLM Tools

Datasets, papers, and resources specifically for instruction tuning and instruction following in LLMs. This category does NOT include general fine-tuning methods, evaluation benchmarks, or model inference tools.

There are 19 instruction tuning dataset tools tracked. One scores above 50 (Established tier). The highest-rated is MantisAI/sieves at 55/100 with 125 stars.

Get all 19 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=instruction-tuning-datasets&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
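
The same request can be made from a script. Below is a minimal Python sketch using only the standard library. The endpoint and query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented here, the code returns the decoded JSON as-is without assuming any field names. How an API key is passed is also not specified on this page, so the sketch covers only the keyless tier.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools",
              subcategory="instruction-tuning-datasets",
              limit=20):
    """Build the query URL (hypothetical helper; defaults mirror the curl command)."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the body as JSON.

    The structure of the returned object is an unknown here, so it is
    handed back unmodified for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs one request, which counts against the 100 requests/day keyless limit, so it is worth caching the result locally when iterating on a script.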

#	Tool	Score	Tier	Stars	Language
1	MantisAI/sieves Plug-and-play document AI with zero-shot models.	55	Established	125	Python
2	xiaoya-li/Instruction-Tuning-Survey Project for the paper entitled `Instruction Tuning for Large Language...	44	Emerging	230	—
3	TencentARC-QQ/TagGPT TagGPT: Large Language Models are Zero-shot Multimodal Taggers	35	Emerging	66	Python
4	rafaelpierre/bullet bullet: A Zero-Shot / Few-Shot Learning, LLM Based, text classification framework	35	Emerging	12	Jupyter Notebook
5	amazon-science/adaptive-in-context-learning AdaICL: Which Examples to Annotate of In-Context Learning? Towards Effective...	34	Emerging	20	Python
6	andrewzamai/SLIMER_IT An Instruction-tuned LLM for zero-shot NER on Italian	33	Emerging	4	Jupyter Notebook
7	princeton-pli/STAT Skill-Targeted Adaptive Training	32	Emerging	16	Python
8	LIN-SHANG/InstructERC The official realization of InstructERC	31	Emerging	148	Python
9	OpenGVLab/Instruct2Act Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with...	30	Emerging	373	Python
10	Lichang-Chen/InstructZero Official Implementation of InstructZero; the first framework to optimize bad...	30	Emerging	199	Python
11	raunak-agarwal/instruction-datasets Datasets for Instruction Tuning of Large Language Models	28	Experimental	261	—
12	basicv8vc/chinese-instruction-datasets-for-llms Chinese instruction datasets for fine-tuning LLMs	27	Experimental	29	—
13	snowood1/Zero-Shot-PLOVER Leveraging Codebook Knowledge with NLI and ChatGPT for Zero-Shot Political...	25	Experimental	6	Jupyter Notebook
14	A-baoYang/instruction-finetune-datasets Collect and maintain high quality instruction finetune datasets in different...	22	Experimental	20	—
15	Reason-Wang/notable-instruction-llm The repo collects model and data projects for instruction following large...	21	Experimental	1	—
16	Showndarya/Few-Shot-ChatGPT Zero-Shot and Few-shot learning method using ChatGPT on problem sets	20	Experimental	5	Jupyter Notebook
17	orionw/FollowIR FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions	16	Experimental	52	Python
18	DeperiasKerre/qpInstruct Instruction Dataset for QCL properties Extraction from Text	13	Experimental	—	Python
19	davor10105/laat Use LLMs as training regularizers for small, differentiable models and...	11	Experimental	1	Python