Instruction Tuning Datasets Transformer Models

This page tracks 33 instruction-tuning dataset projects. The highest-rated is DaoD/INTERS, scoring 47/100 with 207 stars.

Get all 33 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=instruction-tuning-datasets&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
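The same request can be made from Python. This is a minimal sketch using only the standard library; the endpoint and query parameters are copied from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The response schema is not documented here, so the sketch only decodes the JSON and prints the start of it. Whether `limit=20` caps the result below 33 projects is also not stated, so the value is kept as a parameter.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="instruction-tuning-datasets",
              limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the body as JSON.

    The structure of the returned object is an assumption left open:
    inspect it before relying on specific fields.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_projects(build_url())
    # Print only the beginning of the payload to see its shape.
    print(json.dumps(data, indent=2)[:500])
```

Unauthenticated calls count against the 100 requests/day allowance, so it is worth caching the decoded result locally while exploring.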

#	Model	Score	Tier	Stars	Language
1	DaoD/INTERS This is the repository for our paper "INTERS: Unlocking the Power of Large...	47	Emerging	207	Python
2	declare-lab/instruct-eval This repository contains code to quantitatively evaluate instruction-tuned...	41	Emerging	552	Python
3	Haiyang-W/TokenFormer [ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking...	41	Emerging	588	Python
4	hkust-nlp/deita Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]	40	Emerging	591	Python
5	kehanlu/DeSTA2 Code and model for ICASSP 2025 Paper "Developing Instruction-Following...	39	Emerging	123	HTML
6	TIGER-AI-Lab/VisualWebInstruct The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction...	36	Emerging	38	Python
7	zhilizju/Awesome-instruction-tuning A curated list of awesome instruction tuning datasets, models, papers and...	36	Emerging	347	Python
8	FengheTan9/LLM4Seg [MICCAI 2025] Official code for "Pre-Trained LLM is a Semantic-Aware and...	36	Emerging	51	Python
9	declare-lab/Auto-Scaling [Arxiv 2024] Official Implementation of the paper: "Towards Robust...	35	Emerging	9	Jupyter Notebook
10	RenzeLou/Muffin MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following	35	Emerging	16	Python
11	UCSC-REAL/TokenCleaning [ICML 2025] Official implementation of paper "Token Cleaning: Fine-Grained...	35	Emerging	51	Python
12	cxcscmu/Montessori-Instruct Official repository for Montessori-Instruct: Generate Influential Training...	34	Emerging	50	Python
13	18907305772/Explore-Instruct EMNLP'2023: Explore-Instruct: Enhancing Domain-Specific Instruction Coverage...	34	Emerging	5	Python
14	TamSiuhin/P2P source code for "Instant Personalized Large Language Model Adaptation via...	32	Emerging	9	Python
15	Shivanshu-Gupta/in-context-learning Easy in-context learning experiments with variety of datasets, LLMs, and...	32	Emerging	1	Python
16	hplt-project/monolingual-multilingual-instruction-tuning Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca	31	Emerging	9	Python
17	SinclairCoder/Instruction-Tuning-Papers Reading list of Instruction-tuning. A trend starts from Natural-Instruction...	30	Emerging	766	—
18	gentaiscool/few-shot-lm The source code of "Language Models are Few-shot Multilingual Learners" (MRL...	29	Experimental	53	Python
19	HamedBabaei/author-profiling-pan2023 Symbol Team model for PAN@AP 2023 shared task on Profiling Cryptocurrency...	29	Experimental	1	Python
20	OSU-NLP-Group/QA4RE [ACL'23 Findings] "Aligning Instruction Tasks Unlocks Large Language Models...	28	Experimental	41	Python
21	liziniu/GEM Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large...	28	Experimental	52	Python
22	zhuang-li/SCAR [ACL 2025 main] SCAR: Data Selection via Style Consistency-Aware Response...	27	Experimental	39	Python
23	ZifanL/TSDS Implementation of TSDS: Data Selection for Task-Specific Model Finetuning....	27	Experimental	17	Python
24	yeyimilk/llm-zero-shot-classifiers Large Language Models are zero-shot text classifiers; Smart Expert System:...	26	Experimental	35	Jupyter Notebook
25	OpenDFM/HeadsUp [ICML 2025] Codes for the paper "Heads up! Large Language Models Can Perform...	25	Experimental	3	Jupyter Notebook
26	OFA-Sys/DiverseEvol Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning	24	Experimental	86	Python
27	LGDiMaggio/few-shot-fault-diagnosis-multimodal-LLM Few-shot bearing fault diagnosis using multimodal LLMs and prototypical networks	24	Experimental	4	Python
28	dkopi/Bitune Implementation of Bitune: Bidirectional Instruction-Tuning	24	Experimental	25	Python
29	MiuLab/InstUPR Source code of our paper "InstUPR: Instruction-based Unsupervised Passage...	23	Experimental	3	Python
30	mukhal/icl-ensembling [Me-FoMo ICLR 2023 - Oral] Exploring Demonstration Ensembling for In-context Learning	20	Experimental	5	Python
31	davidandym/Multitask-Transfer-Instruction-Tuning This is the official code repository for the ACL Findings Paper "Multi-Task...	17	Experimental	1	—
32	MK2112/conflicting-few-shots experiments on how conflicting few-shot examples affect emotion...	17	Experimental	—	Python
33	eriknomitch/reshaper A few shot data reshaper via LLMs (GPT-3)	11	Experimental	—	Jupyter Notebook