OFA-Sys/InsTag

InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning

Score: 26 / 100 (Experimental)

This tool helps researchers and AI engineers analyze and improve the data used to fine-tune large language models (LLMs). It takes existing LLM training datasets and tags individual user queries based on their meaning and intent. The output provides insights into the diversity and complexity of the dataset, helping users select high-quality data subsets to train more capable LLMs.
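As a rough illustration of how per-query tags can be turned into dataset-level signals, the sketch below counts distinct tags as a proxy for diversity and averages the tags per query as a proxy for complexity. This is a minimal sketch under stated assumptions, not InsTag's own code: the function name `tag_stats`, the input shape (a list of `(query, tags)` pairs), and the sample tags are all hypothetical.

```python
from collections import Counter


def tag_stats(tagged_queries):
    """Summarize a tagged dataset.

    tagged_queries: list of (query_text, list_of_tags) pairs.
    This input shape is an assumption made for illustration only.
    """
    # Count each tag once per query so repeated tags do not inflate totals.
    tag_counts = Counter(
        tag for _, tags in tagged_queries for tag in set(tags)
    )
    n_queries = len(tagged_queries)
    total_tags = sum(len(set(tags)) for _, tags in tagged_queries)
    return {
        # Diversity proxy: how many distinct tags the dataset covers.
        "unique_tags": len(tag_counts),
        # Complexity proxy: average number of tags attached to a query.
        "avg_tags_per_query": total_tags / n_queries if n_queries else 0.0,
        "tag_counts": tag_counts,
    }


# Hypothetical sample data; real tags would come from the tagging step.
samples = [
    ("Write a haiku about autumn", ["creative writing", "poetry"]),
    ("Fix this Python loop and explain the bug",
     ["code debugging", "explanation", "python"]),
    ("Translate 'hello' to French", ["translation"]),
]

stats = tag_stats(samples)
print(stats["unique_tags"], stats["avg_tags_per_query"])
```

A subset selected for higher values of both numbers covers more intents with more demanding queries, which is the kind of comparison the tagging output is meant to support.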

285 stars. No commits in the last 6 months.

Use this if you are a researcher or AI engineer focused on enhancing large language model performance by carefully curating and understanding your supervised fine-tuning (SFT) datasets.

Not ideal if you need a general-purpose data cleaning tool, or if you want to fine-tune models without analyzing dataset diversity and complexity.

Tags: LLM fine-tuning, AI model training, dataset analysis, natural language processing, instruction following
Status: No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 8 / 25


Stars: 285
Forks: 8
Language: not listed
License: none
Last pushed: Aug 20, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OFA-Sys/InsTag"
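The same request can be made from a script. The sketch below uses only the Python standard library and rebuilds the URL from its path segments. It assumes the endpoint returns JSON and that the segment before the owner (`transformers` here) is a category name; neither is confirmed by this page, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the request URL from its three path segments.

    Treating the first segment as a 'category' is an assumption.
    """
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality record; assumes the response body is JSON."""
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "OFA-Sys", "InsTag")` issues the same GET request as the curl command. The shape of the returned record is not documented here, so inspect it before relying on specific fields.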

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.