OFA-Sys/InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
This tool helps researchers and AI engineers analyze and improve the data used to fine-tune large language models (LLMs). It takes existing LLM training datasets and tags individual user queries based on their meaning and intent. The output provides insights into the diversity and complexity of the dataset, helping users select high-quality data subsets to train more capable LLMs.
285 stars. No commits in the last 6 months.
Use this if you are a researcher or AI engineer focused on enhancing large language model performance by carefully curating and understanding your supervised fine-tuning (SFT) datasets.
Not ideal if you are looking for a tool to perform general data cleaning or to fine-tune models without needing deep insights into dataset diversity and complexity.
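As a rough illustration of how tag-based diversity and complexity could be measured, here is a minimal sketch. It assumes each query has already been assigned a list of tags, that complexity is the average number of tags per query, and that diversity is the number of distinct tags in the dataset. The function name, input format, and sample data are hypothetical, and the tool's own definitions may differ.

```python
from collections import Counter


def tag_stats(tagged_queries):
    """Compute simple tag-based statistics.

    tagged_queries: list of (query, tags) pairs. This input format is an
    assumption made for illustration, not the tool's actual output format.
    """
    # Count in how many queries each tag appears (duplicates within a query ignored).
    tag_counts = Counter(tag for _, tags in tagged_queries for tag in set(tags))
    n = len(tagged_queries)
    # Assumed definition: complexity = average number of distinct tags per query.
    complexity = sum(len(set(tags)) for _, tags in tagged_queries) / n if n else 0.0
    # Assumed definition: diversity = number of distinct tags across the dataset.
    diversity = len(tag_counts)
    return {"complexity": complexity, "diversity": diversity, "tag_counts": tag_counts}


# Made-up sample data, for illustration only.
sample = [
    ("Write a Python function to reverse a string", ["coding", "python"]),
    ("Summarize this article in two sentences", ["summarization"]),
    ("Explain recursion with a Python example", ["coding", "python", "explanation"]),
]
stats = tag_stats(sample)
print(stats["complexity"], stats["diversity"])
```

Under these assumptions, a subset with more distinct tags and more tags per query would count as more diverse and more complex, which is one way to rank candidate subsets for fine-tuning.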
Stars: 285
Forks: 8
Language: not specified
License: not specified
Category: not specified
Last pushed: Aug 20, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OFA-Sys/InsTag"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
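The same request can be made from Python with only the standard library. This is a sketch under two assumptions: that the endpoint returns JSON (the response format is not documented here), and that other repositories follow the same owner/repo path pattern as the URL above. The helper names build_url and fetch_quality are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository (path pattern is an assumption)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository, assuming the body is JSON."""
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (needs network access; counts against the daily request limit):
# data = fetch_quality("OFA-Sys", "InsTag")
# print(data)
```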
Higher-rated alternatives
TsinghuaC3I/MARTI
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
zjunlp/KnowLM
An Open-sourced Knowledgeable Large Language Model Framework.
cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
tanyuqian/redco
NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to...
stanleylsx/llms_tool
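A HuggingFace-based tool for training and testing large language models. Supports web UI and terminal inference for various models, parameter-efficient and full-parameter training (pretraining, SFT, RM, PPO, DPO), and model merging and quantization.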