zhuang-li/SCAR
[ACL 2025 main] SCAR: Data Selection via Style Consistency-Aware Response Ranking for Efficient Instruction-Tuning of Large Language Models
This tool helps machine learning engineers and researchers select the best training data for fine-tuning large language models. You provide a list of instructions and their corresponding answers, and SCAR scores each pair for style consistency and expected training benefit. The output is a ranked list of instruction-answer pairs, letting you efficiently pick the most impactful subset to improve your LLM's performance.
No commits in the last 6 months.
Use this if you are a machine learning engineer or researcher looking to improve the performance of your large language models by selecting the highest quality instruction-tuning data from a larger dataset.
Not ideal if your training data includes non-English examples or has duplicate entries, as these are not currently supported and require manual cleaning beforehand.
Stars
39
Forks
4
Language
Python
License
—
Category
—
Last pushed
Aug 06, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zhuang-li/SCAR"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
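The same endpoint can be called from Python with only the standard library. A minimal sketch is below; the response schema is not documented on this page, so the JSON is returned as-is, and the `quality_url` / `fetch_quality` helper names are illustrative, not part of the API.

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as parsed JSON.

    Raises urllib.error.URLError on network failure; the anonymous tier
    allows 100 requests/day, so cache results where possible.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Network call commented out to respect the daily rate limit:
    # record = fetch_quality("zhuang-li", "SCAR")
    print(quality_url("zhuang-li", "SCAR"))
```

With a free API key (1,000 requests/day), you would attach it however the service specifies; that mechanism is not shown on this page, so it is omitted here.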
Higher-rated alternatives
DaoD/INTERS
This is the repository for our paper "INTERS: Unlocking the Power of Large Language Models in...
declare-lab/instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca...
Haiyang-W/TokenFormer
[ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling...
hkust-nlp/deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
kehanlu/DeSTA2
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model...