OFA-Sys/DiverseEvol

Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning

Quality score: 24 / 100 (Experimental)

This project helps machine learning engineers train large language models (LLMs) efficiently by selecting the most impactful training data. You provide a large instruction dataset and an LLM; it outputs a smaller, highly diverse subset of that data and an instruction-tuned LLM that performs as well as, or better than, a model trained on the full dataset. It is aimed at professionals building and deploying custom LLMs who need to optimize training time and resources.
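The core idea above is selecting a small but diverse subset of a large instruction pool. As an illustration only (this is not DiverseEvol's actual algorithm, and `k_center_greedy` is a hypothetical helper), here is a minimal sketch of one common diversity-sampling strategy, k-center greedy selection over embedding vectors:

```python
import numpy as np

def k_center_greedy(embeddings: np.ndarray, budget: int, seed: int = 0) -> list[int]:
    """Greedily pick `budget` row indices that maximize coverage:
    each new point is the one farthest from the current selection."""
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    selected = [int(rng.integers(n))]  # random starting point
    # distance from every point to its nearest selected point
    dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(dists))  # farthest still-uncovered point
        selected.append(nxt)
        # update nearest-selected distances with the new center
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return selected
```

In an instruction-tuning setting, `embeddings` would be sentence embeddings of the instruction examples and `budget` the target subset size; the selected indices form the diverse training subset.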

No commits in the last 6 months.

Use this if you are developing custom large language models and want to significantly reduce the data volume and computational cost of instruction tuning without sacrificing performance.

Not ideal if you are a casual user of off-the-shelf LLMs or do not have the technical expertise to manage model training environments and configurations.

Tags: LLM-development, model-training-optimization, data-sampling, natural-language-processing, machine-learning-engineering
Badges: No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 7 / 25


Stars: 86
Forks: 4
Language: Python
License: none
Last pushed: Dec 14, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OFA-Sys/DiverseEvol"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
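The same endpoint can be called from Python. A minimal sketch, assuming only the URL pattern shown in the curl example above (the response schema is not documented here, so the fetch itself is left commented out):

```python
import urllib.request
import json

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(registry: str, owner: str, repo: str) -> str:
    # Assembles the endpoint shown in the curl example above.
    return f"{BASE}/{registry}/{owner}/{repo}"

url = quality_url("transformers", "OFA-Sys", "DiverseEvol")
# Fetching requires network access and counts against the daily quota:
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
```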