VITA-Group/Data-Efficient-Scaling
[ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang
This project helps machine learning researchers and practitioners train very large language models efficiently when training data is scarce. It uses smaller, pre-trained models to initialize ("kickstart") the training of much larger ones. The output is a large transformer model that performs well despite data scarcity.
No commits in the last 6 months.
Use this if you are developing or training large language models (like BERT or RoBERTa) and are concerned about their performance when you have a limited amount of training data.
Not ideal if you are working with smaller models or have abundant training data for your large model, as the primary benefit is addressing data scarcity for gigantic models.
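The core idea, reusing a small pre-trained model to initialize a bigger one, can be illustrated with a generic weight-expansion sketch. This is not the paper's actual algorithm, just a minimal tiling scheme with hypothetical names (`expand_weight`) that copies a small weight matrix periodically into a larger one and rescales so activation magnitudes stay roughly comparable:

```python
def expand_weight(w_small, out_dim, in_dim):
    """Initialize a larger (out_dim x in_dim) weight matrix from a smaller
    pre-trained one by periodic tiling (illustrative only; the paper's
    model-reusing scheme may differ).

    w_small: list of lists, the small pre-trained weight matrix.
    """
    rows, cols = len(w_small), len(w_small[0])
    # Rescale by the ratio of input widths so the wider layer's
    # pre-activations stay roughly the same magnitude.
    scale = cols / in_dim
    return [
        [w_small[i % rows][j % cols] * scale for j in range(in_dim)]
        for i in range(out_dim)
    ]

# Example: grow a 2x2 layer into a 4x4 one.
w = [[1.0, 2.0], [3.0, 4.0]]
w_big = expand_weight(w, 4, 4)
```

Each larger weight matrix built this way reproduces the small model's function family at initialization, which is what lets training start from a better point than random init.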
Stars: 14
Forks: —
Language: Python
License: —
Category: —
Last pushed: Jan 04, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/VITA-Group/Data-Efficient-Scaling"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
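The same endpoint can be queried from Python instead of curl. A minimal sketch, assuming the endpoint returns JSON (the response field names are not documented here, so the decoded dict is returned as-is):

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a given GitHub repository."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality record (100 requests/day without a key)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Usage:
# data = fetch_quality("VITA-Group", "Data-Efficient-Scaling")
```

How an API key is attached (header vs. query parameter) is not specified on this page, so the sketch covers only unauthenticated access.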
Higher-rated alternatives
jncraton/languagemodels
Explore large language models in 512MB of RAM
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
haizelabs/verdict
Inference-time scaling for LLMs-as-a-judge.
albertan017/LLM4Decompile
Reverse Engineering: Decompiling Binary Code with Large Language Models
bytedance/Sa2VA
Official Repo For Pixel-LLM Codebase