VITA-Group/Data-Efficient-Scaling
[ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang
This project helps machine learning researchers and practitioners train very large language models efficiently when training data is scarce. It uses smaller, pre-trained models to initialize ("kickstart") the training of much larger ones. The output is a large transformer model that performs well despite data scarcity.
No commits in the last 6 months.
Use this if you are developing or training large language models (like BERT or RoBERTa) and are concerned about their performance when you have a limited amount of training data.
Not ideal if you are working with smaller models or have abundant training data for your large model, as the primary benefit is addressing data scarcity for gigantic models.
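The core idea, reusing a small pre-trained model to initialize a bigger one, can be illustrated with a generic weight-expansion sketch. This is not the paper's actual algorithm, just a minimal tiling scheme with hypothetical names (`expand_weight`) that copies a small weight matrix periodically into a larger one and rescales so activation magnitudes stay roughly comparable:

```python
def expand_weight(w_small, out_dim, in_dim):
    """Initialize a larger (out_dim x in_dim) weight matrix from a smaller
    pre-trained one by periodic tiling (illustrative only; the paper's
    model-reusing scheme may differ).

    w_small: list of lists, the small pre-trained weight matrix.
    """
    rows, cols = len(w_small), len(w_small[0])
    # Rescale by the ratio of input widths so the wider layer's
    # pre-activations stay roughly the same magnitude.
    scale = cols / in_dim
    return [
        [w_small[i % rows][j % cols] * scale for j in range(in_dim)]
        for i in range(out_dim)
    ]

# Example: grow a 2x2 layer into a 4x4 one.
w = [[1.0, 2.0], [3.0, 4.0]]
w_big = expand_weight(w, 4, 4)
```

Each larger weight matrix built this way reproduces the small model's function family at initialization, which is what lets training start from a better point than random init.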
Stars: 14
Forks: —
Language: Python
License: —
Category: —
Last pushed: Jan 04, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/VITA-Group/Data-Efficient-Scaling"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
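The same endpoint can be queried from Python instead of curl. A minimal sketch, assuming the endpoint returns JSON (the response field names are not documented here, so the decoded dict is returned as-is):

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a given GitHub repository."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the quality record (100 requests/day without a key)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Usage:
# data = fetch_quality("VITA-Group", "Data-Efficient-Scaling")
```

How an API key is attached (header vs. query parameter) is not specified on this page, so the sketch covers only unauthenticated access.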
Higher-rated alternatives
jncraton/languagemodels
Explore large language models in 512MB of RAM
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
haizelabs/verdict
Inference-time scaling for LLMs-as-a-judge.
albertan017/LLM4Decompile
Reverse Engineering: Decompiling Binary Code with Large Language Models
bytedance/Sa2VA
Official Repo For Pixel-LLM Codebase