hpcaitech/Elixir

Elixir: Train a Large Language Model on a Small GPU Cluster

Score: 29 / 100 (Experimental)

Elixir helps machine learning engineers train very large language models efficiently, even on a small GPU cluster. Given your model and optimizer configuration, it automatically determines the most memory-efficient way to partition parameters and manage memory across CPUs and GPUs. It is aimed at ML engineers and researchers working on large-scale model training.

No commits in the last 6 months.

Use this if you need to train extremely large language models but are constrained by the memory capacity of your existing GPU cluster.

Not ideal if you are working with smaller models or already have access to a very large, high-memory GPU cluster.

large-language-models distributed-training GPU-optimization deep-learning-infrastructure ML-resource-management
No License · Stale (6 months) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 8 / 25
Community: 15 / 25


Stars: 15
Forks: 5
Language: Python
License: None
Last pushed: Jun 08, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/hpcaitech/Elixir"

Open to everyone: 100 requests/day with no key needed. Get a free API key for 1,000 requests/day.
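For programmatic access, here is a minimal Python sketch that calls the same endpoint. It assumes the endpoint returns a JSON body; the Authorization header used for an API key is a guess and is not documented on this page. Only the URL above is taken from the source.

import requests

API_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/hpcaitech/Elixir"

def fetch_quality_report(api_key=None, timeout=10):
    """Fetch the quality report for hpcaitech/Elixir.

    Passing an API key is assumed to raise the daily limit from 100 to
    1,000 requests; the Bearer header name here is an assumption, not
    taken from the API documentation.
    """
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    resp = requests.get(API_URL, headers=headers, timeout=timeout)
    resp.raise_for_status()  # fail loudly on 4xx/5xx responses
    return resp.json()       # assumes the response body is JSON

if __name__ == "__main__":
    report = fetch_quality_report()
    # Field names in the payload are not documented here; inspect it first.
    print(report)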