higgsfield-ai/higgsfield

Fault-tolerant, highly scalable GPU orchestration and a machine learning framework designed for training models with billions to trillions of parameters

Score: 49 / 100 (Emerging)

This project helps machine learning engineers and researchers efficiently train extremely large AI models, such as large language models (LLMs), across multiple GPUs and servers. It takes your Python training code, allocates computational resources, monitors training progress, and handles fault tolerance. The output is a fully trained model, at the scale of billions to trillions of parameters, ready for deployment or further experimentation. A sketch of the kind of boilerplate such a framework automates follows below.
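
For context, the sketch below shows the generic PyTorch DistributedDataParallel boilerplate that orchestration frameworks like this one aim to automate. This is not higgsfield's own API; the model, hyperparameters, and training loop are placeholders.

    # Generic PyTorch distributed-training boilerplate; not higgsfield's API.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # placeholder model
        model = DDP(model, device_ids=[local_rank])
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

        for step in range(10):  # placeholder training loop
            x = torch.randn(32, 1024, device=local_rank)
            loss = model(x).pow(2).mean()
            optimizer.zero_grad()
            loss.backward()  # DDP all-reduces gradients across ranks here
            optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()  # launch with: torchrun --nproc_per_node=8 train.py

Frameworks in this space take on the parts the sketch leaves out: process launching across machines, resource allocation, progress monitoring, and recovery from failed workers.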

3,558 stars. No commits in the last 6 months.

Use this if you are a machine learning engineer or researcher struggling with the complexity and resource management challenges of training massive deep learning models on distributed GPU infrastructure.

Not ideal if you are working with smaller models that can be trained on a single GPU or if you prefer manual orchestration of your distributed training jobs.

Topics: large-language-models, distributed-training, GPU-orchestration, deep-learning-research, ML-infrastructure
Status: Stale (no commits in 6 months), no package published, no known dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 23 / 25

The four component scores sum to the overall 49 / 100.


Stars: 3,558
Forks: 590
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: May 25, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/higgsfield-ai/higgsfield"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
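
The same endpoint can be queried from Python. A minimal sketch using the requests library; the shape of the JSON payload is an assumption, not a documented schema.

    # Fetch the quality data shown above via the API.
    import requests

    url = "https://pt-edge.onrender.com/api/v1/quality/transformers/higgsfield-ai/higgsfield"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    data = resp.json()
    print(data)  # inspect the payload; exact field names are not documented here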