higgsfield-ai/higgsfield
Fault-tolerant, highly scalable GPU orchestration and a machine learning framework designed for training models with billions to trillions of parameters
This project helps machine learning engineers and researchers efficiently train extremely large AI models, such as large language models (LLMs), across multiple GPUs and servers. It takes your Python training code, allocates computational resources, monitors training progress, and handles fault tolerance. The output is a fully trained large-scale model ready for deployment or further experimentation.
3,558 stars. No commits in the last 6 months.
Use this if you are a machine learning engineer or researcher struggling with the complexity and resource management challenges of training massive deep learning models on distributed GPU infrastructure.
Not ideal if you are working with smaller models that can be trained on a single GPU or if you prefer manual orchestration of your distributed training jobs.
Stars
3,558
Forks
590
Language
Jupyter Notebook
License
Apache-2.0
Category
Last pushed
May 25, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/higgsfield-ai/higgsfield"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
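For scripted access, the same endpoint can be queried from Python. This is a minimal sketch around the curl command above; the URL structure is taken from the listing, but the response schema is undocumented here, so the code only decodes generic JSON.

```python
# Sketch: query the quality API shown above via the Python standard library.
# The endpoint path comes from the listing; the response fields are not
# documented here, so we decode to a generic dict rather than assume a schema.
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the API URL for a repo slug such as 'higgsfield-ai/higgsfield'."""
    return f"{API_BASE}/{category}/{repo}"


def fetch_quality(category: str, repo: str) -> dict:
    """Fetch and decode one quality record (keyless tier: 100 requests/day)."""
    with urlopen(quality_url(category, repo), timeout=10) as resp:
        return json.load(resp)


url = quality_url("transformers", "higgsfield-ai/higgsfield")
```

Calling `fetch_quality("transformers", "higgsfield-ai/higgsfield")` reproduces the curl request; with a free API key the daily limit rises to 1,000 requests.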
Higher-rated alternatives
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...