tanyuqian/redco
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
This tool helps machine learning engineers and researchers efficiently train and run large AI models, like those for generating images or understanding language, across multiple GPUs or TPUs. It takes your model code and data, then automates the complex setup for distributed processing, giving you faster training and inference results without needing specialized distributed systems expertise. It's for anyone developing or deploying large-scale AI applications.
No commits in the last 6 months. Available on PyPI.
Use this if you need to train or run large AI models (e.g., LLMs, Stable Diffusion) more quickly by distributing them across multiple GPUs or TPUs, without diving deep into complex distributed systems configurations.
Not ideal if you are working with small models that don't require distributed computing or if you prefer to manually configure every aspect of your distributed ML pipeline.
Stars: 69
Forks: 7
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 09, 2024
Commits (30d): 0
Dependencies: 3
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tanyuqian/redco"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
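The same request can be made from Python. The following is a minimal sketch, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, the path layout (`category/owner/repo`) is inferred from the single curl example above, and the response is assumed to be JSON, which the page does not confirm. How an API key is passed is not documented here, so the sketch covers only the keyless tier.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record; assumes the body is JSON (not confirmed)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to issue the request.
    print(quality_url("transformers", "tanyuqian", "redco"))
```

Because the keyless tier allows 100 requests/day, caching the result of `fetch_quality` locally is a sensible choice when polling more than a handful of repositories.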
Higher-rated alternatives
TsinghuaC3I/MARTI
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
zjunlp/KnowLM
An Open-sourced Knowledgeable Large Language Model Framework.
cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
stanleylsx/llms_tool
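A large language model training and testing tool built on HuggingFace. Supports web UI and terminal inference for each model, low-parameter and full-parameter training (pre-training, SFT, RM, PPO, DPO), and model merging and quantization.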
slp-rl/slamkit
SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for...