Tandemn-Labs/tandemn-tuna
A hybrid router that uses spot GPU instances to reduce costs and serverless GPUs to speed up cold starts.
This project helps operations engineers and AI/ML platform teams manage the cost and performance of serving large language models (LLMs). It intelligently routes incoming inference requests to either fast-starting but expensive serverless GPUs, or cheaper but slower and interruptible spot GPUs. You get an OpenAI-compatible endpoint that balances low latency for new requests with cost savings for sustained traffic.
Available on PyPI.
Use this if you are deploying large language models or other GPU-intensive AI applications and need to optimize for both low latency during demand spikes and significant cost savings for ongoing inference workloads.
Not ideal if your GPU workloads are small, infrequent, or don't require the scale and complexity of managing hybrid GPU infrastructure.
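The routing idea described above can be sketched as a simple decision: send requests to warm spot workers when they exist, and fall back to fast-starting serverless GPUs during cold starts or demand spikes. This is a minimal illustrative sketch, not tandemn-tuna's actual API — the names `SpotPool` and `route_request` are hypothetical.

```python
# Hypothetical sketch of the hybrid spot/serverless routing idea.
# SpotPool and route_request are illustrative names, not tandemn-tuna's API.

class SpotPool:
    """Tracks how many cheap spot GPU workers are warm and serving."""

    def __init__(self) -> None:
        self.warm_workers = 0

    def has_capacity(self) -> bool:
        return self.warm_workers > 0


def route_request(pool: SpotPool) -> str:
    # Sustained traffic: spot workers are warm, so take the cheap path.
    if pool.has_capacity():
        return "spot"
    # Cold start or demand spike: serve from serverless GPUs immediately
    # while spot capacity scales up in the background.
    return "serverless"


pool = SpotPool()
print(route_request(pool))   # no warm spot workers yet -> serverless
pool.warm_workers = 2
print(route_request(pool))   # warm capacity absorbs sustained load -> spot
```

In a real router, the capacity check would also account for in-flight load and spot-interruption notices, but the core latency-versus-cost trade-off follows this shape.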
Stars
22
Forks
4
Language
Python
License
MIT
Category
MLOps
Last pushed
Mar 13, 2026
Commits (30d)
0
Dependencies
7
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/Tandemn-Labs/tandemn-tuna"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
mlflow/mlflow
The open source AI engineering platform. MLflow enables teams of all sizes to debug, evaluate,...
kitops-ml/kitops
An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets,...
aws-samples/mlops-e2e
MLOps End-to-End Example using Amazon SageMaker Pipeline, AWS CodePipeline and AWS CDK
tensorchord/envd
🏕️ Reproducible development environment for humans and agents
techiescamp/mlops-for-devops
MLOps for DevOps Engineers - A hands-on, project-based guide to Machine Learning Operations