Tandemn-Labs/tandemn-tuna

A hybrid router that uses spot GPU instances to reduce costs and serverless GPUs to speed up cold starts.

Score: 50 / 100 (Established)

This project helps operations engineers and AI/ML platform teams manage the cost and performance of serving large language models (LLMs). It routes incoming inference requests to either fast-starting but expensive serverless GPUs or cheaper but slower, interruptible spot GPUs. You get an OpenAI-compatible endpoint that balances low latency for new requests with cost savings for sustained traffic.
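The routing idea described above can be sketched as a small policy: serve from serverless GPUs while a spot instance is still booting, then shift sustained traffic to the cheaper spot capacity once it is warm. This is an illustrative sketch, not the project's actual implementation; the class name, the fixed warm-up time, and the single-instance model are all assumptions for clarity.

```python
class HybridRouter:
    """Illustrative sketch (not tandemn-tuna's real code): pick a backend
    per request based on whether spot capacity has finished warming up."""

    def __init__(self, spot_warmup_s: float = 120.0):
        self.spot_warmup_s = spot_warmup_s   # assumed spot boot time
        self.spot_ready_at: float | None = None

    def on_traffic(self, now: float) -> str:
        # First request (or after an idle scale-down): kick off a spot
        # instance and serve from serverless until it is ready.
        if self.spot_ready_at is None:
            self.spot_ready_at = now + self.spot_warmup_s
        return "spot" if now >= self.spot_ready_at else "serverless"


router = HybridRouter(spot_warmup_s=120.0)
print(router.on_traffic(now=0.0))    # serverless: spot is still booting
print(router.on_traffic(now=300.0))  # spot: warm, so route to cheaper capacity
```

A production router would also handle spot interruptions and scale-to-zero, but the core trade-off is the one shown: pay the serverless premium only while waiting out the spot cold start.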

Available on PyPI.

Use this if you are deploying large language models or other GPU-intensive AI applications and need to optimize for both low latency during demand spikes and significant cost savings for ongoing inference workloads.

Not ideal if your GPU workloads are small, infrequent, or don't require the scale and complexity of managing hybrid GPU infrastructure.

Tags: MLOps, LLM deployment, GPU cost optimization, AI infrastructure, Model serving
Maintenance 10 / 25
Adoption 6 / 25
Maturity 20 / 25
Community 14 / 25


Stars: 22
Forks: 4
Language: Python
License: MIT
Category: mlops-end-to-end
Last pushed: Mar 13, 2026
Commits (30d): 0
Dependencies: 7

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mlops/Tandemn-Labs/tandemn-tuna"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
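The curl command above can also be called from Python. A minimal sketch using only the standard library; the shape of the returned JSON is an assumption, so the example only builds the URL and leaves the fetch to the caller.

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/mlops"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-score API URL for a GitHub repo."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as JSON (no API key needed up to
    100 requests/day). The response fields are not documented here,
    so inspect the dict before relying on specific keys."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(quality_url("Tandemn-Labs", "tandemn-tuna"))
```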