Tandemn-Labs/tandemn-tuna

A hybrid router that uses spot GPU instances to reduce costs and serverless GPUs to speed up cold starts.

Score: 50 / 100 (Established)

This project helps operations engineers and AI/ML platform teams manage the cost and performance of serving large language models (LLMs). It routes incoming inference requests to either fast-starting but expensive serverless GPUs or cheaper but slower, interruptible spot GPUs. You get an OpenAI-compatible endpoint that balances low latency for new requests with cost savings for sustained traffic.
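The routing idea described above can be sketched as a small policy: serve from serverless GPUs while a spot instance is still booting, then shift sustained traffic to the cheaper spot capacity once it is warm. This is an illustrative sketch, not the project's actual implementation; the class name, the fixed warm-up time, and the single-instance model are all assumptions for clarity.

```python
class HybridRouter:
    """Illustrative sketch (not tandemn-tuna's real code): pick a backend
    per request based on whether spot capacity has finished warming up."""

    def __init__(self, spot_warmup_s: float = 120.0):
        self.spot_warmup_s = spot_warmup_s   # assumed spot boot time
        self.spot_ready_at: float | None = None

    def on_traffic(self, now: float) -> str:
        # First request (or after an idle scale-down): kick off a spot
        # instance and serve from serverless until it is ready.
        if self.spot_ready_at is None:
            self.spot_ready_at = now + self.spot_warmup_s
        return "spot" if now >= self.spot_ready_at else "serverless"


router = HybridRouter(spot_warmup_s=120.0)
print(router.on_traffic(now=0.0))    # serverless: spot is still booting
print(router.on_traffic(now=300.0))  # spot: warm, so route to cheaper capacity
```

A production router would also handle spot interruptions and scale-to-zero, but the core trade-off is the one shown: pay the serverless premium only while waiting out the spot cold start.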

Available on PyPI.

Use this if you are deploying large language models or other GPU-intensive AI applications and need to optimize for both low latency during demand spikes and significant cost savings for ongoing inference workloads.

Not ideal if your GPU workloads are small, infrequent, or don't require the scale and complexity of managing hybrid GPU infrastructure.

Tags: MLOps, LLM deployment, GPU cost optimization, AI infrastructure, Model serving
Maintenance 10 / 25
Adoption 6 / 25
Maturity 20 / 25
Community 14 / 25


Stars: 22
Forks: 4
Language: Python
License: MIT
Category: mlops-end-to-end
Last pushed: Mar 13, 2026
Commits (30d): 0
Dependencies: 7

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/mlops/Tandemn-Labs/tandemn-tuna"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
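The curl command above can also be called from Python. A minimal sketch using only the standard library; the shape of the returned JSON is an assumption, so the example only builds the URL and leaves the fetch to the caller.

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/mlops"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-score API URL for a GitHub repo."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as JSON (no API key needed up to
    100 requests/day). The response fields are not documented here,
    so inspect the dict before relying on specific keys."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(quality_url("Tandemn-Labs", "tandemn-tuna"))
```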