mKaloer/TFServingCache
Distributed model cache for TF Serving
This project helps machine learning engineers and MLOps specialists manage and serve a very large number of TensorFlow models in production without incurring prohibitive memory costs. It acts as a smart load balancer: it accepts prediction requests for any of your models and dynamically loads them into, or unloads them from, TensorFlow Serving instances as needed. This lets you serve many models, such as one model per user, even when each model is large.
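The load-on-demand, unload-when-full behaviour described above can be illustrated with a small per-node cache. This is a minimal sketch that assumes a least-recently-used (LRU) eviction policy and a fixed memory budget per serving node. The type ModelCache, its Ensure method, and the load/unload callbacks are hypothetical names, not TFServingCache's actual API. In a real deployment, the callbacks would ask a TensorFlow Serving instance to load or unload the model.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry records one loaded model and its approximate memory footprint in bytes.
type entry struct {
	name string
	size int64
}

// ModelCache is a hypothetical per-node LRU cache of loaded models that keeps
// the total footprint of resident models within a fixed memory budget.
type ModelCache struct {
	capacity int64                    // memory budget in bytes
	used     int64                    // bytes currently occupied by loaded models
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // model name -> element in order
	load     func(name string)        // hypothetical hook: load model on the server
	unload   func(name string)        // hypothetical hook: unload model from the server
}

func NewModelCache(capacity int64, load, unload func(string)) *ModelCache {
	return &ModelCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
		load:     load,
		unload:   unload,
	}
}

// Ensure makes the named model resident before a prediction request is
// forwarded. It returns true on a cache hit and false if the model had to be
// loaded. Least-recently-used models are unloaded first to make room; a model
// larger than the whole budget is still loaded after everything is evicted.
func (c *ModelCache) Ensure(name string, size int64) bool {
	if el, ok := c.items[name]; ok {
		c.order.MoveToFront(el)
		return true
	}
	for c.used+size > c.capacity && c.order.Len() > 0 {
		oldest := c.order.Back()
		e := oldest.Value.(*entry)
		c.unload(e.name)
		c.used -= e.size
		delete(c.items, e.name)
		c.order.Remove(oldest)
	}
	c.load(name)
	c.items[name] = c.order.PushFront(&entry{name: name, size: size})
	c.used += size
	return false
}

func main() {
	const gib = int64(1 << 30)
	var events []string
	cache := NewModelCache(3*gib,
		func(n string) { events = append(events, "load "+n) },
		func(n string) { events = append(events, "unload "+n) })

	cache.Ensure("user-a", gib)
	cache.Ensure("user-b", gib)
	cache.Ensure("user-c", gib)
	cache.Ensure("user-a", gib) // hit: user-a becomes most recently used
	cache.Ensure("user-d", gib) // budget full: evicts user-b, the LRU model
	fmt.Println(events)
	// prints [load user-a load user-b load user-c unload user-b load user-d]
}
```

An LRU policy suits the "many models, low individual usage" workload because models that have not received requests recently are the cheapest to drop. A production system would also need locking, in-flight load tracking, and a way to route each model to a consistent node, all of which are omitted here.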
No commits in the last 6 months.
Use this if you need to serve hundreds or thousands of TensorFlow models, each potentially large, where individual model usage is low but the aggregate memory requirement of serving all models simultaneously is too high.
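The memory trade-off above comes down to simple arithmetic. The following back-of-the-envelope sketch uses entirely hypothetical numbers (5,000 models of about 2 GiB each, roughly 100 of them active at any time, 64 GiB of usable memory per TF Serving node). It ignores serving overhead and replication and only compares keeping every model resident with keeping just the active working set resident.

```go
package main

import "fmt"

// nodesNeeded returns how many serving nodes are required to hold `bytes` of
// models when each node has `perNode` bytes of usable memory (ceiling division).
func nodesNeeded(bytes, perNode int64) int64 {
	return (bytes + perNode - 1) / perNode
}

func main() {
	const gib = int64(1 << 30)
	// Hypothetical workload: 5,000 per-user models of ~2 GiB each,
	// of which only ~100 receive traffic in any given window.
	models, modelSize, active := int64(5000), 2*gib, int64(100)
	perNode := 64 * gib // hypothetical usable memory per node

	allLoaded := nodesNeeded(models*modelSize, perNode)
	workingSet := nodesNeeded(active*modelSize, perNode)
	fmt.Printf("all models resident: %d nodes\n", allLoaded)     // 157
	fmt.Printf("active working set only: %d nodes\n", workingSet) // 4
}
```

Under these assumptions, a cache that holds only the active models needs a small fraction of the nodes, at the cost of a load delay whenever a request arrives for a model that is not resident.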
Not ideal if you only have a few TensorFlow models to serve, or if all your models are small and can easily fit into memory on a standard TensorFlow Serving instance.
Stars: 25
Forks: 6
Language: Go
License: Apache-2.0
Last pushed: Feb 03, 2025
Commits (30d): 0
Higher-rated alternatives
modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
basetenlabs/truss
The simplest way to serve AI/ML models in production
Lightning-AI/LitServe
A minimal Python framework for building custom AI inference servers with full control over...
deepjavalibrary/djl-serving
A universal scalable machine learning model deployment solution
tensorflow/serving
A flexible, high-performance serving system for machine learning models