mKaloer/TFServingCache

Distributed model cache for TF Serving

Score: 39 / 100 (Emerging)

This project helps machine learning engineers and MLOps specialists manage and serve a very large number of TensorFlow models in production without incurring prohibitive memory costs. It acts as a smart load balancer: it accepts prediction requests for any of your models and dynamically loads them into, or unloads them from, TensorFlow Serving instances as needed. This lets you serve many models, such as one model per user, even when each model is large.
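TFServingCache itself is written in Go. As a rough illustration of the demand-loading idea described above, the sketch below keeps a bounded set of "loaded" models and evicts the least recently used one when a new model must be loaded. All names here (modelCache, ensureLoaded) are hypothetical, and the print statements merely stand in for the project's actual calls to TensorFlow Serving's model management API.

package main

import (
	"container/list"
	"fmt"
)

// modelCache keeps at most `capacity` models loaded at once, evicting the
// least recently used model when room is needed for a new one.
type modelCache struct {
	capacity int
	order    *list.List               // front = most recently used
	loaded   map[string]*list.Element // model name -> entry in `order`
}

func newModelCache(capacity int) *modelCache {
	return &modelCache{
		capacity: capacity,
		order:    list.New(),
		loaded:   make(map[string]*list.Element),
	}
}

// ensureLoaded is called before forwarding a prediction request: on a cache
// hit it only refreshes recency; on a miss it evicts (if full) and loads.
func (c *modelCache) ensureLoaded(model string) {
	if el, ok := c.loaded[model]; ok {
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		evicted := oldest.Value.(string)
		c.order.Remove(oldest)
		delete(c.loaded, evicted)
		fmt.Println("unload model:", evicted) // stand-in for an unload call to TF Serving
	}
	fmt.Println("load model:", model) // stand-in for a load call to TF Serving
	c.loaded[model] = c.order.PushFront(model)
}

func main() {
	cache := newModelCache(2)
	for _, m := range []string{"user-a", "user-b", "user-a", "user-c"} {
		cache.ensureLoaded(m)
	}
}

With a capacity of 2, the final request for "user-c" evicts "user-b" (the least recently used model) before loading, which is the memory-bounding behavior the description refers to.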

No commits in the last 6 months.

Use this if you need to serve hundreds or thousands of TensorFlow models, each potentially large, where individual model usage is low but the aggregate memory requirement of serving all models simultaneously is too high.

Not ideal if you only have a few TensorFlow models to serve, or if all your models are small and can easily fit into memory on a standard TensorFlow Serving instance.

MLOps, Model Serving, TensorFlow, Deployment, Cost Optimization, Scalable ML Infrastructure
Stale (6 months), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 7 / 25
Maturity: 16 / 25
Community: 16 / 25


Stars: 25
Forks: 6
Language: Go
License: Apache-2.0
Last pushed: Feb 03, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/mKaloer/TFServingCache"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
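The same endpoint can also be called from code. A minimal Go equivalent of the curl command above is sketched below; the response schema is not documented on this page, so it simply prints the HTTP status and the raw JSON body.

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Same endpoint as the curl example above.
	url := "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/mKaloer/TFServingCache"

	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	fmt.Println(resp.Status)
	fmt.Println(string(body)) // raw JSON; decode into a struct once the schema is known
}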