mKaloer/TFServingCache

Distributed model cache for TF Serving


Emerging

This project helps machine learning engineers and MLOps specialists serve a very large number of TensorFlow models in production without prohibitive memory costs. It acts as a caching load balancer: it accepts prediction requests for any of your models and loads or unloads those models on TensorFlow Serving instances as needed. This lets you serve many models, such as one model per user, even when each model is large.
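The load-on-demand behavior described above can be pictured as a size-bounded, least-recently-used cache of models. The following is a minimal sketch under that assumption, not the project's actual implementation: the `ModelCache` class, the `load` and `unload` hooks, and the LRU eviction policy are all hypothetical names and choices introduced here for illustration.

```python
from collections import OrderedDict
from typing import Callable, List


class ModelCache:
    """Illustrative LRU cache of loaded models, bounded by total size in bytes."""

    def __init__(
        self,
        capacity_bytes: int,
        load: Callable[[str], None],
        unload: Callable[[str], None],
    ) -> None:
        self.capacity_bytes = capacity_bytes
        self._load = load      # hypothetical hook: ask a serving instance to load a model
        self._unload = unload  # hypothetical hook: ask a serving instance to unload a model
        # Maps model name -> size in bytes, ordered from least to most recently used.
        self._loaded: "OrderedDict[str, int]" = OrderedDict()
        self._used = 0

    def ensure_loaded(self, name: str, size_bytes: int) -> None:
        """Make sure `name` is loaded, evicting least recently used models if needed."""
        if name in self._loaded:
            # Cache hit: mark the model as most recently used.
            self._loaded.move_to_end(name)
            return
        if size_bytes > self.capacity_bytes:
            raise ValueError(f"model {name!r} does not fit in the cache")
        # Cache miss: evict from the least recently used end until the model fits.
        while self._used + size_bytes > self.capacity_bytes:
            victim, victim_size = self._loaded.popitem(last=False)
            self._unload(victim)
            self._used -= victim_size
        self._load(name)
        self._loaded[name] = size_bytes
        self._used += size_bytes

    def loaded_models(self) -> List[str]:
        """Return loaded model names, least recently used first."""
        return list(self._loaded)
```

A caller would invoke `ensure_loaded(model_name, model_size)` before forwarding each prediction request, so that rarely used models are unloaded only when memory is needed for others.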

No commits in the last 6 months.

Use this if you need to serve hundreds or thousands of TensorFlow models, each potentially large, where individual model usage is low but the aggregate memory requirement of serving all models simultaneously is too high.

Not ideal if you only have a few TensorFlow models to serve, or if all your models are small and can easily fit into memory on a standard TensorFlow Serving instance.

MLOps · Model Serving · TensorFlow Deployment · Cost Optimization · Scalable ML Infrastructure

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 16 / 25

Community 16 / 25

License

Apache-2.0

Higher-rated alternatives

modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

basetenlabs/truss

The simplest way to serve AI/ML models in production

Lightning-AI/LitServe

A minimal Python framework for building custom AI inference servers with full control over...

deepjavalibrary/djl-serving

A universal scalable machine learning model deployment solution

tensorflow/serving

A flexible, high-performance serving system for machine learning models
