ddickmann/vllm-factory
Production inference for encoder models - ColBERT, GLiNER, ColPali, embeddings etc. - as vLLM plugins for online and in-process deployment
This project helps data scientists and ML engineers deploy specialized encoder models such as ColBERT, GLiNER, and other structured-prediction models for real-time use. Given text or images, it returns relevant search results, extracted entities, or linked information with low latency. It targets teams running AI applications who need to serve these models at high volume.
Available on PyPI.
Use this if you need to serve encoder-based models, such as retrieval or entity-extraction models, in production with high throughput and low latency.
Not ideal if you are working with large language models (LLMs) that generate long text sequences; this project is optimized for encoder models that produce embeddings or structured outputs.
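As a sketch of what an online deployment could look like, the snippet below assembles a request for a vLLM-style OpenAI-compatible embeddings endpoint. The server URL, endpoint path, and model name are assumptions for illustration; this listing does not document vllm-factory's actual API surface.

```python
import json
from urllib import request

# Hypothetical server details; vllm-factory's real endpoint and model names may differ.
SERVER_URL = "http://localhost:8000/v1/embeddings"  # assumed OpenAI-compatible path
MODEL_NAME = "colbert-ir/colbertv2.0"               # example encoder model

def build_embedding_request(texts, model=MODEL_NAME):
    """Build the JSON body for an OpenAI-style embeddings call."""
    return {"model": model, "input": texts}

def post_embeddings(texts, url=SERVER_URL):
    """POST the payload to a running server and return the parsed JSON response."""
    body = json.dumps(build_embedding_request(texts)).encode("utf-8")
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)

# Payload construction works offline; post_embeddings needs a running server.
payload = build_embedding_request(["what is ColBERT?", "late interaction retrieval"])
print(payload["model"], len(payload["input"]))
```

The same request shape would apply to other encoder tasks if the server exposes them under task-specific routes; check the project's own documentation for the actual paths.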
Stars
15
Forks
1
Language
Python
License
Apache-2.0
Category
Last pushed
Apr 02, 2026
Commits (30d)
0
Dependencies
6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/ddickmann/vllm-factory"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
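For scripting against this quality API from Python, a small helper can assemble the per-repository URL. The `/api/v1/quality/embeddings/{owner}/{repo}` path is taken from the curl example above; the response schema is not documented here, so the fetch helper just returns raw parsed JSON.

```python
import json
from urllib import request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/embeddings"

def quality_url(owner, repo):
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner, repo):
    """Fetch the quality record (subject to the 100 requests/day free limit)."""
    with request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("ddickmann", "vllm-factory"))
# → https://pt-edge.onrender.com/api/v1/quality/embeddings/ddickmann/vllm-factory
```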
Higher-rated alternatives
byte5ai/palaia
Palaia — Local, crash-safe memory for AI agents. Semantic vector search...
j33pguy/magi
MAGI — Multi-Agent Graph Intelligence. Universal memory server for AI agents. MCP + gRPC + REST...
LLMSystems/TensorrtServer
A high-performance deep learning model inference server based on TensorRT, supporting fast...
abdullah85398/embedding-server
A high-performance, self-hosted, model-agnostic embedding service designed for LLM applications,...
alez007/yasha
Self-hosted, multi-model AI inference server. Run LLMs, TTS, STT, embeddings, and image...