ddickmann/vllm-factory

Production inference for encoder models - ColBERT, GLiNER, ColPali, embeddings etc. - as vLLM plugins for online and in-process deployment

Score: 42 / 100 (Emerging)

This project helps data scientists and ML engineers efficiently deploy specialized encoder models such as ColBERT, GLiNER, and other structured prediction models for real-time use. You provide text or images, and it quickly returns relevant search results, extracted entities, or linked information. It is aimed at professionals running AI applications who need to serve these models at high volume.

Available on PyPI.

Use this if you need to serve advanced encoder-based AI models like retrieval or entity extraction models in a production environment with high throughput and low latency.

Not ideal if you are working with large language models (LLMs) that generate long text sequences, as this is optimized for encoder models that produce embeddings or structured outputs.

information-retrieval named-entity-recognition machine-learning-operations search-systems multimodal-ai
Maintenance: 13 / 25
Adoption: 6 / 25
Maturity: 18 / 25
Community: 5 / 25


Stars: 15
Forks: 1
Language: Python
License: Apache-2.0
Category: server
Last pushed: Apr 02, 2026
Commits (30d): 0
Dependencies: 6

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/ddickmann/vllm-factory"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
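The same endpoint can be called from Python instead of curl. A minimal sketch using only the standard library is shown below; the response's JSON field names are not documented here, so the example just decodes and prints the raw body.

```python
# Fetch a repo's quality report from the pt-edge API shown above.
# Only the endpoint URL pattern comes from this page; the response
# schema is an unknown, so we return the decoded JSON as-is.
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/embeddings"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """GET the quality report and decode the JSON body."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Anonymous access is limited to 100 requests/day per the note above.
    print(fetch_quality("ddickmann", "vllm-factory"))
```

With an API key (1,000 requests/day), you would typically attach it as a header or query parameter; the exact mechanism is not specified on this page.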