LLM Inference Serving Transformer Models

There are 23 LLM inference serving projects tracked. One scores above 70 (Verified tier). The highest-rated is PaddlePaddle/FastDeploy at 73/100 with 3,659 stars. Three of the top 10 are actively maintained.

Get all 23 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-inference-serving&limit=20"
```

The API is open to everyone at 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
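If you prefer to consume the endpoint from a script, here is a minimal Python sketch using only the standard library. The URL is the one shown above; the response schema is an assumption (either a bare JSON list or an object with a `data` array), so check the actual payload before relying on it:

```python
import json
import urllib.request

# Endpoint from the listing above.
URL = (
    "https://pt-edge.onrender.com/api/v1/datasets/quality"
    "?domain=transformers&subcategory=llm-inference-serving&limit=20"
)

def extract_projects(payload) -> list:
    """Normalize the response to a list of project records.

    Assumed schema: the API returns either a bare list or an
    object like {"data": [...]} -- this is a guess, not documented.
    """
    if isinstance(payload, list):
        return payload
    return payload.get("data", [])

def fetch_projects(url: str = URL) -> list:
    """Fetch the quality dataset and return the project records."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return extract_projects(json.load(resp))
```

Calling `fetch_projects()` would then yield one record per ranked project, which you could filter by score or tier locally.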

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | PaddlePaddle/FastDeploy | High-performance Inference and Deployment Toolkit for LLMs and VLMs based on... | 73 | Verified |
| 2 | mlc-ai/mlc-llm | Universal LLM Deployment Engine with ML Compilation | 62 | Established |
| 3 | skyzh/tiny-llm | A course of learning LLM inference serving on Apple Silicon for systems... | 57 | Established |
| 4 | ServerlessLLM/ServerlessLLM | Serverless LLM Serving for Everyone. | 54 | Established |
| 5 | AXERA-TECH/ax-llm | Explore LLM model deployment based on AXera's AI chips | 53 | Established |
| 6 | AmpereComputingAI/ampere_model_library | AML's goal is to make benchmarking of various AI architectures on Ampere... | 49 | Emerging |
| 7 | VectorInstitute/vector-inference | Efficient LLM inference on Slurm clusters. | 49 | Emerging |
| 8 | replit/ReplitLM | Inference code and configs for the ReplitLM model family | 46 | Emerging |
| 9 | pytorch/torchchat | Run PyTorch LLMs locally on servers, desktop and mobile | 44 | Emerging |
| 10 | datawhalechina/llm-deploy | LLM inference and deployment: theory and practice | 39 | Emerging |
| 11 | asprenger/ray_vllm_inference | A simple service that integrates vLLM with Ray Serve for fast and scalable... | 39 | Emerging |
| 12 | justADeni/intel-npu-llm | A simple Python script for running LLMs on Intel's Neural Processing Units (NPUs) | 37 | Emerging |
| 13 | snapllm/snapllm | 🔥 🔥 Alternative to Ollama 🔥 🔥 multi-model <1ms LLM switching | 37 | Emerging |
| 14 | ray-project/ray-llm | RayLLM - LLMs on Ray (Archived). Read README for more info. | 36 | Emerging |
| 15 | hpdps-group/ElasticMM | ElasticMM: Elastic and Efficient MLLM Serving System | 34 | Emerging |
| 16 | tmcarmichael/fabricai-inference-server | A hackable, modular, containerized inference server for deploying large... | 32 | Emerging |
| 17 | bentoml/transformers-nlp-service | Online Inference API for NLP Transformer models - summarization, text... | 31 | Emerging |
| 18 | Notnaton/microllm | My own implementation to run inference on local LLM models | 28 | Experimental |
| 19 | lix19937/llm-deploy | AI Infra LLM infer/ tensorrt-llm/ vllm | 28 | Experimental |
| 20 | g1ibby/llm-deploy | Tool to manage ollama model on vast.ai | 27 | Experimental |
| 21 | sajidkhan2067/LLMOnAWS | Deploy smaller LLM on AWS Lambda: Phi-2, cost-effective language model | 25 | Experimental |
| 22 | jaslatendresse/llm-demo | This repository demonstrates how to do inference using llama.cpp on a... | 24 | Experimental |
| 23 | ahmadalsharef994/deploy_llm_on_aws_sagemaker | Step-by-step Jupyter notebooks to deploy large language models on AWS... | 10 | Experimental |