LLM Inference Serving Transformer Models

There are 23 LLM inference serving projects tracked. One scores above 70 (Verified tier). The highest-rated is PaddlePaddle/FastDeploy at 73/100 with 3,659 stars. Three of the top 10 are actively maintained.

Get all 23 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-inference-serving&limit=20"
```

The API is open to everyone at 100 requests/day with no key needed; a free key raises the limit to 1,000 requests/day.
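If you prefer to consume the endpoint from a script, here is a minimal Python sketch using only the standard library. The URL is the one shown above; the response schema is an assumption (either a bare JSON list or an object with a `data` array), so check the actual payload before relying on it:

```python
import json
import urllib.request

# Endpoint from the listing above.
URL = (
    "https://pt-edge.onrender.com/api/v1/datasets/quality"
    "?domain=transformers&subcategory=llm-inference-serving&limit=20"
)

def extract_projects(payload) -> list:
    """Normalize the response to a list of project records.

    Assumed schema: the API returns either a bare list or an
    object like {"data": [...]} -- this is a guess, not documented.
    """
    if isinstance(payload, list):
        return payload
    return payload.get("data", [])

def fetch_projects(url: str = URL) -> list:
    """Fetch the quality dataset and return the project records."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return extract_projects(json.load(resp))
```

Calling `fetch_projects()` would then yield one record per ranked project, which you could filter by score or tier locally.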

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | PaddlePaddle/FastDeploy | High-performance Inference and Deployment Toolkit for LLMs and VLMs based on... | 73 | Verified |
| 2 | mlc-ai/mlc-llm | Universal LLM Deployment Engine with ML Compilation | 62 | Established |
| 3 | skyzh/tiny-llm | A course of learning LLM inference serving on Apple Silicon for systems... | 57 | Established |
| 4 | ServerlessLLM/ServerlessLLM | Serverless LLM Serving for Everyone. | 54 | Established |
| 5 | AXERA-TECH/ax-llm | Explore LLM model deployment based on AXera's AI chips | 53 | Established |
| 6 | AmpereComputingAI/ampere_model_library | AML's goal is to make benchmarking of various AI architectures on Ampere... | 49 | Emerging |
| 7 | VectorInstitute/vector-inference | Efficient LLM inference on Slurm clusters. | 49 | Emerging |
| 8 | replit/ReplitLM | Inference code and configs for the ReplitLM model family | 46 | Emerging |
| 9 | pytorch/torchchat | Run PyTorch LLMs locally on servers, desktop and mobile | 44 | Emerging |
| 10 | datawhalechina/llm-deploy | LLM inference and deployment: theory and practice | 39 | Emerging |
| 11 | asprenger/ray_vllm_inference | A simple service that integrates vLLM with Ray Serve for fast and scalable... | 39 | Emerging |
| 12 | justADeni/intel-npu-llm | A simple Python script for running LLMs on Intel's Neural Processing Units (NPUs) | 37 | Emerging |
| 13 | snapllm/snapllm | 🔥 🔥 Alternative to Ollama 🔥 🔥 multi-model <1ms LLM switching | 37 | Emerging |
| 14 | ray-project/ray-llm | RayLLM - LLMs on Ray (Archived). Read README for more info. | 36 | Emerging |
| 15 | hpdps-group/ElasticMM | ElasticMM: Elastic and Efficient MLLM Serving System | 34 | Emerging |
| 16 | tmcarmichael/fabricai-inference-server | A hackable, modular, containerized inference server for deploying large... | 32 | Emerging |
| 17 | bentoml/transformers-nlp-service | Online Inference API for NLP Transformer models - summarization, text... | 31 | Emerging |
| 18 | Notnaton/microllm | My own implementation to run inference on local LLM models | 28 | Experimental |
| 19 | lix19937/llm-deploy | AI Infra LLM infer/ tensorrt-llm/ vllm | 28 | Experimental |
| 20 | g1ibby/llm-deploy | Tool to manage ollama model on vast.ai | 27 | Experimental |
| 21 | sajidkhan2067/LLMOnAWS | Deploy smaller LLM on AWS Lambda: Phi-2, cost-effective language model | 25 | Experimental |
| 22 | jaslatendresse/llm-demo | This repository demonstrates how to do inference using llama.cpp on a... | 24 | Experimental |
| 23 | ahmadalsharef994/deploy_llm_on_aws_sagemaker | Step-by-step Jupyter notebooks to deploy large language models on AWS... | 10 | Experimental |