VectorInstitute/vector-inference

Efficient LLM inference on Slurm clusters.


Emerging

This tool helps researchers and AI practitioners deploy and manage large language models (LLMs) on Slurm-managed computing clusters. You provide a model name, and it returns a URL endpoint to which you can send inference requests, so generating text or other responses from the model is straightforward. It suits anyone who needs to run LLM inference at scale in a shared cluster environment.
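The workflow above is described only at a high level: name a model, receive a URL, send requests. The following is a minimal client-side sketch, assuming the endpoint speaks an OpenAI-style `/v1/completions` API (as vLLM-based servers commonly do; this page does not confirm it). The host, port, and model name are hypothetical placeholders, not values the tool is known to produce.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the URL and model name the tool reports for your job.
BASE_URL = "http://gpu-node-01:8080"  # assumption, not a real host
MODEL = "Meta-Llama-3.1-8B-Instruct"  # assumption: example model name


def build_completion_request(
    base_url: str, model: str, prompt: str, max_tokens: int = 64
) -> urllib.request.Request:
    """Build a JSON POST request for an assumed OpenAI-style /v1/completions route."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the first generated completion (assumed response shape)."""
    req = build_completion_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Once a job is running, a call such as `complete("http://<node>:<port>", MODEL, "Slurm is")` would return the generated text, provided the server really exposes this route; building the request separately from sending it keeps the payload easy to inspect before any network call.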

Use this if you need to stand up an LLM inference server on a Slurm cluster quickly and obtain a live URL for sending requests.

Not ideal if you are working on a single machine or do not have access to a Slurm-managed cluster.

AI-research LLM-deployment high-performance-computing cluster-management model-serving

No Package No Dependents

Maintenance 10 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 14 / 25


Language: Python

License: MIT

Higher-rated alternatives

PaddlePaddle/FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

mlc-ai/mlc-llm

Universal LLM Deployment Engine with ML Compilation

skyzh/tiny-llm

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny...

ServerlessLLM/ServerlessLLM

Serverless LLM Serving for Everyone.

AXERA-TECH/ax-llm

Explore LLM model deployment based on AXera's AI chips
