OpenLMLab/MOSS_Vortex
Moss Vortex is a lightweight, high-performance deployment and inference backend engineered specifically for MOSS 003. Built on MOSEC and Torch, it provides a range of features aimed at improving performance and functionality.
This project helps AI developers and researchers quickly set up and run the MOSS 003 language model on their own GPU servers. It takes user prompts as input and efficiently streams back AI-generated text responses, supporting continuous conversations and custom tool integrations (a minimal client sketch follows the summary below). The intended user is a machine learning engineer or AI practitioner responsible for deploying and managing large language models.
No commits in the last 6 months.
Use this if you need a fast and lightweight server to deploy the MOSS 003 model for real-time text generation and interactive AI applications.
Not ideal if your application heavily relies on advanced token batching for complex LLM reasoning tasks, as this feature is not yet fully implemented.
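To make the prompt-in, text-out flow described above concrete, here is a minimal client sketch. The host, port, /inference route, payload field names, and sampling parameters are illustrative assumptions, not taken from the repository's documentation; check the worker code in the repo for the actual request schema.

# Minimal client sketch for a locally running MOSS Vortex instance.
# Assumptions (hypothetical, for illustration only): the server listens on
# localhost:21333, exposes a MOSEC-style POST /inference route, and
# accepts/returns JSON with the field names shown here.
import json
import urllib.request

ENDPOINT = "http://localhost:21333/inference"  # hypothetical host/port

payload = {
    # "x" is a placeholder field name for the prompt; the sampling
    # parameter names below are also placeholders.
    "x": "<|Human|>: Hello, MOSS!<eoh>\n<|MOSS|>:",
    "temperature": 0.7,
    "top_p": 0.8,
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Print whatever the server returns; the response schema depends on the worker.
with urllib.request.urlopen(req, timeout=60) as resp:
    print(json.loads(resp.read().decode("utf-8")))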
Stars: 37
Forks: 9
Language: Python
License: AGPL-3.0
Category:
Last pushed: Apr 25, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenLMLab/MOSS_Vortex"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
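For scripted access, the same endpoint can be queried from Python. The sketch below simply pretty-prints whatever JSON the service returns, since the response schema is not documented here.

# Python equivalent of the curl command above.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenLMLab/MOSS_Vortex"

with urllib.request.urlopen(URL, timeout=30) as resp:
    # The response fields are not documented here, so just dump the raw JSON.
    print(json.dumps(json.load(resp), indent=2))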
Higher-rated alternatives
jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the...
josStorer/RWKV-Runner
A RWKV management and startup tool, full automation, only 8MB. And provides an interface...
waybarrios/vllm-mlx
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models...
jordanhubbard/nanolang
A tiny experimental language designed to be targeted by coding LLMs
akivasolutions/tightwad
Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative decoding proxy gives you...