ray_vllm_inference and ray-llm
The former is a specific implementation that pairs vLLM with Ray Serve for scalable inference, while the latter was a broader, now-archived project for running LLMs on Ray. They are ecosystem siblings: both live in the Ray ecosystem, and one may have built on or leveraged components from the other.
About ray_vllm_inference
asprenger/ray_vllm_inference
A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving.
This service helps developers serve large language models (LLMs) quickly and efficiently. It takes an LLM from Hugging Face and serves it as an API endpoint that returns generated text for a given prompt. It is aimed at machine learning engineers and MLOps teams who need to deploy LLMs for applications requiring high throughput and responsiveness.
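As a rough illustration of how a client might talk to such a service, the sketch below builds an HTTP request for a text-generation endpoint using only the Python standard library. The endpoint path (`/generate`), port, and JSON field names (`prompt`, `max_tokens`) are assumptions based on common vLLM/Ray Serve examples, not the repository's confirmed API; check the project README for the actual interface.

```python
import json
from urllib import request


def build_generate_request(prompt: str, max_tokens: int = 128,
                           url: str = "http://localhost:8000/generate") -> request.Request:
    """Build a POST request for a text-generation endpoint.

    NOTE: the URL path and payload fields here are illustrative
    assumptions; the real service may use different names.
    """
    payload = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Construct (but do not send) a request; sending it would require
# the service to be running locally.
req = build_generate_request("What is Ray Serve?")
print(req.get_full_url())
```

To actually call a running deployment you would pass `req` to `urllib.request.urlopen` and read the JSON response body.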
About ray-llm
ray-project/ray-llm
RayLLM - LLMs on Ray (Archived). Read README for more info.