wsmlby/homl
The easiest & fastest way to run LLMs in your home lab
HoML helps AI developers and researchers quickly set up and experiment with large language models (LLMs) on their own hardware. It pulls models from the Hugging Face Hub and exposes an OpenAI-compatible API plus an interactive chat for testing. The tool is aimed at individuals managing local LLM deployments, from first experiment to running service.
Use this if you need an easy, high-performance way to run various LLMs locally for development, testing, or internal applications.
Not ideal if you need to run multiple LLMs concurrently on a single GPU, or if you require out-of-the-box support for non-CUDA hardware such as Apple Silicon or ROCm.
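Because HoML serves an OpenAI-compatible API, any standard OpenAI-style client can talk to it. A minimal sketch of building a chat-completion request with only the standard library follows; the port, base URL, and model id are assumptions and depend on your HoML configuration:

```python
import json
import urllib.request

# Assumption: adjust host/port to wherever your HoML server listens.
BASE_URL = "http://localhost:7456/v1"


def build_chat_request(base_url: str, model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-style /chat/completions POST request."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    BASE_URL,
    "qwen3:0.6b",  # assumption: any model id your HoML instance has pulled
    [{"role": "user", "content": "Hello!"}],
)
# Sending is commented out so the sketch runs without a live server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Since the wire format is the standard OpenAI chat schema, existing OpenAI SDKs should also work by pointing their base URL at the local server.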
Stars: 85
Forks: 3
Language: Python
License: Apache-2.0
Category: llm-tools
Last pushed: Feb 23, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wsmlby/homl"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
containers/ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from...
av/harbor
One command brings a complete pre-wired LLM stack with hundreds of services to explore.
RunanywhereAI/runanywhere-sdks
Production ready toolkit to run AI locally
runpod-workers/worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
foldl/chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)