leliuga/cohere-configurations
Co:Here Inference configurations
This project provides pre-configured settings for running various Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) locally on your own machine. Given a model identifier and a variant (such as 'Llama-2-7B-32K-Instruct:Q4_0'), it produces a running inference service for that model. It is aimed at AI practitioners, researchers, and developers who want to experiment with or deploy different models locally without complex setup.
No commits in the last 6 months.
Use this if you need to quickly get a wide range of popular LLMs or MLLMs up and running for local inference with minimal configuration, especially for experimentation or testing.
Not ideal if you need a cloud-based LLM inference solution, or if you're not comfortable with containerization tools such as Docker, Podman, or nerdctl.
Stars: 10
Forks: 2
Language: Go
License: MPL-2.0
Category:
Last pushed: May 27, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/leliuga/cohere-configurations"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
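The same endpoint can also be queried programmatically from Go (the project's own language). A minimal sketch follows; it assumes the endpoint returns JSON and decodes into a generic map, since the response schema is not documented on this page:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// buildURL assembles the quality endpoint for a given repository slug.
func buildURL(owner, repo string) string {
	return "https://pt-edge.onrender.com/api/v1/quality/transformers/" + owner + "/" + repo
}

func main() {
	resp, err := http.Get(buildURL("leliuga", "cohere-configurations"))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	// The response schema is undocumented here, so decode into a generic map.
	var data map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&data); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(data)
}
```

Within the free tier no authentication header is needed; a key, once obtained, would be added per the service's own documentation.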
Higher-rated alternatives
ModelCloud/GPTQModel
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
intel/auto-round
🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
pytorch/ao
PyTorch native quantization and sparsity for training and inference
bodaay/HuggingFaceModelDownloader
Simple go utility to download HuggingFace Models and Datasets
NVIDIA/kvpress
LLM KV cache compression made easy