leliuga/cohere-configurations
Co:Here Inference configurations
This project provides pre-configured settings for running various Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) locally on your own machine. Given a model identifier and a variant (such as 'Llama-2-7B-32K-Instruct:Q4_0'), it produces a running inference service for that model. It is aimed at AI practitioners, researchers, and developers who want to experiment with or deploy different models locally without complex setup.
No commits in the last 6 months.
Use this if you need to quickly get a wide range of popular LLMs or MLLMs up and running for local inference with minimal configuration, especially for experimentation or testing.
Not ideal if you need a cloud-based LLM inference solution, or if you're not comfortable with containerization tools such as Docker, Podman, or nerdctl.
Stars: 10
Forks: 2
Language: Go
License: MPL-2.0
Category:
Last pushed: May 27, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/leliuga/cohere-configurations"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
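The same endpoint can also be queried programmatically from Go (the project's own language). A minimal sketch follows; it assumes the endpoint returns JSON and decodes into a generic map, since the response schema is not documented on this page:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// buildURL assembles the quality endpoint for a given repository slug.
func buildURL(owner, repo string) string {
	return "https://pt-edge.onrender.com/api/v1/quality/transformers/" + owner + "/" + repo
}

func main() {
	resp, err := http.Get(buildURL("leliuga", "cohere-configurations"))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	// The response schema is undocumented here, so decode into a generic map.
	var data map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&data); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(data)
}
```

Within the free tier no authentication header is needed; a key, once obtained, would be added per the service's own documentation.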
Higher-rated alternatives
ModelCloud/GPTQModel
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
intel/auto-round
🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
pytorch/ao
PyTorch native quantization and sparsity for training and inference
bodaay/HuggingFaceModelDownloader
Simple go utility to download HuggingFace Models and Datasets
NVIDIA/kvpress
LLM KV cache compression made easy