codewithdark-git/QuantLLM
QuantLLM is a Python library for developers, researchers, and teams who want to fine-tune and deploy large language models (LLMs) efficiently using 4-bit and 8-bit quantization techniques.
It simplifies shrinking and accelerating LLMs through quantization and adapting them to specific tasks: you supply a base model and your own data, and it produces an optimized, fine-tuned model ready for deployment on various platforms. Typical users are machine learning engineers, AI researchers, and MLOps teams.
Use this if you need to efficiently fine-tune and deploy large language models with reduced memory footprint and improved inference speed across different hardware and deployment environments.
Not ideal if you lack experience with Python and machine learning workflows, or if you primarily need a no-code solution for using pre-trained models.
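To make the 4-bit/8-bit idea concrete, here is a minimal sketch of symmetric absmax int8 quantization, the basic scheme that libraries in this space build on. This is an illustrative toy in plain Python, not QuantLLM's actual API; the function names are hypothetical.

```python
def quantize_absmax_int8(weights):
    """Map floats to int8 codes by scaling so the largest magnitude hits 127."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 codes and the stored scale."""
    return [v * scale for v in q]

weights = [0.1, -0.5, 0.25, 1.27]
q, scale = quantize_absmax_int8(weights)
print(q)                      # → [10, -50, 25, 127]
print(dequantize(q, scale))   # approximate reconstruction of the weights
```

Storing one byte per weight plus a single scale per tensor (or per block) is what cuts memory roughly 4x versus float32; real toolkits add per-channel scales, outlier handling, and fused kernels on top of this idea.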
Stars
13
Forks
1
Language
Python
License
MIT
Category
Last pushed
Dec 21, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/codewithdark-git/QuantLLM"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
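The same endpoint can be called from Python with only the standard library. The URL pattern below is taken from the curl example above; the shape of the JSON response is an assumption, so the fetch just returns the decoded payload as-is.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner, repo):
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner, repo):
    """Fetch and decode the JSON quality report for a repository."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Performs a live request; requires network access.
    print(fetch_quality("codewithdark-git", "QuantLLM"))
```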
Higher-rated alternatives
ModelCloud/GPTQModel
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
intel/auto-round
🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
pytorch/ao
PyTorch native quantization and sparsity for training and inference
bodaay/HuggingFaceModelDownloader
Simple go utility to download HuggingFace Models and Datasets
NVIDIA/kvpress
LLM KV cache compression made easy