NoakLiu/LLMEasyQuant
A Serving System for Distributed and Parallel LLM Quantization [Efficient ML System]
This project helps machine learning engineers and researchers make large language models (LLMs) run faster and use less memory with minimal loss of accuracy. It takes an existing LLM, applies quantization and related compression techniques, and outputs a more efficient model ready for deployment. Its primary users are people deploying LLMs to production or running research in environments where computational resources are constrained.
No commits in the last 6 months.
Use this if you need to optimize the performance and reduce the memory footprint of your Large Language Models for efficient deployment or research.
Not ideal if you are looking for a no-code solution or primarily work with traditional machine learning models outside of the LLM space.
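To make the core idea concrete, here is a minimal sketch of symmetric per-tensor INT8 weight quantization, the kind of compression the description refers to. This is a generic illustration, not LLMEasyQuant's actual API; the function names are invented for this example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: q = round(w / scale),
    where scale maps the largest absolute weight to 127."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard against all-zero tensors
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the INT8 codes."""
    return [v * scale for v in q]
```

Real quantizers add per-channel scales, zero points for asymmetric ranges, and calibration over activation statistics, but the round-trip above already shows why memory drops roughly 4x versus FP32 at a small accuracy cost.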
Stars: 26
Forks: 1
Language: Python
License: —
Category: —
Last pushed: Jun 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NoakLiu/LLMEasyQuant"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
intel/neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model...
dropbox/hqq
Official implementation of Half-Quadratic Quantization (HQQ)
OpenGVLab/OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Hsu1023/DuQuant
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger...