NoakLiu/LLMEasyQuant

A Serving System for Distributed and Parallel LLM Quantization [Efficient ML System]

Quality score: 21 / 100 (Experimental)

This project helps machine learning engineers and researchers make large language models (LLMs) run faster and use less memory without losing much accuracy. It takes an existing LLM, applies various compression techniques, and outputs a more efficient model ready for deployment. The primary users are those working on deploying LLMs to production or research environments where computational resources are a concern.
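The core idea behind such compression is weight quantization: storing model weights in a low-precision integer format plus a scale factor. As a rough illustration of the kind of technique this project applies (not its actual API, which is not shown here), a minimal symmetric per-tensor int8 quantizer looks like:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Example: a tiny weight tensor round-trips with small error.
w = np.array([0.5, -1.27, 0.02], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Each float weight shrinks from 4 bytes to 1 byte, at the cost of a bounded rounding error per weight; real systems refine this with per-channel scales, zero points, or calibration data.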

No commits in the last 6 months.

Use this if you need to optimize the performance and reduce the memory footprint of your large language models (LLMs) for efficient deployment or research.

Not ideal if you are looking for a no-code solution or primarily work with traditional machine learning models outside of the LLM space.

Tags: LLM deployment, model optimization, machine learning engineering, AI research, computational efficiency
Flags: No License, Stale (6m), No Package, No Dependents
Maintenance: 2 / 25
Adoption: 7 / 25
Maturity: 8 / 25
Community: 4 / 25


Stars: 26
Forks: 1
Language: Python
License: none
Last pushed: Jun 18, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NoakLiu/LLMEasyQuant"

Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
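The same endpoint can be queried from Python with the standard library. A minimal sketch, assuming only the URL shape shown in the curl example above (the response schema is not documented here, so it is just printed as-is):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"  # endpoint from the curl example

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-API URL; path segments follow the curl example above."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem: str, owner: str, repo: str) -> dict:
    """Fetch and decode the JSON quality report (requires network access)."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

# Usage (performs a live request):
#   report = fetch_quality("transformers", "NoakLiu", "LLMEasyQuant")
#   print(json.dumps(report, indent=2))
```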