intel/neural-compressor

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

Quality score: 74 / 100 (Verified)

This is a tool for machine learning engineers and AI researchers who are working with large deep learning models. It helps reduce the size and computational demands of models like Large Language Models (LLMs) and Vision-Language Models (VLMs). You input a trained deep learning model, and it outputs a more compact, faster-running version of that model, ready for deployment on various Intel hardware, as well as some AMD, ARM, and Nvidia platforms.
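The core idea behind the INT8 quantization mentioned above can be illustrated with a minimal, self-contained sketch of symmetric post-training quantization. This shows the general technique only, not neural-compressor's actual implementation; all function names below are illustrative.

```python
# Illustrative sketch of symmetric per-tensor INT8 quantization, the
# basic arithmetic behind low-bit model compression. NOT the library's
# real implementation -- names and structure are made up for clarity.

def quantize_int8(weights):
    """Map float weights to INT8 values plus a per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0  # 127 = max signed INT8 magnitude
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each recovered weight differs from the original by at most half a
# quantization step (scale / 2), which is why INT8 inference stays accurate.
```

Storing 8-bit integers plus one float scale per tensor is what yields the roughly 4x memory reduction over FP32 that makes quantized LLM deployment practical.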

2,597 stars. Used by 1 other package. Actively maintained with 14 commits in the last 30 days. Available on PyPI.

Use this if you need to optimize your large deep learning models for faster inference and reduced memory footprint, especially when deploying on Intel CPUs, GPUs, or Habana Gaudi AI accelerators.

Not ideal if you are not working with deep learning models, or if model compression for deployment on specialized AI hardware is not your primary goal.

deep-learning-optimization model-deployment large-language-models AI-inference edge-AI
Maintenance: 17 / 25
Adoption: 11 / 25
Maturity: 25 / 25
Community: 21 / 25


Stars: 2,597
Forks: 298
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 14
Dependencies: 14
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/intel/neural-compressor"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
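The curl command above can also be scripted with only the Python standard library. The endpoint path is copied from the page as-is; the helper names are illustrative, and the response schema is not documented here, so the JSON is returned unparsed beyond decoding.

```python
# Hedged sketch of calling the quality API shown above. Only the URL
# pattern comes from this page; fetch_quality and quality_url are
# illustrative helper names, not part of any documented client.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(registry, owner, repo):
    """Build the per-package quality endpoint URL."""
    return f"{BASE}/{registry}/{owner}/{repo}"

def fetch_quality(registry, owner, repo):
    """GET the quality record; no API key needed up to 100 requests/day."""
    with urllib.request.urlopen(quality_url(registry, owner, repo)) as resp:
        return json.load(resp)

# Reproduces the URL from the curl example above without issuing a request.
url = quality_url("transformers", "intel", "neural-compressor")
```

Calling `fetch_quality("transformers", "intel", "neural-compressor")` would perform the same request as the curl command, subject to the rate limits noted above.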