onnx/neural-compressor
Model compression for ONNX
This tool helps AI practitioners make their large language models (LLMs) and other AI models run faster and use less memory with minimal accuracy loss. You provide an existing ONNX model, and it outputs a quantized version that is smaller and cheaper to run at inference time. It is aimed at machine learning engineers and data scientists deploying models, especially on Intel hardware.
Use this if you need to optimize the performance of your ONNX-compatible AI models, particularly LLMs like Llama2/3, on Intel processors without sacrificing model accuracy.
Not ideal if your models are not in the ONNX format or if you primarily work with hardware other than Intel CPUs.
Stars: 99
Forks: 9
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 01, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/onnx/neural-compressor"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
Higher-rated alternatives
open-mmlab/mmengine
OpenMMLab Foundational Library for Training Deep Learning Models
Xilinx/brevitas
Brevitas: neural network quantization in PyTorch
google/qkeras
QKeras: a quantization deep learning library for Tensorflow Keras
fastmachinelearning/qonnx
QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX
tensorflow/model-optimization
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization...