onnx/neural-compressor
Model compression for ONNX
This tool helps AI practitioners make their large language models (LLMs) and other AI models run faster and use less memory with minimal accuracy loss. You provide an existing ONNX model, and it outputs a quantized version that is smaller and cheaper to run at inference time. It is aimed at machine learning engineers and data scientists deploying models, especially on Intel hardware.
Use this if you need to optimize the performance of your ONNX-compatible AI models, particularly LLMs like Llama2/3, on Intel processors without sacrificing model accuracy.
Not ideal if your models are not in the ONNX format or if you primarily work with hardware other than Intel CPUs.
Stars: 99
Forks: 9
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 01, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/onnx/neural-compressor"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
Higher-rated alternatives
open-mmlab/mmengine
OpenMMLab Foundational Library for Training Deep Learning Models
Xilinx/brevitas
Brevitas: neural network quantization in PyTorch
google/qkeras
QKeras: a quantization deep learning library for Tensorflow Keras
fastmachinelearning/qonnx
QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX
tensorflow/model-optimization
A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization...