model-optimization and neural-compressor
These tools solve the same problem for different model formats: model-optimization targets TensorFlow/Keras models, while neural-compressor compresses ONNX models. Users would choose based on their model's framework rather than using the two together.
About model-optimization
tensorflow/model-optimization
A toolkit for optimizing Keras and TensorFlow ML models for deployment, including quantization and pruning.
This toolkit helps machine learning engineers and researchers make their trained Keras and TensorFlow models smaller and faster. It takes an existing, functional model and applies optimization techniques such as quantization or pruning. The output is a more efficient model that performs similarly while requiring less compute and memory, making it well suited for deployment on devices with limited resources.
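To make pruning concrete, here is a minimal pure-Python sketch of magnitude pruning, the general technique the toolkit's sparsity tools are built on. This is an illustration of the idea, not the toolkit's actual implementation; the function name and data are invented for the example.

```python
# Conceptual sketch of magnitude pruning (illustrative only, not
# tfmot's implementation): zero out the smallest-magnitude weights,
# so the model shrinks once stored in a sparse format.

def prune_by_magnitude(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    # Indices of the n_prune smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]

weights = [0.8, -0.05, 0.3, 0.01, -0.9, 0.02, 0.4, -0.1]
pruned = prune_by_magnitude(weights, sparsity=0.5)
# Half of the weights are now exactly zero; the large ones survive.
```

The intuition is that near-zero weights contribute little to the output, so removing them barely changes predictions while enabling real size and speed savings.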
About neural-compressor
onnx/neural-compressor
Model compression for ONNX
This tool helps AI practitioners make their large language models (LLMs) and other AI models run faster and use less memory without losing accuracy. You provide an existing ONNX model, and it outputs a quantized version that is more efficient. It is aimed at machine learning engineers and data scientists deploying models, especially on Intel hardware.
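To show what "quantized" means here, the sketch below demonstrates affine int8 quantization in pure Python: the general technique, not neural-compressor's API. Floats are mapped to 8-bit integers via a scale and zero point, so each weight takes one byte instead of four; the function names and values are invented for illustration.

```python
# Minimal sketch of affine int8 quantization (illustrative only):
# map floats in [min, max] onto the int8 range [-128, 127].

def quantize_int8(values):
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0
    zero_point = round(-lo / scale) - 128
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

values = [0.0, 0.5, -1.0, 2.0, 1.25]
q, scale, zp = quantize_int8(values)
restored = dequantize(q, scale, zp)
# Round-trip error stays within half a quantization step (scale / 2),
# which is why well-quantized models lose little accuracy.
```

Real quantizers add calibration data, per-channel scales, and operator fusion on top of this core idea, but the memory saving comes from exactly this float-to-int8 mapping.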