pytorch/ao
PyTorch native quantization and sparsity for training and inference
This project gives machine learning engineers and researchers PyTorch-native tools to make large language models (LLMs) and other deep learning models run faster and use less memory. Optimizing a model this way speeds up both training and inference, which matters most for powerful but resource-intensive models like Llama. You put in a PyTorch model and get back a more efficient, optimized version ready for faster training or deployment.
2,729 stars. Actively maintained with 121 commits in the last 30 days.
Use this if you need to accelerate model training or inference, reduce memory consumption, and deploy large models more efficiently.
Not ideal if you are working with small models where performance optimization isn't a primary concern or if you need to maintain absolute floating-point precision without any trade-offs.
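The trade-off mentioned above comes from quantization: storing weights in low-precision integers instead of 32-bit floats. As a minimal illustration of the underlying idea (not torchao's actual API), here is a pure-Python sketch of symmetric per-tensor int8 quantization, where each float is mapped to an integer in [-127, 127] via a single scale factor:

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    # One scale for the whole tensor, chosen so the largest value maps to 127.
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 codes."""
    return [x * scale for x in q]

# Toy "weights" standing in for a model tensor.
weights = [0.5, -1.0, 0.25, 0.8]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored value differs from the original by at most half a scale step, which is the precision loss the "Not ideal if..." caveat refers to; torchao applies far more sophisticated variants of this idea (per-channel scales, int4, float8) with hardware-accelerated kernels.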
Stars: 2,729
Forks: 456
Language: Python
License: —
Category:
Last pushed: Mar 13, 2026
Commits (30d): 121
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/pytorch/ao"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
ModelCloud/GPTQModel
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
intel/auto-round
🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
bodaay/HuggingFaceModelDownloader
Simple go utility to download HuggingFace Models and Datasets
NVIDIA/kvpress
LLM KV cache compression made easy
BlinkDL/RWKV-LM
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly...