pytorch/ao

PyTorch native quantization and sparsity for training and inference

Quality score: 71/100 · Verified

This project helps machine learning engineers and researchers make large language models (LLMs) and other deep learning models run faster and use less memory. Through quantization and sparsity, it speeds up both training and inference, which matters when working with powerful but resource-intensive models like Llama: you feed in a PyTorch model and get back a more efficient, optimized version ready for deployment or further training.
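
As a minimal sketch of what that looks like in practice, based on torchao's documented quickstart (the quantize_ API and the int8_weight_only config; check the repo's README for the current spelling, as the config names have evolved across versions):

import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

# A stand-in model; in practice this would be your LLM, e.g. a Llama checkpoint.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

# quantize_ mutates the model in place, swapping linear-layer weights
# to int8, which cuts memory use and can speed up inference.
quantize_(model, int8_weight_only())

# The model is used exactly as before, just smaller.
with torch.no_grad():
    out = model(torch.randn(1, 1024))

The in-place design (note the trailing underscore on quantize_) means no separate conversion step or second model copy: the same object you pass in is what you deploy.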

2,729 stars. Actively maintained with 121 commits in the last 30 days.

Use this if you need to accelerate model training or inference, reduce memory consumption, and deploy large models more efficiently.

Not ideal if you are working with small models where performance optimization isn't a primary concern or if you need to maintain absolute floating-point precision without any trade-offs.

large-language-models deep-learning-optimization model-deployment machine-learning-engineering AI-inference
No package · No dependents
Maintenance: 22/25
Adoption: 10/25
Maturity: 16/25
Community: 23/25

Stars: 2,729
Forks: 456
Language: Python
License: —
Last pushed: Mar 13, 2026
Commits (30d): 121

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/pytorch/ao"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
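
If you'd rather consume the endpoint from Python than curl, here is a minimal sketch using requests. The JSON field names below are assumptions mirroring the figures shown on this page, not a documented schema; inspect the actual payload before relying on them.

import requests

# Same endpoint as the curl example above.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/pytorch/ao"

resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()

# Hypothetical field names; print(data) to see the real keys.
print(data.get("score"))  # e.g. 71
print(data.get("stars"))  # e.g. 2729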