JulesBelveze/bert-squeeze

🛠️ Tools for Transformers compression using PyTorch Lightning ⚡

Quality score: 40 / 100 (Emerging)

This project helps machine learning engineers and data scientists deploy Transformer-based language models more efficiently by reducing their size and speeding up inference. It takes a pre-trained Transformer model and applies optimization techniques such as distillation, pruning, and quantization to produce a smaller, faster model ready for production. It is aimed at anyone who struggles with the computational demands of deploying sophisticated NLP models.
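Of the techniques listed above, distillation is the one most easily shown in a few lines: a small "student" model is trained to match the temperature-softened output distribution of a large "teacher". The sketch below is a minimal, dependency-free illustration of that loss; the function names and the temperature default are ours for illustration, not bert-squeeze's actual API.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of raw logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Illustrative sketch of the knowledge-distillation objective
    (Hinton-style), not bert-squeeze's implementation.
    """
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    # KL(p || q), scaled by T^2 so gradients keep a comparable magnitude
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl
```

A higher temperature flattens both distributions, exposing the teacher's relative confidence across wrong classes, which is the signal the student learns from.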

Use this if you need to deploy transformer-based models for tasks like text classification but are facing challenges with slow inference times or excessive memory usage.

Not ideal if you are looking for a general-purpose model compression library that works with non-transformer architectures or tasks beyond sequence classification.

natural-language-processing machine-learning-operations model-deployment text-classification deep-learning-optimization
No License No Package No Dependents
Maintenance 10 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 13 / 25


Stars: 85
Forks: 10
Language: Python
License: none
Last pushed: Feb 01, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JulesBelveze/bert-squeeze"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.