JulesBelveze/bert-squeeze
🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
This project helps machine learning engineers and data scientists deploy large language models more efficiently by reducing their size and speeding up inference. It takes a pre-trained Transformer model and applies optimization techniques such as distillation, pruning, and quantization to produce a smaller, faster model ready for production. It is aimed at anyone who struggles with the computational demands of deploying sophisticated NLP models.
Use this if you need to deploy transformer-based models for tasks like text classification but are facing challenges with slow inference times or excessive memory usage.
Not ideal if you are looking for a general-purpose model compression library that works with non-transformer architectures or tasks beyond sequence classification.
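To make the compression idea concrete, here is a minimal, self-contained sketch of post-training symmetric int8 weight quantization, one of the techniques mentioned above. This is an illustration of the general technique only; the function names and the exact scheme are assumptions, not bert-squeeze's actual API.

```python
def quantize_int8(weights):
    """Map float weights to int8 values plus a single scale factor.

    Symmetric scheme: scale is chosen so the largest-magnitude weight
    maps to +/-127. (Hypothetical helper, not part of bert-squeeze.)
    """
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.08, 0.99]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# each restored value is within one quantization step (scale) of the original
```

Real libraries apply this per tensor or per channel across a whole model (e.g. `torch.ao.quantization` in PyTorch); the sketch only shows why the stored model shrinks: each float32 weight becomes a single int8 plus a shared scale.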
Stars
85
Forks
10
Language
Python
License
—
Category
—
Last pushed
Feb 01, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JulesBelveze/bert-squeeze"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
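The same endpoint can be called from Python with only the standard library. The URL structure below is taken from the curl example; the shape of the JSON response is not documented here, so the fetch helper simply returns the decoded payload.

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(ecosystem, owner, repo):
    """Build the quality-API URL for a repository (path copied from the curl example)."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"


def fetch_quality(ecosystem, owner, repo):
    """Fetch and decode the JSON response for a repository."""
    with urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)


# Example (performs a live request, subject to the 100 requests/day limit):
# data = fetch_quality("transformers", "JulesBelveze", "bert-squeeze")
```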
Higher-rated alternatives
Tongjilibo/bert4torch
An elegant PyTorch implementation of transformers
nyu-mll/jiant
jiant is an NLP toolkit
lonePatient/TorchBlocks
A PyTorch-based toolkit for natural language processing
monologg/JointBERT
PyTorch implementation of JointBERT: "BERT for Joint Intent Classification and Slot Filling"
grammarly/gector
Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite"...