insoochung/transformer_bcq

BCQ tutorial for transformers

Quality score: 23 / 100 (Experimental)

This project helps machine learning engineers and researchers make large language models, specifically transformer-based models, much smaller and faster with minimal accuracy loss. It takes an existing full-precision transformer model and outputs a significantly compressed version that uses far less memory and can run more efficiently on devices with limited resources. Its intended users are ML engineers and researchers who deploy or optimize transformer models.
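The compression method in question is binary-coded quantization (BCQ), which approximates each full-precision weight vector as a weighted sum of {-1, +1} binary codes. The repository's actual implementation may differ; the following is a minimal greedy-BCQ sketch in pure Python, where each pass quantizes the residual left by the previous pass:

```python
def bcq_greedy(weights, num_bits):
    """Greedily decompose `weights` into `num_bits` (scale, binary code) pairs.

    Each pass takes the sign of the current residual as the binary code and
    the mean absolute residual as its scale, then subtracts that term.
    """
    residual = list(weights)
    scales, codes = [], []
    for _ in range(num_bits):
        code = [1.0 if r >= 0 else -1.0 for r in residual]
        scale = sum(abs(r) for r in residual) / len(residual)
        scales.append(scale)
        codes.append(code)
        residual = [r - scale * c for r, c in zip(residual, code)]
    return scales, codes


def bcq_reconstruct(scales, codes):
    """Rebuild the approximate weights as sum_k scale_k * code_k."""
    out = [0.0] * len(codes[0])
    for scale, code in zip(scales, codes):
        for i, c in enumerate(code):
            out[i] += scale * c
    return out


if __name__ == "__main__":
    w = [0.5, -1.2, 0.3, 2.0]
    scales, codes = bcq_greedy(w, num_bits=3)
    print(bcq_reconstruct(scales, codes))
```

With more bits the reconstruction error shrinks, which is the trade-off the project exposes: storage drops from 32-bit floats to a few binary matrices plus per-pass scales.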

No commits in the last 6 months.

Use this if you need to reduce the memory footprint and speed up inference for a transformer model, especially for deployment on edge devices or in resource-constrained environments.

Not ideal if you need a complete, production-ready solution for on-device quantized inference with full optimizations like caching, as this project is a starting point for the quantization method.

model-compression on-device-AI natural-language-processing deep-learning-optimization transformer-deployment
No License · Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 8 / 25
Community: 9 / 25

How are scores calculated?

Stars: 17
Forks: 2
Language: Python
License: None
Last pushed: Jul 17, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/insoochung/transformer_bcq"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
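The curl command above can also be issued from Python. A minimal sketch using only the standard library, assuming the endpoint returns JSON (the response keys are not documented here, so none are hard-coded):

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "transformers/insoochung/transformer_bcq"
)


def fetch_quality(url=API_URL, timeout=10):
    """Fetch the quality report and parse it as JSON.

    No API key is required for the free tier (100 requests/day).
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(fetch_quality())
```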