MarsJacobs/kd-qat-large-enc

[EMNLP 2022 main] Code for "Understanding and Improving Knowledge Distillation for Quantization-Aware-Training of Large Transformer Encoders"

Overall score: 12 / 100 (Experimental)

This project helps machine learning engineers and researchers optimize large transformer models for deployment on resource-constrained devices. It takes a pre-trained, full-precision BERT model and applies knowledge distillation during quantization-aware training to create a significantly smaller, ternary (2-bit) version while maintaining performance. The output is a highly compressed, efficient transformer model suitable for mobile or edge applications.
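As a rough illustration of the idea only (not the repository's actual code), the sketch below ternarizes weights with a TWN-style quantizer and a straight-through estimator, and distills the full-precision teacher's logits into the quantized student. The class and function names, the 0.7 threshold heuristic, and the temperature value are assumptions for the example; the paper itself also explores distillation of intermediate representations.

```python
# Hypothetical sketch of ternary QAT with logit-level knowledge distillation.
import torch
import torch.nn.functional as F


class TernaryQuant(torch.autograd.Function):
    """TWN-style ternary weight quantizer with a straight-through estimator."""

    @staticmethod
    def forward(ctx, w):
        delta = 0.7 * w.abs().mean()                                 # pruning threshold
        mask = (w.abs() > delta).float()
        alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)   # per-tensor scale
        return alpha * torch.sign(w) * mask                          # {-alpha, 0, +alpha}

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output                                           # straight-through gradient


def kd_qat_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft cross-entropy between teacher and quantized-student logits."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)


# Usage inside a training step (teacher frozen, student weights ternarized
# on the fly via TernaryQuant.apply before each forward pass):
#   loss = kd_qat_loss(student(input_ids), teacher(input_ids))
```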

No commits in the last 6 months.

Use this if you need to deploy large language models like BERT on hardware with limited memory or computational power, without sacrificing too much accuracy.

Not ideal if you are not working with BERT-like transformer encoders or if you don't require extreme model compression to ternary precision.

model compression · edge AI · natural language processing · deep learning · deployment · transformer optimization
No License · Stale (6m) · No Package · No Dependents

Maintenance: 0 / 25
Adoption: 4 / 25
Maturity: 8 / 25
Community: 0 / 25


Stars: 8
Forks:
Language: Jupyter Notebook
License: None
Last pushed: Feb 07, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/MarsJacobs/kd-qat-large-enc"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
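The same endpoint can be called from Python; a minimal sketch using the requests library is shown below. The exact structure of the returned JSON is an assumption, so the example simply prints whatever comes back.

```python
# Hypothetical example of fetching the quality data shown on this page.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/MarsJacobs/kd-qat-large-enc"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()
print(data)  # inspect the returned JSON; field names may differ from this page's labels
```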