insoochung/transformer_bcq

BCQ tutorial for transformers

Quality score: 23 / 100 (Experimental)

This project helps machine learning engineers and researchers make large language models, specifically transformer-based models, much smaller and faster with minimal accuracy loss. It takes an existing full-precision transformer model and outputs a significantly compressed version that uses far less memory and can run more efficiently on devices with limited resources. Its intended users are ML engineers and researchers who deploy or optimize transformer models.
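The compression method in question is binary-coded quantization (BCQ), which approximates each full-precision weight vector as a weighted sum of {-1, +1} binary codes. The repository's actual implementation may differ; the following is a minimal greedy-BCQ sketch in pure Python, where each pass quantizes the residual left by the previous pass:

```python
def bcq_greedy(weights, num_bits):
    """Greedily decompose `weights` into `num_bits` (scale, binary code) pairs.

    Each pass takes the sign of the current residual as the binary code and
    the mean absolute residual as its scale, then subtracts that term.
    """
    residual = list(weights)
    scales, codes = [], []
    for _ in range(num_bits):
        code = [1.0 if r >= 0 else -1.0 for r in residual]
        scale = sum(abs(r) for r in residual) / len(residual)
        scales.append(scale)
        codes.append(code)
        residual = [r - scale * c for r, c in zip(residual, code)]
    return scales, codes


def bcq_reconstruct(scales, codes):
    """Rebuild the approximate weights as sum_k scale_k * code_k."""
    out = [0.0] * len(codes[0])
    for scale, code in zip(scales, codes):
        for i, c in enumerate(code):
            out[i] += scale * c
    return out


if __name__ == "__main__":
    w = [0.5, -1.2, 0.3, 2.0]
    scales, codes = bcq_greedy(w, num_bits=3)
    print(bcq_reconstruct(scales, codes))
```

With more bits the reconstruction error shrinks, which is the trade-off the project exposes: storage drops from 32-bit floats to a few binary matrices plus per-pass scales.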

No commits in the last 6 months.

Use this if you need to reduce the memory footprint and speed up inference for a transformer model, especially for deployment on edge devices or in resource-constrained environments.

Not ideal if you need a complete, production-ready solution for on-device quantized inference with full optimizations like caching, as this project is a starting point for the quantization method.

model-compression on-device-AI natural-language-processing deep-learning-optimization transformer-deployment
No License · Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 8 / 25
Community: 9 / 25

How are scores calculated?

Stars: 17
Forks: 2
Language: Python
License: None
Last pushed: Jul 17, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/insoochung/transformer_bcq"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
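The curl command above can also be issued from Python. A minimal sketch using only the standard library, assuming the endpoint returns JSON (the response keys are not documented here, so none are hard-coded):

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = (
    "https://pt-edge.onrender.com/api/v1/quality/"
    "transformers/insoochung/transformer_bcq"
)


def fetch_quality(url=API_URL, timeout=10):
    """Fetch the quality report and parse it as JSON.

    No API key is required for the free tier (100 requests/day).
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(fetch_quality())
```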