insoochung/transformer_bcq
BCQ tutorial for transformers
This project helps machine learning engineers and researchers make large language models, specifically transformer-based models, much smaller and faster with minimal accuracy loss. It takes an existing full-precision transformer model and outputs a significantly compressed version that uses far less memory and can run more efficiently on devices with limited resources. It is aimed at ML engineers and researchers who deploy or optimize transformer models.
No commits in the last 6 months.
Use this if you need to reduce the memory footprint and speed up inference for a transformer model, especially for deployment on edge devices or in resource-constrained environments.
Not ideal if you need a complete, production-ready solution for on-device quantized inference with full optimizations like caching, as this project is a starting point for the quantization method.
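The compression method the repo tutorializes is binary-coded quantization (BCQ), which approximates a weight tensor as a small sum of scaled sign tensors. Below is a minimal, illustrative sketch of the greedy BCQ decomposition idea in NumPy; it is not the repo's actual code, and the function names are my own.

```python
import numpy as np

def bcq_decompose(w, num_bits=3):
    """Greedy binary-coding quantization: approximate w as sum_i alpha_i * b_i,
    where each b_i has entries in {-1, +1}. Each step fits the residual left
    over by the previously chosen codes."""
    r = w.astype(np.float64).copy()
    alphas, codes = [], []
    for _ in range(num_bits):
        b = np.sign(r)
        b[b == 0] = 1.0            # avoid zero entries in the binary code
        a = np.mean(np.abs(r))     # least-squares optimal scale for sign(r)
        alphas.append(a)
        codes.append(b)
        r -= a * b                 # quantize the remaining residual next
    return np.array(alphas), np.stack(codes)

def bcq_reconstruct(alphas, codes):
    """Rebuild the approximation sum_i alpha_i * b_i from the decomposition."""
    return np.tensordot(alphas, codes, axes=1)
```

With this scheme a float32 matrix is stored as `num_bits` sign masks plus `num_bits` scalars, and reconstruction error shrinks as more binary codes are added.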
Stars: 17
Forks: 2
Language: Python
License: —
Category:
Last pushed: Jul 17, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/insoochung/transformer_bcq"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
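The same endpoint can be called from Python with the standard library. A minimal sketch, assuming only the URL pattern shown in the curl example above; the shape of the returned JSON is not documented here, so the result is treated as an opaque dict.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repo, following the
    pattern from the curl example (ecosystem, then owner/name)."""
    return f"{API_BASE}/{ecosystem}/{repo}"

def fetch_quality(ecosystem: str, repo: str) -> dict:
    """Fetch and parse the quality record (no API key needed for
    up to 100 requests/day)."""
    url = quality_url(ecosystem, repo)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Example: `fetch_quality("transformers", "insoochung/transformer_bcq")` retrieves the record for this repo.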
Higher-rated alternatives
ThilinaRajapakse/simpletransformers
Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling,...
jsksxs360/How-to-use-Transformers
A quick-start tutorial for the Transformers library
google/deepconsensus
DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences...
Denis2054/Transformers-for-NLP-2nd-Edition
Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning,...
abhimishra91/transformers-tutorials
GitHub repo with tutorials to fine-tune transformers for different NLP tasks