Tencent/TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
TurboTransformers is aimed at AI practitioners and MLOps engineers who deploy transformer models in production. It takes pre-trained models (such as BERT, GPT-2, or ALBERT) and runs their inference significantly faster on both CPUs and GPUs. The model's predictions are unchanged; they are simply delivered more quickly, which matters most for real-time applications such as chatbots or recommendation systems.
1,542 stars. No commits in the last 6 months.
Use this if you need to dramatically speed up the inference time of your transformer-based AI models in production, especially when handling variable-length inputs and different batch sizes.
Not ideal if you are still in the model development or training phase, as this tool focuses on accelerating the deployment and serving of already-trained models.
Stars
1,542
Forks
205
Language
C++
License
—
Category
—
Last pushed
Jul 18, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Tencent/TurboTransformers"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
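For scripted access, the curl command above can be wrapped in a few lines of Python. The sketch below only builds the endpoint URL; the host and path layout are copied from the example, and the assumption that the `transformers` path segment is a category that varies for other projects is mine, not documented by the API:

```python
def quality_api_url(owner: str, repo: str, category: str = "transformers") -> str:
    """Build the quality-API endpoint URL shown in the curl example above.

    The host and path layout are taken from that example; treating the
    `category` segment as a variable is an assumption.
    """
    return (
        "https://pt-edge.onrender.com/api/v1/quality/"
        f"{category}/{owner}/{repo}"
    )

# Reproduces the curl example for this repository:
print(quality_api_url("Tencent", "TurboTransformers"))
```

Pass the resulting URL to any HTTP client (curl, `requests`, etc.); within the free tier no API key header is required.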
Related projects
huggingface/transformers-bloom-inference
Fast Inference Solutions for BLOOM
mit-han-lab/lite-transformer
[ICLR 2020] Lite Transformer with Long-Short Range Attention
mit-han-lab/hardware-aware-transformers
[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
LibreTranslate/Locomotive
Toolkit for training/converting LibreTranslate compatible language models 🚂
aliemo/transfomers-silicon-research
Research and Materials on Hardware implementation of Transformer Model