Tencent/TurboTransformers

A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.

Quality score: 50 / 100 (Established)

TurboTransformers is for AI practitioners and MLOps engineers who deploy transformer models in production. It takes your pre-trained models (BERT, GPT-2, ALBERT, and similar) and makes them run significantly faster on both CPUs and GPUs. The predictions stay the same; they are simply delivered much more quickly, which matters for real-time applications like chatbots or recommendation systems.
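For context, a typical acceleration workflow loads a pre-trained Hugging Face model and wraps it with TurboTransformers. The sketch below is a minimal example, assuming the Python bindings are installed and that the BertModel.from_torch entry point matches the repository's current examples; verify against the repo before relying on it.

import torch
import transformers
import turbo_transformers  # assumes the TurboTransformers Python bindings are built and installed

torch.set_grad_enabled(False)  # inference only

# Load a standard pre-trained Hugging Face BERT model.
torch_model = transformers.BertModel.from_pretrained("bert-base-uncased")
torch_model.eval()

# Wrap it with TurboTransformers for faster inference.
# from_torch follows the pattern in the upstream examples; treat it as an assumption here.
tt_model = turbo_transformers.BertModel.from_torch(torch_model)

# Variable-length input: no need to pad every batch to one fixed size.
input_ids = torch.tensor([[101, 7592, 2088, 102]], dtype=torch.long)

# The wrapped model is intended to mirror the Hugging Face BertModel outputs.
outputs = tt_model(input_ids)
print(outputs[0].shape)

The point of the wrapper is that you keep the familiar model-loading code and only swap the runtime that executes the forward pass.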

1,542 stars. No commits in the last 6 months.

Use this if you need to dramatically speed up the inference time of your transformer-based AI models in production, especially when handling variable-length inputs and different batch sizes.

Not ideal if you are still in the model development or training phase, as this tool focuses on accelerating the deployment and serving of already-trained models.

AI-inference NLP-deployment machine-learning-operations real-time-AI language-model-serving
Flags: Stale (6 months) · No Package · No Dependents
Maintenance: 2 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 22 / 25

Stars: 1,542
Forks: 205
Language: C++
License:
Last pushed: Jul 18, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Tencent/TurboTransformers"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
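If you prefer to consume the endpoint from code rather than curl, a minimal Python sketch using the requests library could look like the following. The response schema is not documented here, so no JSON field names are assumed; the script simply pretty-prints whatever the API returns.

import json
import requests

# Same endpoint as the curl example above; unauthenticated use is limited to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/Tencent/TurboTransformers"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()

# Pretty-print the raw JSON payload without assuming its structure.
print(json.dumps(resp.json(), indent=2))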