Tencent/TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
TurboTransformers is aimed at AI practitioners and MLOps engineers who deploy transformer models in production. It takes pre-trained models (such as BERT, GPT-2, or ALBERT) and runs their inference significantly faster on both CPUs and GPUs. The model's predictions are unchanged; they are simply delivered more quickly, which matters most for real-time applications such as chatbots or recommendation systems.
1,542 stars. No commits in the last 6 months.
Use this if you need to dramatically speed up the inference time of your transformer-based AI models in production, especially when handling variable-length inputs and different batch sizes.
Not ideal if you are still in the model development or training phase, as this tool focuses on accelerating the deployment and serving of already-trained models.
Stars
1,542
Forks
205
Language
C++
License
—
Category
—
Last pushed
Jul 18, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Tencent/TurboTransformers"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
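For scripted access, the curl command above can be wrapped in a few lines of Python. The sketch below only builds the endpoint URL; the host and path layout are copied from the example, and the assumption that the `transformers` path segment is a category that varies for other projects is mine, not documented by the API:

```python
def quality_api_url(owner: str, repo: str, category: str = "transformers") -> str:
    """Build the quality-API endpoint URL shown in the curl example above.

    The host and path layout are taken from that example; treating the
    `category` segment as a variable is an assumption.
    """
    return (
        "https://pt-edge.onrender.com/api/v1/quality/"
        f"{category}/{owner}/{repo}"
    )

# Reproduces the curl example for this repository:
print(quality_api_url("Tencent", "TurboTransformers"))
```

Pass the resulting URL to any HTTP client (curl, `requests`, etc.); within the free tier no API key header is required.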
Related projects
huggingface/transformers-bloom-inference
Fast Inference Solutions for BLOOM
mit-han-lab/lite-transformer
[ICLR 2020] Lite Transformer with Long-Short Range Attention
mit-han-lab/hardware-aware-transformers
[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
LibreTranslate/Locomotive
Toolkit for training/converting LibreTranslate compatible language models 🚂
aliemo/transfomers-silicon-research
Research and Materials on Hardware implementation of Transformer Model