OpenNMT/CTranslate2
Fast inference engine for Transformer models
This is a specialized tool for developers working with Transformer models, the architecture used in natural language processing tasks such as translation and text generation, as well as in speech processing. It makes these models run faster and use less memory in production: developers convert their trained models into an optimized format, then run inference through this library to translate text more rapidly or generate responses from large language models at lower computational cost.
4,354 stars.
Use this if you are a developer deploying Transformer-based AI models and need to maximize their speed and reduce memory consumption on CPUs or GPUs for efficient, real-world application.
Not ideal if you are not a developer working with pre-trained Transformer models, or if your primary need is model training rather than optimized inference.
Stars: 4,354
Forks: 456
Language: C++
License: MIT
Category:
Last pushed: Feb 04, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OpenNMT/CTranslate2"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
mechramc/Orion
Local AI runtime for training & running small LLMs directly on Apple Neural Engine (ANE). No...
Pomilon/LEMA
LEMA (Layer-wise Efficient Memory Abstraction): A hardware-aware framework for fine-tuning LLMs...
dilbersha/llm-inference-benchmarking-3080
A production-grade telemetry-aware suite for benchmarking LLM inference performance on NVIDIA RTX 3080.
Yuan-ManX/infera
Infera — A High-Performance Inference Engine for Large Language Models.
gxcsoccer/alloy
Hybrid SSM-Attention language model on Apple Silicon with MLX — interleaving Mamba-2 and...