OpenNMT/CTranslate2
Fast inference engine for Transformer models
This is a specialized tool for developers working with Transformer models, the architecture used in natural language processing tasks such as translation and text generation, as well as in speech processing. It makes these models run faster and use less memory in production: developers convert their trained models into an optimized format, then run inference through this library to translate text more rapidly or generate responses from large language models at lower computational cost.
4,354 stars.
Use this if you are a developer deploying Transformer-based AI models and need to maximize their speed and reduce memory consumption on CPUs or GPUs for efficient, real-world application.
Not ideal if you are not a developer working with pre-trained Transformer models, or if your primary need is model training rather than optimized inference.
Stars: 4,354
Forks: 456
Language: C++
License: MIT
Category:
Last pushed: Feb 04, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OpenNMT/CTranslate2"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related models
mechramc/Orion
Local AI runtime for training & running small LLMs directly on Apple Neural Engine (ANE). No...
Pomilon/LEMA
LEMA (Layer-wise Efficient Memory Abstraction): A hardware-aware framework for fine-tuning LLMs...
dilbersha/llm-inference-benchmarking-3080
A production-grade telemetry-aware suite for benchmarking LLM inference performance on NVIDIA RTX 3080.
Yuan-ManX/infera
Infera — A High-Performance Inference Engine for Large Language Models.
gxcsoccer/alloy
Hybrid SSM-Attention language model on Apple Silicon with MLX — interleaving Mamba-2 and...