NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
This library helps AI engineers and researchers create and train large language models (LLMs) more efficiently. It lets you adapt existing Transformer model code to train and run inference substantially faster on NVIDIA GPUs while using less memory, which is especially useful when working with massive datasets and complex AI models.
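A minimal sketch of what FP8 execution looks like with the library's PyTorch API, based on the project's quickstart. It requires CUDA and an FP8-capable GPU (Hopper, Ada, or Blackwell), so treat it as illustrative rather than a drop-in snippet:

```python
# Illustrative sketch based on Transformer Engine's PyTorch quickstart;
# needs an NVIDIA GPU with FP8 support (Hopper/Ada/Blackwell) to actually run.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Replace a torch.nn.Linear with the FP8-aware te.Linear.
model = te.Linear(768, 768, bias=True).cuda()
inp = torch.randn(32, 768, device="cuda")

# DelayedScaling is one of the FP8 scaling recipes the library provides.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# Inside this context, supported layers run their matmuls in FP8.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()  # gradients flow as with ordinary PyTorch modules
```

The key point is that the module swap (`torch.nn.Linear` → `te.Linear`) and the `fp8_autocast` context are the only changes to otherwise standard PyTorch training code.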
3,206 stars. Actively maintained with 57 commits in the last 30 days.
Use this if you are developing or training large Transformer-based AI models and want to significantly speed up your workflows and reduce memory usage on NVIDIA GPUs.
Not ideal if you are not working with Transformer models or do not have access to NVIDIA GPUs, especially newer generations like Hopper, Ada, or Blackwell.
Stars
3,206
Forks
659
Language
Python
License
Apache-2.0
Category
ml-frameworks
Last pushed
Mar 12, 2026
Commits (30d)
57
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/NVIDIA/TransformerEngine"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
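If you prefer Python over curl, the endpoint can be reached with the standard library. The URL pattern below is taken from the curl example above; the commented-out response fields are assumptions, since the payload schema is not shown on this page:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repo (pattern from the curl example)."""
    return f"{API_BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("ml-frameworks", "NVIDIA", "TransformerEngine")
print(url)

# Fetching works with plain urllib; the field names here are hypothetical --
# inspect the actual JSON response to see what the API returns:
# with urlopen(url) as resp:
#     data = json.load(resp)
#     print(data.get("stars"), data.get("commits_30d"))
```

No API key is needed at the free tier, so the request above stays within the 100-requests/day limit.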
Related frameworks
mlcommons/inference
Reference implementations of MLPerf® inference benchmarks
mlcommons/training
Reference implementations of MLPerf® training benchmarks
datamade/usaddress
🇺🇸 A Python library for parsing unstructured United States address strings into address components
GRAAL-Research/deepparse
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning
CMU-SAFARI/Pythia
A customizable hardware prefetching framework using online reinforcement learning as described...