NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
This library helps AI engineers and researchers create and train large language models (LLMs) more efficiently. It lets you adapt existing Transformer model code to train and run inference substantially faster on NVIDIA GPUs while using less memory, which is especially useful when working with massive datasets and complex AI models.
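A minimal sketch of what FP8 execution looks like with the library's PyTorch API, based on the project's quickstart. It requires CUDA and an FP8-capable GPU (Hopper, Ada, or Blackwell), so treat it as illustrative rather than a drop-in snippet:

```python
# Illustrative sketch based on Transformer Engine's PyTorch quickstart;
# needs an NVIDIA GPU with FP8 support (Hopper/Ada/Blackwell) to actually run.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Replace a torch.nn.Linear with the FP8-aware te.Linear.
model = te.Linear(768, 768, bias=True).cuda()
inp = torch.randn(32, 768, device="cuda")

# DelayedScaling is one of the FP8 scaling recipes the library provides.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

# Inside this context, supported layers run their matmuls in FP8.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()  # gradients flow as with ordinary PyTorch modules
```

The key point is that the module swap (`torch.nn.Linear` → `te.Linear`) and the `fp8_autocast` context are the only changes to otherwise standard PyTorch training code.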
3,206 stars. Actively maintained with 57 commits in the last 30 days.
Use this if you are developing or training large Transformer-based AI models and want to significantly speed up your workflows and reduce memory usage on NVIDIA GPUs.
Not ideal if you are not working with Transformer models or do not have access to NVIDIA GPUs, especially newer generations like Hopper, Ada, or Blackwell.
Stars
3,206
Forks
659
Language
Python
License
Apache-2.0
Category
ml-frameworks
Last pushed
Mar 12, 2026
Commits (30d)
57
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/NVIDIA/TransformerEngine"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
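If you prefer Python over curl, the endpoint can be reached with the standard library. The URL pattern below is taken from the curl example above; the commented-out response fields are assumptions, since the payload schema is not shown on this page:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repo (pattern from the curl example)."""
    return f"{API_BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("ml-frameworks", "NVIDIA", "TransformerEngine")
print(url)

# Fetching works with plain urllib; the field names here are hypothetical --
# inspect the actual JSON response to see what the API returns:
# with urlopen(url) as resp:
#     data = json.load(resp)
#     print(data.get("stars"), data.get("commits_30d"))
```

No API key is needed at the free tier, so the request above stays within the 100-requests/day limit.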
Related frameworks
mlcommons/inference
Reference implementations of MLPerf® inference benchmarks
mlcommons/training
Reference implementations of MLPerf® training benchmarks
datamade/usaddress
🇺🇸 A Python library for parsing unstructured United States address strings into address components
GRAAL-Research/deepparse
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning
CMU-SAFARI/Pythia
A customizable hardware prefetching framework using online reinforcement learning as described...