kyegomez/SwitchTransformers
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
This is a tool for machine learning engineers and researchers who are building very large language models. It provides the core components to implement 'Switch Transformers,' a type of model designed for extreme scale. You would use this to incorporate sparse expert layers into your neural network architectures, aiming for more efficient training and inference of multi-billion or trillion parameter models.
Use this if you are a machine learning engineer or researcher specifically working on developing and scaling large language models or other transformer-based architectures.
Not ideal if you are looking for a pre-trained model, a high-level API for natural language processing tasks, or a solution for smaller-scale deep learning problems.
Stars
136
Forks
16
Language
Python
License
MIT
Category
Last pushed
Jan 17, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/SwitchTransformers"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
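The same endpoint can be queried from Python with only the standard library. This is a sketch of the request shown in the `curl` example above; the response schema is not documented here, so the snippet only parses it as JSON and leaves interpretation to the caller.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    # Builds the same endpoint the curl example hits.
    return f"{BASE}/{owner}/{repo}"

url = quality_url("kyegomez", "SwitchTransformers")
print(url)

# Network call (uncomment to run; subject to the 100 requests/day limit):
# data = json.loads(urllib.request.urlopen(url).read())
```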
Related models
lucidrains/x-transformers
A concise but complete full-attention transformer with a set of promising experimental features...
kanishkamisra/minicons
Utility for behavioral and representational analyses of Language Models
lucidrains/simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
lucidrains/dreamer4
Implementation of Danijar's latest iteration for his Dreamer line of work
Nicolepcx/Transformers-in-Action
This is the corresponding code for the book Transformers in Action