kyegomez/SwitchTransformers
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
This is a tool for machine learning engineers and researchers who are building very large language models. It provides the core components to implement 'Switch Transformers,' a type of model designed for extreme scale. You would use this to incorporate sparse expert layers into your neural network architectures, aiming for more efficient training and inference of multi-billion or trillion parameter models.
Use this if you are a machine learning engineer or researcher specifically working on developing and scaling large language models or other transformer-based architectures.
Not ideal if you are looking for a pre-trained model, a high-level API for natural language processing tasks, or a solution for smaller-scale deep learning problems.
Stars
136
Forks
16
Language
Python
License
MIT
Category
Last pushed
Jan 17, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/SwitchTransformers"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
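The same endpoint can be queried from Python with only the standard library. This is a sketch of the request shown in the `curl` example above; the response schema is not documented here, so the snippet only parses it as JSON and leaves interpretation to the caller.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    # Builds the same endpoint the curl example hits.
    return f"{BASE}/{owner}/{repo}"

url = quality_url("kyegomez", "SwitchTransformers")
print(url)

# Network call (uncomment to run; subject to the 100 requests/day limit):
# data = json.loads(urllib.request.urlopen(url).read())
```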
Related models
lucidrains/x-transformers
A concise but complete full-attention transformer with a set of promising experimental features...
kanishkamisra/minicons
Utility for behavioral and representational analyses of Language Models
lucidrains/simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
lucidrains/dreamer4
Implementation of Danijar's latest iteration for his Dreamer line of work
Nicolepcx/Transformers-in-Action
This is the corresponding code for the book Transformers in Action