kyegomez/Mixture-of-MQA
An implementation of a Switch Transformer-style multi-query attention model
This is a neural network architecture aimed at developers building large-scale AI models. It processes sequential inputs such as text to produce learned representations or predictions, with an emphasis on efficiency and scalability. AI/ML engineers and researchers working on complex natural language processing or sequence modeling tasks will find it most useful.
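The repository name points to Switch Transformer-style top-1 routing over multi-query attention blocks. The PyTorch sketch below illustrates that combination under stated assumptions: the class names, the simplified per-sequence routing, and all shapes are illustrative, not the repository's actual API.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiQueryAttention(nn.Module):
    """Multi-query attention: many query heads share one key/value head."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q_proj = nn.Linear(dim, dim)
        # A single shared key/value head instead of one per query head:
        # this is what shrinks the KV cache relative to multi-head attention.
        self.k_proj = nn.Linear(dim, self.head_dim)
        self.v_proj = nn.Linear(dim, self.head_dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):
        b, t, d = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).unsqueeze(1)  # (b, 1, t, head_dim), broadcast over heads
        v = self.v_proj(x).unsqueeze(1)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        out = attn.softmax(dim=-1) @ v   # (b, heads, t, head_dim)
        return self.out_proj(out.transpose(1, 2).reshape(b, t, d))

class MixtureOfMQA(nn.Module):
    """Switch-style top-1 routing over a pool of MQA experts (hypothetical)."""
    def __init__(self, dim, num_heads, num_experts):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [MultiQueryAttention(dim, num_heads) for _ in range(num_experts)]
        )

    def forward(self, x):
        # Route each sequence to one expert for simplicity; real switch
        # routing is per token, with capacity limits and a load-balancing loss.
        gate = F.softmax(self.router(x.mean(dim=1)), dim=-1)  # (b, num_experts)
        weight, idx = gate.max(dim=-1)                        # top-1 expert per sequence
        out = torch.stack([self.experts[i](x[j:j+1]).squeeze(0)
                           for j, i in enumerate(idx.tolist())])
        return out * weight.view(-1, 1, 1)                    # scale by gate probability

x = torch.randn(2, 16, 64)
print(MixtureOfMQA(dim=64, num_heads=8, num_experts=4)(x).shape)  # torch.Size([2, 16, 64])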
No commits in the last 6 months.
Use this if you are a machine learning engineer or researcher developing transformer-based models and need to process very long sequences more efficiently.
Not ideal if you are a data scientist looking for an off-the-shelf model to use immediately, or if you are not comfortable working with deep learning model architectures.
Stars: 8
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Feb 20, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/Mixture-of-MQA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
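The same endpoint can be called programmatically; this minimal Python sketch assumes only that it returns JSON, since the response schema isn't documented here.

import requests  # third-party: pip install requests

url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/kyegomez/Mixture-of-MQA")
resp = requests.get(url, timeout=10)  # no API key needed up to 100 requests/day
resp.raise_for_status()
print(resp.json())  # inspect the returned quality metadata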
Higher-rated alternatives
lucidrains/x-transformers
A concise but complete full-attention transformer with a set of promising experimental features...
kanishkamisra/minicons
Utility for behavioral and representational analyses of Language Models
lucidrains/simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
lucidrains/dreamer4
Implementation of Danijar's latest iteration for his Dreamer line of work
Nicolepcx/Transformers-in-Action
This is the corresponding code for the book Transformers in Action