augustwester/transformer-xl

A lightweight PyTorch implementation of the Transformer-XL architecture proposed by Dai et al. (2019)

Quality score: 33 / 100 (Emerging)

This is a lightweight tool for machine learning researchers and students to understand the core Transformer-XL architecture. It takes a sequence of unordered numbers as input and demonstrates how the model learns to sort them. The output shows the model's performance, highlighting the benefits of its memory-augmented design.
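The "memory-augmented design" mentioned above refers to Transformer-XL's segment-level recurrence: each segment attends over its own hidden states plus a cache of the previous segment's states. A minimal dependency-free sketch of that caching logic (all names here are illustrative, not taken from the repo's code):

```python
# Minimal sketch of Transformer-XL's segment-level recurrence:
# each segment attends over a context formed by concatenating a
# cached "memory" of earlier hidden states with the current segment.
# In the real model the cache is detached from the autograd graph;
# here plain lists stand in for hidden-state tensors.

def process_segment(segment, memory, mem_len):
    """Build the attention context from memory + current segment,
    then keep only the last mem_len states as the new memory."""
    context = memory + segment       # keys/values span memory and segment
    new_memory = context[-mem_len:]  # truncate cache to mem_len entries
    return context, new_memory

memory = []
segments = [[1, 2], [3, 4], [5, 6]]
for seg in segments:
    context, memory = process_segment(seg, memory, mem_len=3)

# After the loop the cache holds the last 3 states: [4, 5, 6],
# and the final context spanned [2, 3, 4, 5, 6].
```

This is why the sorting demo benefits from the architecture: states computed for earlier parts of the sequence remain visible when later segments are processed, extending the effective context beyond a single segment.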

No commits in the last 6 months.

Use this if you are a machine learning researcher or student looking for a simplified, runnable example to grasp the mechanics of the Transformer-XL architecture.

Not ideal if you need a solution for training large-scale language models or processing real-world text data, as it's designed for architectural understanding, not practical deployment.

Topics: deep-learning-research, neural-network-architecture, language-model-development, sequence-modeling, machine-learning-education
Status: Stale (6m) · No Package · No Dependents
Score breakdown:
Maintenance: 0 / 25
Adoption: 7 / 25
Maturity: 16 / 25
Community: 10 / 25


Stars: 37
Forks: 4
Language: Python
License: MIT
Last pushed: Feb 07, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/augustwester/transformer-xl"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
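The same endpoint can be queried from Python. A hedged sketch using only the standard library; the URL is taken verbatim from the curl example above, but the JSON field names of the response are not documented here, so the helper simply returns the parsed body:

```python
# Fetch the quality data for this repo from the API shown above.
# The endpoint path is copied from the curl example; response
# field names are unknown, so we return the raw parsed JSON.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"
repo = "augustwester/transformer-xl"
url = f"{API_BASE}/{repo}"

def fetch_quality(url):
    """Return the decoded JSON body for a quality-API URL."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Usage (performs a network request):
# data = fetch_quality(url)
# print(data)
```

Without an API key this counts against the 100 requests/day limit mentioned above.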