ai4sd/multiscale-byte-lm
A hierarchical LM that scales to training on context windows of 5M+ tokens
This developer tool provides a hierarchical language model architecture capable of processing extremely long text sequences, up to millions of tokens. It takes raw byte-level text as input and outputs predictions or loss values, which are used to train and evaluate large language models. It is designed for machine learning engineers and researchers building or experimenting with advanced language model architectures.
Use this if you are a machine learning engineer or researcher focused on developing new language models and need an architecture that can handle very long text contexts efficiently.
Not ideal if you are an end-user looking for a pre-trained language model or a high-level API for natural language processing tasks.
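Because the model operates on raw bytes, its vocabulary is just the 256 possible byte values and no learned tokenizer is needed. A minimal sketch of byte-level tokenization in plain Python (standard library only, not this repo's API):

```python
# Byte-level "tokenization": each UTF-8 byte of the input text
# becomes one token ID in the range [0, 255].
text = "héllo"
tokens = list(text.encode("utf-8"))

print(tokens)       # 'é' expands to two bytes, so 5 characters -> 6 tokens
print(len(tokens))
```

This is also why byte-level models need long context windows: multi-byte characters and the absence of subword merging make byte sequences several times longer than the equivalent subword-token sequences.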
Stars: 9
Forks: —
Language: Python
License: MIT
Category:
Last pushed: Feb 17, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ai4sd/multiscale-byte-lm"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
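The curl command above can also be issued from Python. A small sketch using only the standard library; the response schema is not documented here, so the parsed JSON is returned as-is (the `quality_url` helper name is my own, not part of the API):

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and parse the quality record for a repo.

    Subject to the rate limits above (100 requests/day without a key).
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)

print(quality_url("ai4sd", "multiscale-byte-lm"))
```

Calling `fetch_quality("ai4sd", "multiscale-byte-lm")` retrieves the same record as the curl command.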
Higher-rated alternatives
Goekdeniz-Guelmez/mlx-lm-lora
Train Large Language Models on MLX.
uber-research/PPLM
Plug and Play Language Model implementation. Lets you steer the topic and attributes of GPT-2 models.
VHellendoorn/Code-LMs
Guide to using pre-trained large language models of source code
ssbuild/chatglm_finetuning
ChatGLM-6B fine-tuning and Alpaca fine-tuning
jarobyte91/pytorch_beam_search
A lightweight implementation of Beam Search for sequence models in PyTorch.