ai4sd/multiscale-byte-lm
A hierarchical LM that scales to training on context windows of 5M+ tokens
This developer tool provides a hierarchical language model architecture capable of processing extremely long text sequences, up to millions of tokens. It takes raw byte-level text as input and outputs predictions or loss values, which are used to train and evaluate large language models. It is designed for machine learning engineers and researchers building or experimenting with advanced language model architectures.
Use this if you are a machine learning engineer or researcher focused on developing new language models and need an architecture that can handle very long text contexts efficiently.
Not ideal if you are an end-user looking for a pre-trained language model or a high-level API for natural language processing tasks.
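Because the model operates on raw bytes, its vocabulary is just the 256 possible byte values and no learned tokenizer is needed. A minimal sketch of byte-level tokenization in plain Python (standard library only, not this repo's API):

```python
# Byte-level "tokenization": each UTF-8 byte of the input text
# becomes one token ID in the range [0, 255].
text = "héllo"
tokens = list(text.encode("utf-8"))

print(tokens)       # 'é' expands to two bytes, so 5 characters -> 6 tokens
print(len(tokens))
```

This is also why byte-level models need long context windows: multi-byte characters and the absence of subword merging make byte sequences several times longer than the equivalent subword-token sequences.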
Stars: 9
Forks: —
Language: Python
License: MIT
Category:
Last pushed: Feb 17, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ai4sd/multiscale-byte-lm"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
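The curl command above can also be issued from Python. A small sketch using only the standard library; the response schema is not documented here, so the parsed JSON is returned as-is (the `quality_url` helper name is my own, not part of the API):

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and parse the quality record for a repo.

    Subject to the rate limits above (100 requests/day without a key).
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)

print(quality_url("ai4sd", "multiscale-byte-lm"))
```

Calling `fetch_quality("ai4sd", "multiscale-byte-lm")` retrieves the same record as the curl command.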
Higher-rated alternatives
Goekdeniz-Guelmez/mlx-lm-lora
Train Large Language Models on MLX.
uber-research/PPLM
Plug and Play Language Model implementation. Lets you steer the topic and attributes of GPT-2 models.
VHellendoorn/Code-LMs
Guide to using pre-trained large language models of source code
ssbuild/chatglm_finetuning
ChatGLM-6B fine-tuning and Alpaca fine-tuning
jarobyte91/pytorch_beam_search
A lightweight implementation of Beam Search for sequence models in PyTorch.