nawnoes/pytorch-gpt-x
An implementation of an autoregressive language model using an improved Transformer and DeepSpeed pipeline parallelism.
This project helps machine learning researchers and engineers train large GPT-style language models on limited hardware. You provide text data, and it trains a roughly 1-billion-parameter model using techniques such as ReZero residual connections and DeepSpeed pipeline parallelism, enabling efficient training on just two V100 16GB GPUs. It targets individuals or small teams building advanced natural language processing capabilities.
Use this if you need to train a large autoregressive language model efficiently on a cluster with a small number of powerful GPUs.
Not ideal if you're looking for an off-the-shelf pre-trained model or if you don't have access to specialized GPU hardware.
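One of the techniques the repo advertises is ReZero, which replaces the standard residual connection x + F(x) with x + α·F(x), where α is a learned scalar initialized to zero so each block starts as the identity and training stabilizes without LayerNorm warmup tricks. A minimal sketch of the idea in PyTorch (the block shape and names here are illustrative assumptions, not the repo's actual module):

```python
import torch
import torch.nn as nn


class ReZeroBlock(nn.Module):
    """Feed-forward residual block gated by a learned scalar alpha.

    alpha is initialized to zero (the ReZero trick), so at initialization
    the block is an exact identity function; the network gradually learns
    how much of the sublayer's output to mix in.
    """

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        # Hypothetical sublayer; in a real GPT block this would be
        # self-attention or the MLP, not necessarily this exact shape.
        self.ff = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, dim),
        )
        # The ReZero gate: a single learnable scalar, starting at zero.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x + alpha * F(x); with alpha == 0 this is just x.
        return x + self.alpha * self.ff(x)
```

Because α starts at zero, stacking many such blocks is safe even at ~1B-parameter depth; signal passes through untouched until each layer earns its contribution, which is what makes the technique attractive for training deep Transformers on a small GPU budget.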
Stars
30
Forks
3
Language
Python
License
—
Last pushed
Jan 12, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/nawnoes/pytorch-gpt-x"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
AliHaiderAhmad001/GPT-from-Scratch-with-Tensorflow
Implementation for "Improving Language Understanding by Generative Pre-Training" paper
HomebrewML/HomebrewNLP-torch
A case study of efficient training of large language models using commodity hardware.
akshat0123/GPT-1
Pytorch implementation of GPT-1
qiqiApink/MotionGPT
The official PyTorch implementation of the paper "MotionGPT: Finetuned LLMs are General-Purpose...
Shenggan/atp
Adaptive Tensor Parallelism for Foundation Models