gpt-neox and gpt-neo

These projects are ecosystem siblings from EleutherAI that take different technological approaches to the same goal: GPT-Neo uses mesh-tensorflow for distributed training, while GPT-NeoX uses Megatron and DeepSpeed. NeoX is the more recent evolution, designed to scale to larger models.

| Metric | gpt-neox | gpt-neo |
| --- | --- | --- |
| Score | 58 (Established) | 47 (Emerging) |
| Maintenance | 10/25 | 0/25 |
| Adoption | 10/25 | 10/25 |
| Maturity | 16/25 | 16/25 |
| Community | 22/25 | 21/25 |
| Stars | 7,399 | 8,286 |
| Forks | 1,100 | 963 |
| Downloads | n/a | n/a |
| Commits (30d) | 0 | 0 |
| Language | Python | Python |
| License | Apache-2.0 | MIT |
| Status | No package, no dependents | Archived; stale 6 months; no package, no dependents |

About gpt-neox

EleutherAI/gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

This is a specialized toolkit for researchers and engineers who need to train very large language models from scratch, or fine-tune existing ones, using substantial computational resources. It takes raw text data and configuration settings as input, and outputs a custom-trained language model capable of generating human-like text. This is for users operating at the cutting edge of AI, often in academic, industry, or government labs.
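Training runs are driven by YAML configuration files. As an illustrative sketch only (the key names follow the hyphenated style of the repository's example configs, but the exact values here are hypothetical), a small model might be described like this:

```yaml
# Hypothetical NeoX-style training config fragment (illustrative values).
# Real runs combine several YAML files covering model, data, and cluster setup.
{
  "pipe-parallel-size": 1,
  "model-parallel-size": 1,

  # Model architecture (roughly GPT-2-small scale)
  "num-layers": 12,
  "hidden-size": 768,
  "num-attention-heads": 12,
  "seq-length": 2048,
  "max-position-embeddings": 2048,
}
```

Splitting the configuration across composable YAML files lets the same model definition be reused with different cluster or data settings.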

large-language-model-training deep-learning-research natural-language-processing high-performance-computing AI-model-development

About gpt-neo

EleutherAI/gpt-neo

An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

Supports diverse attention mechanisms including local and linear attention variants, alongside mixture-of-experts and axial positional embeddings beyond standard GPT architectures. Built on mesh-tensorflow for distributed training across TPU and GPU clusters with both data and model parallelism, enabling efficient scaling to multi-billion parameter models. Includes pre-trained checkpoints (1.3B and 2.7B parameters) trained on The Pile dataset, compatible with HuggingFace Transformers for immediate inference.
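The HuggingFace compatibility mentioned above can be sketched with a minimal inference example. The 125M checkpoint is used here only to keep the download small; the released 1.3B and 2.7B checkpoints load the same way by swapping the model id:

```python
# Minimal GPT-Neo inference sketch via Hugging Face Transformers.
# Assumes `transformers` and `torch` are installed; use
# "EleutherAI/gpt-neo-1.3B" or "EleutherAI/gpt-neo-2.7B" for the
# checkpoints described above.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125m")
result = generator("EleutherAI's GPT-Neo is", max_new_tokens=20, do_sample=False)
print(result[0]["generated_text"])
```

Because the checkpoints use the standard `GPTNeoForCausalLM` architecture in Transformers, no code from this repository is needed at inference time.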

Scores updated daily from GitHub, PyPI, and npm data.