zhanshijinwat/Steel-LLM
Train a 1B-parameter LLM on 1T tokens from scratch as an individual developer
Steel-LLM is a project for those who want to build their own custom Chinese large language model (LLM) from scratch. It provides a complete guide and all the code needed to collect and process Chinese text data and then train an LLM on it. The output is a functional Chinese LLM tailored to your own data, ready for fine-tuning.
791 stars. No commits in the last 6 months.
Use this if you are a machine learning researcher or engineer with access to 8 or more GPUs (like H800 or A100) and want to pre-train a Chinese LLM from the ground up, rather than simply using an existing model.
Not ideal if you are looking for a ready-to-use LLM for immediate application without extensive training, or if you do not have significant GPU resources and expertise in LLM training.
Stars
791
Forks
78
Language
Jupyter Notebook
License
—
Category
Last pushed
Apr 27, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/zhanshijinwat/Steel-LLM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
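The same endpoint can also be called from code. A minimal Python sketch is below; only the base URL and the repo path come from the listing above, while the response fields (`stars`, `forks`) are assumptions based on the stats shown here, not a documented schema:

```python
import json
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    # Build the per-repo endpoint URL, URL-encoding each path segment.
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

url = quality_url("zhanshijinwat", "Steel-LLM")
# Fetching is omitted here to keep the sketch offline; with the
# requests library it would be:  data = requests.get(url).json()
# We assume a JSON body resembling the stats on this page:
sample = json.loads('{"stars": 791, "forks": 78}')
print(url)
print(sample["stars"], sample["forks"])
```

Without an API key this stays within the 100 requests/day anonymous limit; pass a key (per the provider's instructions) for the 1,000/day tier.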
Higher-rated alternatives
rasbt/LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
facebookresearch/LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
FareedKhan-dev/train-llm-from-scratch
A straightforward method for training your LLM, from downloading data to generating text.
kmeng01/rome
Locating and editing factual associations in GPT (NeurIPS 2022)
datawhalechina/llms-from-scratch-cn
With only basic Python, build a large language model from scratch; incrementally implement GLM4/Llama3/RWKV6 from zero and gain a deep understanding of how large models work