dhcode-cpp/cut-cross-entropy-pytorch

PyTorch notebook implementing cut-cross-entropy for LLM training.

Experimental

This is a specialized training technique for large language models (LLMs) that makes the loss computation cheaper when the vocabulary is very large. It works with your existing LLM training data and architecture and changes only how the loss is calculated. The result is more efficient training and, potentially, better-performing LLMs, which is especially useful for AI researchers and machine learning engineers working on advanced natural language processing.
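
The sketch below is not taken from the repository. It illustrates, under stated assumptions, the general idea of computing a cross-entropy loss without holding every vocabulary logit in memory at once: logits are produced one vocabulary chunk at a time and folded into a running log-sum-exp. It is written in plain Python so that it runs without dependencies, whereas the notebook itself is described as using PyTorch. The names `chunked_cross_entropy`, `naive_cross_entropy`, and `chunk_size` are hypothetical.

```python
import math


def naive_cross_entropy(hidden, classifier, target):
    """Reference version: builds the full list of vocabulary logits."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in classifier]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def chunked_cross_entropy(hidden, classifier, target, chunk_size=4):
    """Cross-entropy for one token, streaming over vocabulary chunks.

    hidden:     list of floats, the token's final hidden state (length d)
    classifier: list of vocabulary rows, each a list of d floats
    target:     index of the correct next token
    chunk_size: number of vocabulary rows scored at a time (assumed knob)
    """
    def logit(row):
        return sum(h * w for h, w in zip(hidden, row))

    running_max = float("-inf")  # largest logit seen so far
    running_sum = 0.0            # sum of exp(logit - running_max) so far

    for start in range(0, len(classifier), chunk_size):
        # Only this chunk's logits exist at any one time.
        chunk = [logit(row) for row in classifier[start:start + chunk_size]]
        new_max = max(running_max, max(chunk))
        # Rescale the old sum to the new maximum, then add this chunk.
        running_sum *= math.exp(running_max - new_max)
        running_sum += sum(math.exp(z - new_max) for z in chunk)
        running_max = new_max

    log_sum_exp = running_max + math.log(running_sum)
    # Loss = log-sum-exp over the vocabulary minus the target token's logit.
    return log_sum_exp - logit(classifier[target])


# Small made-up example: 10-token vocabulary, 3-dimensional hidden state.
hidden = [0.5, -1.0, 2.0]
classifier = [
    [(i * 7 % 5 - 2) * 0.3, (i % 3 - 1) * 0.5, (i * i % 4) * 0.1]
    for i in range(10)
]
loss = chunked_cross_entropy(hidden, classifier, target=3, chunk_size=4)
```

Both functions return the same value up to floating-point rounding; the chunked version only changes how much intermediate data is alive at once. A production implementation would typically operate on batched tensors and fuse these steps into GPU kernels, which this sketch does not attempt.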

No commits in the last 6 months.

Use this if you are an AI researcher or machine learning engineer training large language models and encounter performance challenges with extremely large vocabularies.

Not ideal if you do not work on low-level training optimization of large language models, or if you work with standard vocabulary sizes.

LLM-training natural-language-processing deep-learning-optimization AI-research machine-learning-engineering

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 16 / 25

Community 0 / 25

Language

Jupyter Notebook

License

MIT

Higher-rated alternatives

PacktPublishing/Mastering-NLP-from-Foundations-to-LLMs

Mastering NLP from Foundations to LLMs, Published by Packt

HandsOnLLM/Hands-On-Large-Language-Models

Official code repo for the O'Reilly Book - "Hands-On Large Language Models"

mlabonne/llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

louisfb01/start-llms

A complete guide to start and improve your LLM skills in 2026 with little background in the...

Denis2054/Transformers-for-NLP-and-Computer-Vision-3rd-Edition

Transformers 3rd Edition
