kyegomez/MuonClip
This repository is an open-source implementation of the MuonClip optimization strategy from Moonshot AI's Kimi K2 model.
This project provides an advanced method for training large language models. Given your model's parameters, gradients, and attention logit statistics, it produces updated parameters that remain numerically stable during training. Data scientists and machine learning engineers working on transformer-based models will find this useful for improving training efficiency and robustness.
Use this if you are training large transformer models and need an optimizer that offers improved stability and token-efficient updates, especially when dealing with potential attention score explosions.
Not ideal if you are working with non-transformer models or small datasets, or if you prefer simpler optimizers such as Adam or SGD.
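The stabilization mechanism alluded to above is QK-Clip: when the maximum attention logit observed in a step exceeds a threshold, the query and key projection weights are rescaled so the logits are pulled back under the cap. The sketch below is a minimal illustration of that idea only, not this repository's actual API; the function names, the NumPy types, and the single-matrix (rather than per-head) rescaling are simplifying assumptions.

```python
import numpy as np


def max_attn_logit(x, w_q, w_k):
    """Max scaled dot-product attention logit for inputs x (hypothetical helper)."""
    q = x @ w_q
    k = x @ w_k
    return float(np.max(q @ k.T) / np.sqrt(w_q.shape[1]))


def qk_clip(w_q, w_k, s_max, tau=100.0):
    """Sketch of QK-Clip: if the observed max logit s_max exceeds the
    threshold tau, scale both projections by sqrt(tau / s_max) so the
    logits (which are bilinear in w_q and w_k) shrink by tau / s_max."""
    if s_max <= tau:
        return w_q, w_k  # no explosion observed; leave weights untouched
    scale = np.sqrt(tau / s_max)
    return w_q * scale, w_k * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 16))
    w_q = rng.normal(size=(16, 8)) * 5.0  # deliberately large weights
    w_k = rng.normal(size=(16, 8)) * 5.0

    s0 = max_attn_logit(x, w_q, w_k)
    w_q2, w_k2 = qk_clip(w_q, w_k, s0, tau=s0 / 2)
    print(max_attn_logit(x, w_q2, w_k2))  # capped at tau = s0 / 2
```

In the full MuonClip scheme this clipping is applied after each Muon optimizer update, so the cap bounds logit growth throughout training rather than correcting it once.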
Stars
17
Forks
—
Language
—
License
Apache-2.0
Category
Last pushed
Nov 07, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/MuonClip"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Kaushalya/medclip
A multi-modal CLIP model trained on the medical dataset ROCO
kastalimohammed1965/CLIP-fine-tune-registers-gated
Vision Transformers Needs Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!
BUAADreamer/SPN4CIR
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives...
clip-italian/clip-italian
CLIP (Contrastive Language-Image Pre-training) for Italian