kyegomez/MuonClip
This repository is an open-source implementation of the MuonClip optimization strategy from Moonshot AI's Kimi K2 model.
This project provides an advanced method for training large language models. Given your model's parameters, gradients, and attention logit statistics, it produces updated parameters that remain numerically stable during training. Data scientists and machine learning engineers working on transformer-based models will find this useful for improving training efficiency and robustness.
Use this if you are training large transformer models and need an optimizer that offers improved stability and token-efficient updates, especially when dealing with potential attention score explosions.
Not ideal if you are working with non-transformer models or small datasets, or if you prefer simpler optimizers such as Adam or SGD.
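The stabilization mechanism alluded to above is QK-Clip: when the maximum attention logit observed in a step exceeds a threshold, the query and key projection weights are rescaled so the logits are pulled back under the cap. The sketch below is a minimal illustration of that idea only, not this repository's actual API; the function names, the NumPy types, and the single-matrix (rather than per-head) rescaling are simplifying assumptions.

```python
import numpy as np


def max_attn_logit(x, w_q, w_k):
    """Max scaled dot-product attention logit for inputs x (hypothetical helper)."""
    q = x @ w_q
    k = x @ w_k
    return float(np.max(q @ k.T) / np.sqrt(w_q.shape[1]))


def qk_clip(w_q, w_k, s_max, tau=100.0):
    """Sketch of QK-Clip: if the observed max logit s_max exceeds the
    threshold tau, scale both projections by sqrt(tau / s_max) so the
    logits (which are bilinear in w_q and w_k) shrink by tau / s_max."""
    if s_max <= tau:
        return w_q, w_k  # no explosion observed; leave weights untouched
    scale = np.sqrt(tau / s_max)
    return w_q * scale, w_k * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 16))
    w_q = rng.normal(size=(16, 8)) * 5.0  # deliberately large weights
    w_k = rng.normal(size=(16, 8)) * 5.0

    s0 = max_attn_logit(x, w_q, w_k)
    w_q2, w_k2 = qk_clip(w_q, w_k, s0, tau=s0 / 2)
    print(max_attn_logit(x, w_q2, w_k2))  # capped at tau = s0 / 2
```

In the full MuonClip scheme this clipping is applied after each Muon optimizer update, so the cap bounds logit growth throughout training rather than correcting it once.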
Stars
17
Forks
—
Language
—
License
Apache-2.0
Category
Last pushed
Nov 07, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kyegomez/MuonClip"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Kaushalya/medclip
A multi-modal CLIP model trained on the medical dataset ROCO
kastalimohammed1965/CLIP-fine-tune-registers-gated
Vision Transformers Needs Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!
BUAADreamer/SPN4CIR
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives...
clip-italian/clip-italian
CLIP (Contrastive Language-Image Pre-training) for Italian