nttcslab/msm-mae

Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations

Score: 46 / 100 (Emerging)

MSM-MAE learns general-purpose audio representations from raw sound data, helping researchers and engineers build robust audio analysis models. It takes raw audio as input and outputs feature vectors that capture the salient characteristics of the signal, which can then be used for downstream tasks such as sound classification or event detection. It is aimed at machine learning researchers and engineers working on audio applications who need powerful, pre-trained audio features.
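The core idea behind masked spectrogram modeling, hiding random patches of a spectrogram so an autoencoder must reconstruct them, can be illustrated with a minimal NumPy sketch of the masking step. The 16×16 patch size and 0.75 mask ratio below are common MAE defaults chosen for illustration, not necessarily this repository's settings.

```python
import numpy as np

def mask_spectrogram_patches(spec, patch_size=(16, 16), mask_ratio=0.75, rng=None):
    """Split a spectrogram into non-overlapping patches and zero out a random subset.

    Illustrative sketch only: patch size and mask ratio are typical MAE
    defaults, not confirmed settings from the msm-mae repository.
    """
    rng = np.random.default_rng(rng)
    n_mels, n_frames = spec.shape
    pf, pt = patch_size
    assert n_mels % pf == 0 and n_frames % pt == 0, "spectrogram must tile evenly"
    n_f, n_t = n_mels // pf, n_frames // pt
    n_patches = n_f * n_t
    n_masked = int(n_patches * mask_ratio)

    # Pick which patches are hidden from the encoder.
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.choice(n_patches, size=n_masked, replace=False)] = True

    # Zero out the masked patches in a copy of the spectrogram.
    masked_spec = spec.copy()
    for idx in np.flatnonzero(mask):
        i, j = divmod(idx, n_t)
        masked_spec[i * pf:(i + 1) * pf, j * pt:(j + 1) * pt] = 0.0
    return masked_spec, mask

# Example: an 80-mel x 208-frame log-mel spectrogram -> 5 x 13 = 65 patches.
spec = np.random.default_rng(0).standard_normal((80, 208))
masked, mask = mask_spectrogram_patches(spec, rng=0)
print(mask.sum(), "of", mask.size, "patches masked")
```

During pre-training, only the visible patches are fed to the encoder and a lightweight decoder reconstructs the hidden ones; after training, the decoder is discarded and the encoder's outputs serve as the general-purpose features.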


Use this if you are a machine learning researcher or engineer looking for a foundational self-supervised learning method to extract general-purpose audio features from raw audio for new model development.

Not ideal if you are starting a new project that requires state-of-the-art audio representations: a newer, significantly more performant successor, Masked Modeling Duo (M2D), is available and recommended instead.

audio-analysis sound-recognition machine-learning-research feature-extraction self-supervised-learning
No package · No dependents
Maintenance 10 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 11 / 25

How are scores calculated?

Stars

100

Forks

8

Language

Jupyter Notebook

License

Last pushed

Feb 20, 2026

Commits (30d)

0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/nttcslab/msm-mae"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.