nttcslab/msm-mae
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations
MSM-MAE learns general-purpose audio representations from raw sound data via masked spectrogram modeling with masked autoencoders. It takes raw audio files as input and outputs fixed-length feature vectors that can be reused for downstream tasks such as sound classification or event detection.
Use this if you are a machine learning researcher or engineer looking for a foundational self-supervised learning method to extract general-purpose audio features from raw audio for new model development.
Not ideal if you are starting a new project requiring state-of-the-art audio representations, as a newer and significantly more performant successor, Masked Modeling Duo (M2D), is available and recommended.
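The description above frames MSM-MAE as a producer of frozen feature vectors that a lightweight downstream model consumes. As a minimal sketch of that "frozen features + simple head" pattern (not the repository's actual code; the 4-dimensional embeddings and class labels are made up for illustration, and real MSM-MAE features are much higher-dimensional):

```python
from math import sqrt

def centroid(vectors):
    """Element-wise mean of a list of equal-length embeddings."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid_predict(train, query):
    """train: {label: [embedding, ...]}; returns the label whose
    class centroid is closest to the query embedding."""
    cents = {label: centroid(vs) for label, vs in train.items()}
    return min(cents, key=lambda lbl: euclidean(cents[lbl], query))

# Hypothetical pre-extracted embeddings, standing in for MSM-MAE output.
train = {
    "speech": [[0.9, 0.1, 0.0, 0.2], [1.0, 0.0, 0.1, 0.3]],
    "music":  [[0.1, 0.8, 0.9, 0.0], [0.0, 1.0, 0.8, 0.1]],
}
print(nearest_centroid_predict(train, [0.95, 0.05, 0.05, 0.25]))  # → speech
```

In practice the downstream head is usually a linear probe or small MLP trained on the frozen features; nearest-centroid is used here only because it fits in a few lines.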
Stars
100
Forks
8
Language
Jupyter Notebook
License
—
Category
—
Last pushed
Feb 20, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/nttcslab/msm-mae"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
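For programmatic access beyond the curl one-liner, here is a stdlib-only sketch that builds the endpoint URL and decodes the JSON response. The response field names used in the comment (e.g. stars, forks) are an assumption based on the stats shown on this page; the actual schema may differ.

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload for one repository.

    Unauthenticated access is rate-limited to 100 requests/day,
    per the note on this page.
    """
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Example (requires network access; field names are assumed):
# data = fetch_quality("nttcslab", "msm-mae")
# print(data.get("stars"), data.get("forks"))
```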
Higher-rated alternatives
Westlake-AI/openmixup
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
YU1ut/MixMatch-pytorch
Code for "MixMatch - A Holistic Approach to Semi-Supervised Learning"
kamata1729/QATM_pytorch
PyTorch implementation of "QATM: Quality-Aware Template Matching for Deep Learning"
rgeirhos/generalisation-humans-DNNs
Data, code & materials from the paper "Generalisation in humans and deep neural networks" (NeurIPS 2018)
elijahcole/sinr
Spatial Implicit Neural Representations for Global-Scale Species Mapping - ICML 2023