nttcslab/msm-mae

Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations

Score: 46 / 100 (Emerging)

MSM-MAE learns general-purpose audio representations from raw sound data, helping researchers and engineers build robust audio analysis models. It takes raw audio as input and outputs feature vectors that capture the salient characteristics of the signal, which can then be used for downstream tasks such as sound classification or event detection. It is aimed at machine learning researchers and engineers working on audio applications who need powerful, pre-trained audio features.
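The core idea behind masked spectrogram modeling, hiding random patches of a spectrogram so an autoencoder must reconstruct them, can be illustrated with a minimal NumPy sketch of the masking step. The 16×16 patch size and 0.75 mask ratio below are common MAE defaults chosen for illustration, not necessarily this repository's settings.

```python
import numpy as np

def mask_spectrogram_patches(spec, patch_size=(16, 16), mask_ratio=0.75, rng=None):
    """Split a spectrogram into non-overlapping patches and zero out a random subset.

    Illustrative sketch only: patch size and mask ratio are typical MAE
    defaults, not confirmed settings from the msm-mae repository.
    """
    rng = np.random.default_rng(rng)
    n_mels, n_frames = spec.shape
    pf, pt = patch_size
    assert n_mels % pf == 0 and n_frames % pt == 0, "spectrogram must tile evenly"
    n_f, n_t = n_mels // pf, n_frames // pt
    n_patches = n_f * n_t
    n_masked = int(n_patches * mask_ratio)

    # Pick which patches are hidden from the encoder.
    mask = np.zeros(n_patches, dtype=bool)
    mask[rng.choice(n_patches, size=n_masked, replace=False)] = True

    # Zero out the masked patches in a copy of the spectrogram.
    masked_spec = spec.copy()
    for idx in np.flatnonzero(mask):
        i, j = divmod(idx, n_t)
        masked_spec[i * pf:(i + 1) * pf, j * pt:(j + 1) * pt] = 0.0
    return masked_spec, mask

# Example: an 80-mel x 208-frame log-mel spectrogram -> 5 x 13 = 65 patches.
spec = np.random.default_rng(0).standard_normal((80, 208))
masked, mask = mask_spectrogram_patches(spec, rng=0)
print(mask.sum(), "of", mask.size, "patches masked")
```

During pre-training, only the visible patches are fed to the encoder and a lightweight decoder reconstructs the hidden ones; after training, the decoder is discarded and the encoder's outputs serve as the general-purpose features.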


Use this if you are a machine learning researcher or engineer looking for a foundational self-supervised learning method to extract general-purpose audio features from raw audio for new model development.

Not ideal if you are starting a new project that requires state-of-the-art audio representations: a newer, significantly more performant successor, Masked Modeling Duo (M2D), is available and recommended instead.

audio-analysis sound-recognition machine-learning-research feature-extraction self-supervised-learning
No package · No dependents
Maintenance 10 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 11 / 25

How are scores calculated?

Stars

100

Forks

8

Language

Jupyter Notebook

License

Last pushed

Feb 20, 2026

Commits (30d)

0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/nttcslab/msm-mae"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.