VyetGokyra/Speech_project_Vin

Multimodal Speech Emotion Recognition using a ViT-based Audio Spectrogram Transformer (AST) as the audio encoder and a Multiscale Attention Network (MANet) as the visual encoder

Score: 26 / 100 (Experimental)

This helps researchers and practitioners analyze emotions expressed in speech by combining auditory cues with visual expressions. It takes spoken audio and the corresponding video footage as input and outputs a classification of the emotion being conveyed. It is aimed at scientists studying human emotion, psychologists, and anyone interested in automated affect analysis.
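The repo's exact fusion strategy is not documented on this page, but a common approach for combining an audio encoder (such as AST) with a visual encoder (such as MANet) is late fusion of per-class probabilities. A minimal sketch, assuming a hypothetical four-class label set and illustrative function names:

```python
# Hypothetical late-fusion sketch: average class probabilities from the two
# modalities. The label set and weighting below are illustrative assumptions,
# not taken from the repository itself.
EMOTIONS = ["angry", "happy", "neutral", "sad"]  # example label set

def fuse(audio_probs, visual_probs, w_audio=0.5):
    """Weighted average of per-class probabilities from the two modalities."""
    return [w_audio * a + (1 - w_audio) * v
            for a, v in zip(audio_probs, visual_probs)]

def predict(audio_probs, visual_probs):
    """Return the emotion label with the highest fused probability."""
    fused = fuse(audio_probs, visual_probs)
    return EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]

# Audio leans "sad", video leans "neutral"; fused prediction is "sad".
print(predict([0.1, 0.1, 0.2, 0.6], [0.1, 0.1, 0.5, 0.3]))  # → sad
```

In practice the encoders would emit these probability vectors per utterance and per video clip; the fusion weight can be tuned on a validation set.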

No commits in the last 6 months.

Use this if you need to automatically identify emotions from combined audio and video recordings of human speech.

Not ideal if you only have text or still images for emotion analysis, or if you need to detect subtle emotional nuances beyond a fixed set of classifications.

emotion-recognition affective-computing speech-analysis psychology-research human-computer-interaction
No License · Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 4 / 25
Maturity 8 / 25
Community 14 / 25


Stars: 7
Forks: 3
Language: Python
License: none
Last pushed: Jan 21, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/VyetGokyra/Speech_project_Vin"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
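The same endpoint can be queried from Python instead of curl. A minimal sketch; the response schema is not documented on this page, so the fetch just prints the raw JSON (the `quality_url` helper is illustrative, not part of any published client):

```python
# Sketch: building and querying the quality API endpoint shown above.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL."""
    return f"{API_BASE}/{ecosystem}/{owner}/{repo}"

url = quality_url("ml-frameworks", "VyetGokyra", "Speech_project_Vin")
print(url)

# Uncomment to fetch (counts against the 100 requests/day keyless quota):
# with urllib.request.urlopen(url) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```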