PigeonDan1/ps-slm

TASU: A New Style of Alignment of Speech LLM with only Text Training Data, zero-shot on ASR and Other SU tasks

/ 100

Emerging

This project helps speech AI researchers train Speech Large Language Models (Speech LLMs) to understand spoken language. It takes text-only datasets and audio files as input and outputs a trained Speech LLM capable of various speech understanding tasks. It is aimed at researchers in natural language processing and speech technology.

Use this if you are a speech AI researcher looking for a new, efficient way to align Speech LLMs for semantic understanding using primarily text-based training data.

Not ideal if you are an end-user looking for a ready-to-use application or a developer without expertise in large language model training and GPU/NPU cluster management.

speech-recognition natural-language-processing large-language-models audio-translation semantic-understanding

No License · No Package · No Dependents

Maintenance 10 / 25

Adoption 6 / 25

Maturity 7 / 25

Community 8 / 25

Language: Python

License: —

Higher-rated alternatives

TensorSpeech/TensorFlowASR

TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2....

dangvansam/viet-asr

VietASR - Vietnamese Automatic Speech Recognition

wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

xinjli/allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

srvk/eesen

The official repository of the Eesen project
