huckiyang/Interspeech23-Tutorial-Para-Efficient-Cross-Modal-Tutorial
Interspeech Tutorial - Resource-Efficient and Cross-Modal Learning Toward Foundation Modeling
This tutorial helps machine learning engineers and researchers adapt large, pre-trained speech and natural language processing models to new tasks more efficiently. It explains how to fine-tune these "foundation models" using techniques such as adapters and low-rank adaptation (LoRA), which reduce the computational resources required. You'll learn how to take an existing massive model and customize it for specific applications, such as accent adaptation in text-to-speech or dialect identification, even with limited data or computing power.
No commits in the last 6 months.
Use this if you need to customize large pre-trained AI models for new speech or text tasks without retraining the entire model or requiring extensive computational resources.
Not ideal if you are looking for an off-the-shelf software tool for end-users, as this provides a technical deep-dive for AI practitioners rather than a ready-to-use application.
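To make the low-rank adaptation idea concrete, here is a minimal PyTorch sketch of a LoRA-style layer: the pre-trained weight is frozen and only a small low-rank update is trained. This is an illustrative assumption about the technique in general, not code from the tutorial; the layer sizes, rank, and scaling are hypothetical.

```python
# Minimal LoRA sketch (hypothetical, not from the tutorial's code):
# wrap a frozen nn.Linear with a trainable low-rank update
# W x + (alpha / r) * B A x, so only A and B are fine-tuned.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pre-trained weights
            p.requires_grad = False
        # A is small random, B starts at zero so training begins at the base model
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen base output plus the scaled low-rank correction
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(768, 768))
y = layer(torch.randn(4, 768))  # gradients flow only through A and B
```

With a rank of 8 on a 768-by-768 layer, the trainable update has roughly 12K parameters versus about 590K in the frozen weight, which is the kind of saving that makes adapting a foundation model feasible on limited hardware.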
Stars: 15
Forks: —
Language: —
License: —
Category: —
Last pushed: Oct 09, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/huckiyang/Interspeech23-Tutorial-Para-Efficient-Cross-Modal-Tutorial"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
microsoft/XPretrain
Multi-modality pre-training
TheShadow29/zsgnet-pytorch
Official implementation of ICCV19 oral paper Zero-Shot grounding of Objects from Natural...
TheShadow29/VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
zeyofu/BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can...