linjieli222/HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"

Score: 44 / 100 (Emerging)

This project helps AI researchers train and evaluate models that understand video content alongside spoken dialogue or text descriptions. It takes video files and their associated subtitles or text queries as input, and outputs trained models capable of tasks like retrieving specific video moments based on text, answering questions about video content, or generating captions. It is designed for researchers working on advanced video and language understanding.

236 stars. No commits in the last 6 months.

Use this if you are an AI researcher looking to fine-tune a pre-trained model for tasks involving video understanding with accompanying text or dialogue, such as video question answering or moment retrieval.

Not ideal if you are an end-user without a technical background in deep learning, or if you need a plug-and-play solution that requires no model training or specific hardware.

Tags: video-understanding, natural-language-processing, multimodal-ai, deep-learning-research
Status: Stale (6m), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 18 / 25

Stars: 236
Forks: 35
Language: Python
License: MIT
Last pushed: Sep 16, 2021
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/linjieli222/HERO"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
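
The same request can be made from Python. The sketch below is a minimal, unofficial example using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the reading of the path as category/owner/repo is inferred from the curl URL above, and the assumption that the response body is JSON is not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    The (category, owner, repo) naming is an assumption inferred from
    the example URL, not a documented parameter list.
    """
    parts = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(p, safe="") for p in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and parse the body.

    Assumes the API returns JSON; adjust the parsing if it does not.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to hit the network.
    print(quality_url("transformers", "linjieli222", "HERO"))
```

Building the URL separately from the network call makes it easy to check the request target without spending any of the daily request quota.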