antoyang/FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

/ 100

Emerging

This project answers natural language questions about video content, which helps anyone who needs to understand what happens in a video or extract specific information from it without extensive manual review. You provide a video and a question about its content, and it returns an answer. It suits content analysts, researchers studying video data, and media professionals.

158 stars. No commits in the last 6 months.

Use this if you need to automatically answer questions about video content, especially when you have little or no labeled training data.

Not ideal if you only work with text documents or still images, as this tool is specifically designed for video analysis.

video-analysis content-moderation media-intelligence research-data-extraction

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 15 / 25

Stars

158

Forks

Language

Python

License

Apache-2.0

Higher-rated alternatives

KimMeen/Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

NVlabs/OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

fixie-ai/ultravox

A fast multimodal LLM for real-time voice
