gabeur/mmt

Multi-Modal Transformer for Video Retrieval


Emerging

This project helps video content curators, media researchers, and content platform managers find specific videos from text descriptions. You enter a text query and get back a list of relevant video clips. To match the query, it draws on several aspects of each video, such as visual content, motion, and audio.

265 stars. No commits in the last 6 months.

Use this if you need to build or enhance a system that allows users to search large video libraries with text queries, similar to how you search for images or documents.

Not ideal if your primary need is image retrieval, audio-only search, or if you don't have existing video features to work with.
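As a rough illustration of the text-to-video matching described above, the sketch below ranks clips by a weighted sum of per-modality cosine similarities over precomputed features. It is a toy under stated assumptions, not this repository's API: the function names (`cosine`, `rank_videos`), the modality keys, the fixed weights, and the two-dimensional feature vectors are all hypothetical, and a real system would obtain the embeddings from trained encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_videos(query_emb, video_feats, weights):
    """Return video ids sorted by weighted per-modality similarity, best match first."""
    scores = {}
    for vid, feats in video_feats.items():
        # Sum the similarity of each modality the clip actually has features for.
        scores[vid] = sum(
            weights[m] * cosine(query_emb[m], feats[m])
            for m in weights
            if m in feats
        )
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical precomputed features; real ones would come from pretrained extractors.
video_feats = {
    "clip_a": {"appearance": [1.0, 0.0], "motion": [0.0, 1.0], "audio": [1.0, 0.0]},
    "clip_b": {"appearance": [0.0, 1.0], "motion": [1.0, 0.0], "audio": [0.0, 1.0]},
}
# Hypothetical query embedding, one vector per modality (assumed to come from a text encoder).
query_emb = {"appearance": [1.0, 0.0], "motion": [0.0, 1.0], "audio": [1.0, 0.0]}
# Assumed fixed modality weights, chosen only for illustration.
weights = {"appearance": 0.5, "motion": 0.3, "audio": 0.2}

ranking = rank_videos(query_emb, video_feats, weights)
print(ranking)
```

Clips missing a modality are scored only on the modalities they have, which is one simple way to handle incomplete features; other choices, such as renormalizing the weights, are equally plausible.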

video-search content-management media-asset-management information-retrieval multimedia-search

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 19 / 25


Stars 265

Forks

Language Python

License Apache-2.0

Related models

JerryYLi/valhalla-nmt

Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"

MichiganNLP/Scalable-VLM-Probing

Probe Vision-Language Models

benywon/LALM

code and resource for ACL2021 paper 'Multi-Lingual Question Generation with Language Agnostic...

thunlp/cost-optimal-gqa

The code for the paper "Cost-Optimal Grouped-Query Attention for Long-Context Modeling"

PRITHIVSAKTHIUR/Molmo2-HF-Demo

A Gradio-based demonstration for the AllenAI Molmo2-8B multimodal model, enabling image QA,...
