gabeur/mmt
Multi-Modal Transformer for Video Retrieval
This project helps video content curators, media researchers, or content platform managers efficiently find specific videos using text descriptions. You input a search query (text) and get back a list of relevant video clips. It processes various aspects of a video, like visual content, motion, and audio, to understand and match your query.
265 stars. No commits in the last 6 months.
Use this if you need to build or enhance a system that allows users to search large video libraries with text queries, similar to how you search for images or documents.
Not ideal if your primary need is image retrieval or audio-only search, or if you don't have pre-extracted video features to work with.
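To make the text-to-video matching idea concrete, here is a minimal sketch of ranking clips against a query by cosine similarity. It is not the project's code: it assumes the text query and each video have already been encoded into vectors of the same size, and the names `cosine`, `rank_videos`, and the toy `clip_*` embeddings are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_videos(query_vec, video_vecs, top_k=3):
    """Return the top_k (name, score) pairs, most similar first.

    query_vec  -- embedding of the text query (assumed precomputed)
    video_vecs -- dict mapping clip name to its embedding (assumed precomputed)
    """
    scored = [(cosine(query_vec, vec), name) for name, vec in video_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:top_k]]


# Toy, hand-written embeddings used only for illustration.
videos = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.0, 1.0, 0.0],
    "clip_c": [0.7, 0.7, 0.0],
}
query = [0.9, 0.1, 0.0]

results = rank_videos(query, videos, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

In a real system the embeddings would come from trained text and video encoders, and the video side would combine several feature types such as appearance, motion, and audio before the similarity is computed.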
Stars: 265
Forks: 40
Language: Python
License: Apache-2.0
Category:
Last pushed: Oct 09, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/gabeur/mmt"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
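The same request can be made from Python using only the standard library. This is a sketch under two assumptions that the page does not confirm: that other repositories follow the same `category/owner/repo` path pattern as the URL shown above, and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the request URL (path pattern inferred from the single example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record for one repository.

    Assumes the response body is JSON; adjust the parsing if it is not.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("transformers", "gabeur", "mmt")
print(url)
# data = fetch_quality("transformers", "gabeur", "mmt")  # performs a network call
```

The network call is left commented out so the snippet runs offline; the unauthenticated quota described above applies once it is enabled.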
Related models
JerryYLi/valhalla-nmt
Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"
MichiganNLP/Scalable-VLM-Probing
Probe Vision-Language Models
benywon/LALM
code and resource for ACL2021 paper 'Multi-Lingual Question Generation with Language Agnostic...
thunlp/cost-optimal-gqa
The code for the paper "Cost-Optimal Grouped-Query Attention for Long-Context Modeling"
PRITHIVSAKTHIUR/Molmo2-HF-Demo
A Gradio-based demonstration for the AllenAI Molmo2-8B multimodal model, enabling image QA,...