Skyline-9/Shotluck-Holmes
[ACM MMGR '24] 🔍 Shotluck Holmes: A family of small-scale LLVMs for shot-level video understanding
This project helps video content creators and analysts automatically understand and describe video content at a granular level. It takes raw video files and outputs detailed captions for individual video shots or summarizes the entire video's narrative. This is ideal for anyone who needs to quickly extract meaningful descriptions from a large volume of video footage.
No commits in the last 6 months.
Use this if you need to generate accurate, concise descriptions for video segments or create summaries of entire videos efficiently, especially for large datasets.
Not ideal if you're looking for a simple drag-and-drop tool, as this requires some technical setup and command-line execution.
Stars: 13
Forks: —
Language: Python
License: Apache-2.0
Category: (none listed)
Last pushed: Oct 26, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Skyline-9/Shotluck-Holmes"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
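The same request can be made from Python, the repository's listed language. The sketch below is a minimal illustration, not an official client. It assumes the URL follows the pattern seen in the curl command (a fixed prefix, then `transformers`, then owner and repository name) and that the endpoint returns JSON. The page states neither. The helper names `build_quality_url` and `fetch_quality` and the `domain` parameter are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Prefix copied from the curl example; everything after it is treated
# as three path segments (an assumption based on that one URL).
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner, repo, domain="transformers"):
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each segment so unusual characters cannot alter the path.
    parts = [quote(part, safe="") for part in (domain, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner, repo, domain="transformers", timeout=10):
    """Fetch the data and decode it, assuming the response body is JSON."""
    url = build_quality_url(owner, repo, domain)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to make a real request,
    # which counts against the daily request limit.
    print(build_quality_url("Skyline-9", "Shotluck-Holmes"))
```

Keeping URL construction separate from the network call makes the first part easy to check offline. The fields in the decoded response are not documented here, so inspect the returned object before relying on any particular key.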
Higher-rated alternatives
gabeur/mmt
Multi-Modal Transformer for Video Retrieval
JerryYLi/valhalla-nmt
Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"
MichiganNLP/Scalable-VLM-Probing
Probe Vision-Language Models
benywon/LALM
code and resource for ACL2021 paper 'Multi-Lingual Question Generation with Language Agnostic...
thunlp/cost-optimal-gqa
The code for the paper "Cost-Optimal Grouped-Query Attention for Long-Context Modeling"