Skyline-9/Shotluck-Holmes

[ACM MMGR '24] 🔍 Shotluck Holmes: A family of small-scale LLVMs for shot-level video understanding

Experimental

This project helps video content creators and analysts automatically understand and describe video content at a granular level. It takes raw video files and outputs detailed captions for individual video shots or summarizes the entire video's narrative. This is ideal for anyone who needs to quickly extract meaningful descriptions from a large volume of video footage.

No commits in the last 6 months.

Use this if you need to generate accurate, concise descriptions for video segments or create summaries of entire videos efficiently, especially for large datasets.

Not ideal if you're looking for a simple drag-and-drop tool, as this requires some technical setup and command-line execution.

video-analysis content-creation media-asset-management video-annotation digital-archiving

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 16 / 25

Community 0 / 25

Language: Python

License: Apache-2.0

Higher-rated alternatives

gabeur/mmt

Multi-Modal Transformer for Video Retrieval

JerryYLi/valhalla-nmt

Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"

MichiganNLP/Scalable-VLM-Probing

Probe Vision-Language Models

benywon/LALM

code and resource for ACL2021 paper 'Multi-Lingual Question Generation with Language Agnostic...

thunlp/cost-optimal-gqa

The code for the paper "Cost-Optimal Grouped-Query Attention for Long-Context Modeling"
