ahmetkumass/yolo-gen
Train YOLO + VLM with one command. Auto-generate vision-language training data from YOLO labels - no extra labeling needed.
This tool helps quality-control inspectors, manufacturing engineers, and medical imaging analysts automatically obtain detailed descriptions and classifications of objects detected in images. You provide images with basic bounding-box labels (e.g. 'defect' or 'nodule'), and it outputs not just each object's location but also structured, descriptive text about its characteristics, such as `{"defect": true, "type": "scratch", "size": "2mm"}`. This enables more precise automated analysis than object detection alone.
Use this if you need both fast object detection and detailed, descriptive information about those objects for tasks like automated inspection, damage assessment, or medical diagnostics.
Not ideal if you only need to know the location of objects without any further textual analysis or detailed categorization, or if you don't have existing YOLO-style bounding box labels for your data.
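The workflow above can be sketched as a small converter: a standard YOLO label line (`class cx cy w h`, normalized coordinates) becomes a prompt/response pair for vision-language training. This is a hypothetical illustration, not yolo-gen's actual API; the function name, class names, and response schema are assumptions, with the schema echoing the README's example.

```python
# Hypothetical sketch: seeding VLM training data from YOLO labels.
# The helper name, class names, and caption schema are illustrative
# assumptions, not yolo-gen's actual interface.

def yolo_to_vlm_sample(label_line: str, class_names: list[str]) -> dict:
    """Turn one standard YOLO label line ('cls cx cy w h', normalized)
    into a prompt/response pair for vision-language training."""
    cls_id, cx, cy, w, h = label_line.split()
    name = class_names[int(cls_id)]
    return {
        "prompt": f"Describe the {name} in the highlighted region.",
        # yolo-gen auto-generates the descriptive JSON; this placeholder
        # mirrors the example schema shown above.
        "response": {"defect": name == "defect", "type": name},
        "bbox": [float(cx), float(cy), float(w), float(h)],
    }

sample = yolo_to_vlm_sample("0 0.5 0.5 0.1 0.2", ["defect", "nodule"])
print(sample["prompt"])
```

The key point is that no extra labeling is needed: the bounding box already supplies the region and class, and the descriptive text is generated from them.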
Stars: 24
Forks: 4
Language: Python
License: MIT
Category:
Last pushed: Feb 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ahmetkumass/yolo-gen"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
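The same request can be made from Python with only the standard library. The endpoint and rate limits come from this page; the shape of the JSON response is not documented here, so this sketch only builds the URL and prints the raw body rather than assuming any fields.

```python
# Minimal sketch of calling the listing's public API (stdlib only).
# The endpoint is from the page above; the response schema is unknown,
# so we print raw JSON instead of assuming field names.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(repo_slug: str) -> str:
    """Build the API URL for a given 'owner/name' repo slug."""
    return f"{BASE}/{repo_slug}"

url = quality_url("ahmetkumass/yolo-gen")
print(url)
# Uncomment to fetch (no API key needed, 100 requests/day):
# with urllib.request.urlopen(url) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```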
Higher-rated alternatives
kyegomez/RT-X
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment:...
kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
chuanyangjin/MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Muennighoff/vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle