ahmetkumass/yolo-gen
Train YOLO + VLM with one command. Auto-generate vision-language training data from YOLO labels - no extra labeling needed.
This tool helps quality-control inspectors, manufacturing engineers, and medical imaging analysts automatically obtain detailed descriptions and classifications of objects detected in images. You provide images with basic bounding-box labels (e.g. 'defect' or 'nodule'), and it outputs not just each object's location but also structured, descriptive text about its characteristics, such as `{"defect": true, "type": "scratch", "size": "2mm"}`. This enables more precise automated analysis than object detection alone.
Use this if you need both fast object detection and detailed, descriptive information about those objects for tasks like automated inspection, damage assessment, or medical diagnostics.
Not ideal if you only need to know the location of objects without any further textual analysis or detailed categorization, or if you don't have existing YOLO-style bounding box labels for your data.
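The workflow above can be sketched as a small converter: a standard YOLO label line (`class cx cy w h`, normalized coordinates) becomes a prompt/response pair for vision-language training. This is a hypothetical illustration, not yolo-gen's actual API; the function name, class names, and response schema are assumptions, with the schema echoing the README's example.

```python
# Hypothetical sketch: seeding VLM training data from YOLO labels.
# The helper name, class names, and caption schema are illustrative
# assumptions, not yolo-gen's actual interface.

def yolo_to_vlm_sample(label_line: str, class_names: list[str]) -> dict:
    """Turn one standard YOLO label line ('cls cx cy w h', normalized)
    into a prompt/response pair for vision-language training."""
    cls_id, cx, cy, w, h = label_line.split()
    name = class_names[int(cls_id)]
    return {
        "prompt": f"Describe the {name} in the highlighted region.",
        # yolo-gen auto-generates the descriptive JSON; this placeholder
        # mirrors the example schema shown above.
        "response": {"defect": name == "defect", "type": name},
        "bbox": [float(cx), float(cy), float(w), float(h)],
    }

sample = yolo_to_vlm_sample("0 0.5 0.5 0.1 0.2", ["defect", "nodule"])
print(sample["prompt"])
```

The key point is that no extra labeling is needed: the bounding box already supplies the region and class, and the descriptive text is generated from them.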
Stars: 24
Forks: 4
Language: Python
License: MIT
Category:
Last pushed: Feb 07, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ahmetkumass/yolo-gen"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
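The same request can be made from Python with only the standard library. The endpoint and rate limits come from this page; the shape of the JSON response is not documented here, so this sketch only builds the URL and prints the raw body rather than assuming any fields.

```python
# Minimal sketch of calling the listing's public API (stdlib only).
# The endpoint is from the page above; the response schema is unknown,
# so we print raw JSON instead of assuming field names.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(repo_slug: str) -> str:
    """Build the API URL for a given 'owner/name' repo slug."""
    return f"{BASE}/{repo_slug}"

url = quality_url("ahmetkumass/yolo-gen")
print(url)
# Uncomment to fetch (no API key needed, 100 requests/day):
# with urllib.request.urlopen(url) as resp:
#     print(json.dumps(json.load(resp), indent=2))
```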
Higher-rated alternatives
kyegomez/RT-X
Pytorch implementation of the models RT-1-X and RT-2-X from the paper: "Open X-Embodiment:...
kyegomez/PALI3
Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
chuanyangjin/MMToM-QA
[🏆Outstanding Paper Award at ACL 2024] MMToM-QA: Multimodal Theory of Mind Question Answering
lyuchenyang/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Muennighoff/vilio
🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle