InternLM/CapRL

[ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"

/ 100

Emerging

This project generates highly detailed, accurate descriptions of images, especially charts, infographics, and documents. You provide an image, and it returns a well-structured, comprehensive text caption covering the visual information; the project describes its captions as containing fewer inaccuracies than those of other models. Marketers, researchers, data analysts, and educators can use it to quickly generate descriptions of visual content.

193 stars.

Use this if you need to automatically generate very detailed and accurate textual descriptions for images, particularly those with complex visual information like charts, diagrams, or dense documents.

Not ideal if you only need brief, high-level descriptions of natural images, or if minimal computational cost matters more to you than descriptive richness.

image-description data-visualization-analysis document-intelligence visual-content-analysis digital-asset-management

No package. No dependents.

Maintenance 10 / 25

Adoption 10 / 25

Maturity 15 / 25

Community 8 / 25

Stars 193

Forks

Language Python

License Apache-2.0

Higher-rated alternatives

KimMeen/Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...

om-ai-lab/VLM-R1

Solve Visual Understanding with Reinforced VLMs

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

NVlabs/OmniVinci

OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.

fixie-ai/ultravox

A fast multimodal LLM for real-time voice
