InternLM/CapRL
[ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
This project is for anyone who needs highly detailed, accurate descriptions of images, especially images containing charts, infographics, or documents. You provide an image, and it outputs a well-structured, comprehensive text caption covering the visual information in that image, with fewer inaccuracies than other models. Marketers, researchers, data analysts, and educators can use it to quickly generate descriptions of visual content.
Use this if you need to automatically generate very detailed and accurate textual descriptions for images, particularly those with complex visual information like charts, diagrams, or dense documents.
Not ideal if you only need brief, high-level descriptions of natural images, or if keeping computational cost very low matters more to you than descriptive richness.
Stars: 193
Forks: 6
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 08, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/InternLM/CapRL"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
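For scripted use, the curl request above can be sketched in Python with only the standard library. This is a minimal sketch, not a documented client: the helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page. The URL pattern is taken directly from the curl example.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay valid.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and parse the body.

    Assumption: the response body is JSON. If it is not,
    json.loads raises a ValueError.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("InternLM", "CapRL")` issues the same request as the curl command. Because keyless access is limited to 100 requests/day, caching the result locally is sensible when polling many repositories.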
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice