unum-cloud/UForm

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

57 / 100 (Established)

UForm helps you understand and generate content from multilingual text and images, with video support planned. You input text, images, or both, and it outputs concise descriptions, answers to questions about images, or embeddings (numerical representations) that support search and classification. It suits marketers, content strategists, and anyone who needs to analyze and create multimedia content efficiently.
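To illustrate how numerical representations support search, here is a minimal sketch that ranks images against a text query by cosine similarity. It deliberately does not call UForm: the three-dimensional vectors, the file names, and the helper names `cosine_similarity` and `rank_by_similarity` are made up for illustration. Real embeddings would come from a model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, items):
    """Sort (name, score) pairs from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings, invented for this example only.
image_embeddings = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}
text_query_embedding = [0.8, 0.2, 0.1]

ranking = rank_by_similarity(text_query_embedding, image_embeddings)
best_match = ranking[0][0]
```

Because text and images are embedded into the same vector space, the same ranking step works for text-to-image, image-to-text, or image-to-image search. For large collections, a vector index would typically replace the linear scan.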

1,221 stars. Available on PyPI.

Use this if you need to rapidly process and generate insights from diverse content formats like images and text, especially across multiple languages, or want to build smart search features into your applications.

Not ideal if your primary need is extremely deep, nuanced analysis of a single modality (e.g., complex legal text analysis or high-fidelity image editing) rather than multimodal understanding.

content-analysis multimedia-search image-captioning visual-question-answering cross-lingual-content
Maintenance 6 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 16 / 25


Stars: 1,221
Forks: 76
Language: Python
License: Apache-2.0
Last pushed: Oct 30, 2025
Commits (30d): 0
Dependencies: 4

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/unum-cloud/UForm"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
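The same request can be sketched in Python with only the standard library. This assumes the endpoint returns a JSON body, which the listing does not confirm, and that the `embeddings` path segment stays fixed. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/embeddings"


def build_quality_url(owner, repo):
    """Build the quality endpoint URL for a given owner/repo pair."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch and decode the quality record.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the caller should inspect the raw body instead.
    """
    request = urllib.request.Request(
        build_quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("unum-cloud", "UForm")
```

With the unauthenticated limit of 100 requests/day, it may be worth caching results locally when checking many repositories.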