kyegomez/NeVA

The open source implementation of "NeVA: NeMo Vision and Language Assistant"


Experimental

This project answers plain-English questions about images. You provide an image and a question about it, and NeVA returns a detailed text answer. It suits anyone who needs to extract insights from images using natural language queries, such as a content analyst or a researcher.

No commits in the last 6 months.

Use this if you need textual answers about images from plain questions, for example to identify objects, actions, or details within a picture.

Not ideal if you need to generate images from text, classify images into predefined categories, or perform tasks unrelated to visual question answering.

Tags: image-analysis, content-understanding, visual-search, data-extraction, multimedia-processing

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 5 / 25


Language: Python

License: MIT

Higher-rated alternatives

open-mmlab/mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

facebookresearch/mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis

Papers, code and datasets about deep learning and multi-modal learning for video analysis

KaiyangZhou/pytorch-vsumm-reinforce

Unsupervised video summarization with deep reinforcement learning (AAAI'18)

adambielski/siamese-triplet

Siamese and triplet networks with online pair/triplet mining in PyTorch
