JerryYLi/valhalla-nmt
Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"
This project helps machine translation researchers and practitioners improve translation quality by incorporating visual information. It takes source-language text and corresponding images as input and generates more accurate target-language translations. This is particularly useful for datasets where visual context is crucial to understanding the meaning of the text.
No commits in the last 6 months.
Use this if you are a machine translation researcher or engineer looking to experiment with and implement state-of-the-art multimodal machine translation models that leverage visual context.
Not ideal if you are a general user needing a simple, off-the-shelf translation tool without deep technical expertise or specific multimodal data.
Stars: 28
Forks: 4
Language: Python
License: MIT
Category:
Last pushed: Feb 19, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JerryYLi/valhalla-nmt"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
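The same endpoint can be called from a script instead of curl. A minimal Python sketch, assuming the endpoint returns JSON (the response schema is not documented here, so the fields are not assumed):

```python
import json
import urllib.request

# URL from the curl example above; the response schema is assumed to be JSON.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/JerryYLi/valhalla-nmt"

try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)  # parse the JSON body
    print(json.dumps(data, indent=2))
except OSError as exc:
    # Network errors and HTTP errors both surface as OSError subclasses.
    print(f"request failed: {exc}")
```

Unauthenticated calls count against the 100-requests/day limit, so cache the response locally if you poll repeatedly.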
Higher-rated alternatives
gabeur/mmt
Multi-Modal Transformer for Video Retrieval
MichiganNLP/Scalable-VLM-Probing
Probe Vision-Language Models
benywon/LALM
Code and resources for the ACL 2021 paper 'Multi-Lingual Question Generation with Language Agnostic...
thunlp/cost-optimal-gqa
The code for the paper "Cost-Optimal Grouped-Query Attention for Long-Context Modeling"
PRITHIVSAKTHIUR/Molmo2-HF-Demo
A Gradio-based demonstration for the AllenAI Molmo2-8B multimodal model, enabling image QA,...