JerryYLi/valhalla-nmt
Code repository for CVPR 2022 paper "VALHALLA: Visual Hallucination for Machine Translation"
This project helps machine translation researchers and practitioners improve translation quality by incorporating visual information. It takes source-language text and corresponding images as input and generates more accurate target-language translations. This is particularly useful for datasets where visual context is crucial to understanding the meaning of the text.
No commits in the last 6 months.
Use this if you are a machine translation researcher or engineer looking to experiment with and implement state-of-the-art multimodal machine translation models that leverage visual context.
Not ideal if you are a general user needing a simple, off-the-shelf translation tool without deep technical expertise or specific multimodal data.
Stars: 28
Forks: 4
Language: Python
License: MIT
Category:
Last pushed: Feb 19, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JerryYLi/valhalla-nmt"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
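The same endpoint can be called from a script instead of curl. A minimal Python sketch, assuming the endpoint returns JSON (the response schema is not documented here, so the fields are not assumed):

```python
import json
import urllib.request

# URL from the curl example above; the response schema is assumed to be JSON.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/JerryYLi/valhalla-nmt"

try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)  # parse the JSON body
    print(json.dumps(data, indent=2))
except OSError as exc:
    # Network errors and HTTP errors both surface as OSError subclasses.
    print(f"request failed: {exc}")
```

Unauthenticated calls count against the 100-requests/day limit, so cache the response locally if you poll repeatedly.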
Higher-rated alternatives
gabeur/mmt
Multi-Modal Transformer for Video Retrieval
MichiganNLP/Scalable-VLM-Probing
Probe Vision-Language Models
benywon/LALM
Code and resources for the ACL 2021 paper 'Multi-Lingual Question Generation with Language Agnostic...
thunlp/cost-optimal-gqa
The code for the paper "Cost-Optimal Grouped-Query Attention for Long-Context Modeling"
PRITHIVSAKTHIUR/Molmo2-HF-Demo
A Gradio-based demonstration for the AllenAI Molmo2-8B multimodal model, enabling image QA,...