asmelashteka/HornMT
Machine translation (MT) benchmark dataset for languages in the Horn of Africa.
This project provides a comprehensive collection of news snippets translated across multiple languages spoken in the Horn of Africa, alongside English. You get parallel text data in formats like plain text, Excel, or JSON, with each snippet accompanied by metadata such as its category, source, and publication date. This is designed for researchers, language service providers, or AI developers working on machine translation for languages like Amharic, Oromo, Somali, and Tigrinya.
No commits in the last 6 months.
Use this if you need high-quality, pre-aligned textual data to train or evaluate machine translation systems for Horn of Africa languages.
Not ideal if you are looking for a translation API or an end-user translation tool, as this provides raw data for development.
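The listing describes the data as pre-aligned parallel text. The sketch below shows what line-level alignment typically means in practice. It assumes that the plain-text release stores one snippet per line, UTF-8 encoded, and that line i of one language file translates line i of another. The function name `load_parallel` and the file names in the usage note are hypothetical, so check the repository for its actual layout.

```python
def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    # Text mode with the default newline=None splits only on \n, \r and \r\n.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f]


def load_parallel(src_path, tgt_path):
    """Pair line i of the source file with line i of the target file.

    Hypothetical helper: assumes both files share the same line order,
    one snippet per line.
    """
    src_lines = read_lines(src_path)
    tgt_lines = read_lines(tgt_path)
    # Fail loudly instead of letting zip() silently drop unmatched lines.
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"line counts differ: {len(src_lines)} vs {len(tgt_lines)}"
        )
    return list(zip(src_lines, tgt_lines))
```

Usage, with hypothetical paths: `pairs = load_parallel("data/eng.txt", "data/amh.txt")` returns a list of `(English, Amharic)` tuples. The function raises an error when line counts differ rather than truncating, because a single missing line shifts every following pair out of alignment.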
Stars: 42
Forks: 13
Language: not listed
License: not listed
Category: not listed
Last pushed: Oct 13, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/asmelashteka/HornMT"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Higher-rated alternatives
nlp-uoregon/trankit
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
UBC-NLP/turjuman
TURJUMAN, a neural toolkit for translating from 20 languages into Modern Standard Arabic (MSA).
sagorbrur/codeswitch
CodeSwitch is an NLP tool that can be used for language identification, POS tagging, named entity...
nusnlp/esc
The official code of the "Frustratingly Easy System Combination for Grammatical Error Correction" paper
nusnlp/greco
The official code for the "System Combination via Quality Estimation for Grammatical Error...