Babelscape/CroCoAlign
A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System for Long Texts.
This tool helps you quickly find corresponding sentences in very long documents that are written in different languages, such as a novel translated into several languages. You provide two versions of a document, each in a different language, and it outputs a list of matching sentences. This is ideal for linguists, translators, or researchers who need to analyze parallel texts.
No commits in the last 6 months.
Use this if you need to accurately identify and link equivalent sentences across lengthy documents written in two different languages.
Not ideal if you're working with short texts, only one language, or need to align at a word or phrase level rather than full sentences.
Stars: 10
Forks: 2
Language: Python
License: —
Category:
Last pushed: Sep 11, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Babelscape/CroCoAlign"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
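The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client. The helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path pattern is inferred from the single curl URL above, and the response is assumed to be JSON, which the page does not state.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path pattern assumed from the curl example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a keyless GET request and decode the body.

    Assumption: the endpoint returns JSON; json.loads raises ValueError
    if it does not.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "Babelscape", "CroCoAlign")` issues the same request as the curl command. Keyless access is limited to 100 requests/day, so it may be worth caching the result rather than fetching repeatedly.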
Higher-rated alternatives
luheng/deep_srl
Code and pre-trained model for: Deep Semantic Role Labeling: What Works and What's Next
sileod/tasksource
Datasets collection and preprocessings framework for NLP extreme multitask learning
loomchild/maligna
Bilingual sentence aligner
CK-Explorer/DuoSubs
Semantic subtitle aligner and merger for bilingual subtitle syncing.
coastalcph/lex-glue
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English