loomchild/maligna

Bilingual sentence aligner

/ 100

Emerging

This tool helps linguists, translators, and language researchers build bilingual text corpora by aligning sentences across two documents written in different languages. It takes two text documents, one per language, and outputs a synchronized pair of files in which corresponding sentences are matched. The result is useful for building translation memories, training machine translation models, or creating probabilistic dictionaries.

Use this if you need to automatically align large volumes of bilingual text documents at the sentence level to create a parallel corpus or translation memory.

Not ideal if you only need to align short, informal texts, or if you need sub-sentence alignment (for example, word-for-word) without aligning sentences first.

translation-memory computational-linguistics bilingual-corpus language-research localization

No Package No Dependents

Maintenance 6 / 25

Adoption 7 / 25

Maturity 16 / 25

Community 17 / 25

License MIT

Higher-rated alternatives

luheng/deep_srl

Code and pre-trained model for: Deep Semantic Role Labeling: What Works and What's Next

sileod/tasksource

Datasets collection and preprocessings framework for NLP extreme multitask learning

CK-Explorer/DuoSubs

Semantic subtitle aligner and merger for bilingual subtitle syncing.

coastalcph/lex-glue

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

ChineseGLUE/ChineseGLUE

Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained...
