Text Alignment Systems NLP Tools
Tools for aligning texts across languages, documents, or modalities (word-level, sentence-level, or document-level). Includes cross-lingual alignment, monolingual alignment, and narrative/script synchronization. Does NOT include general translation, similarity matching without explicit alignment output, or semantic parsing.
97 text alignment tools are tracked. The highest-rated is luheng/deep_srl at 49/100 with 334 stars.
Get all 97 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=text-alignment-systems&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
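The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function names `build_url` and `fetch` are hypothetical, the response is simply decoded as JSON because its schema is not documented here, and whether the endpoint accepts a `limit` larger than the 20 shown in the curl command is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20, domain="nlp", subcategory="text-alignment-systems"):
    """Return the query URL with the same parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch(limit=20, timeout=30):
    """Fetch and decode the JSON response.

    Each call counts against the keyless quota of 100 requests/day.
    The shape of the returned JSON is not documented here, so it is
    returned as-is without assuming any field names.
    """
    with urlopen(build_url(limit), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch()` issues the same request as the curl command. Passing `limit=97` would request every tracked project in one call, if the API allows a limit that high.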
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | luheng/deep_srl<br>Code and pre-trained model for: Deep Semantic Role Labeling: What Works and... | | Emerging |
| 2 | sileod/tasksource<br>Datasets collection and preprocessings framework for NLP extreme multitask learning | | Emerging |
| 3 | loomchild/maligna<br>Bilingual sentence aligner | | Emerging |
| 4 | CK-Explorer/DuoSubs<br>Semantic subtitle aligner and merger for bilingual subtitle syncing. | | Emerging |
| 5 | coastalcph/lex-glue<br>LexGLUE: A Benchmark Dataset for Legal Language Understanding in English | | Emerging |
| 6 | ChineseGLUE/ChineseGLUE<br>Language Understanding Evaluation benchmark for Chinese: datasets,... | | Emerging |
| 7 | gkiril/benchie<br>Comprehensive evaluation framework for Open Information Extraction. | | Emerging |
| 8 | PhilipMay/stsb-multi-mt<br>Machine translated multilingual STS benchmark dataset. | | Emerging |
| 9 | naver-ai/korean-safety-benchmarks<br>Official datasets and pytorch implementation repository of SQuARe and KoSBi... | | Emerging |
| 10 | scofield7419/HeSyFu<br>Code for the ACL2021 paper: Better Combine Them Together! Integrating... | | Emerging |
| 11 | IINemo/isanlp_srl_framebank<br>SRL parser for Russian based on FrameBank corpus | | Emerging |
| 12 | vecto-ai/word-benchmarks<br>Benchmarks for intrinsic word embeddings evaluation. | | Emerging |
| 13 | TalSchuster/CrossLingualContextualEmb<br>Cross-Lingual Alignment of Contextual Word Embeddings | | Emerging |
| 14 | ardoco/benchmark<br>A benchmark repository for TLR between (textual) Software Architecture... | | Emerging |
| 15 | ubisoft/ubisoft-laforge-binaryalign<br>BinaryAlign: Word Alignment as Binary Sequence Labeling | | Emerging |
| 16 | UKPLab/eacl2026-abcd-link<br>Repository for reproducing results from ABCD-Link | | Emerging |
| 17 | Babelscape/ID10M<br>Data and code for the paper "ID10M: Idiom Identification in 10 Languages"... | | Emerging |
| 18 | cdli-gh/Semantic-Role-Labeler<br>A semantic role labeling system for the Sumerian language. A Google Summer... | | Emerging |
| 19 | SapienzaNLP/gsrl<br>GSRL is a seq2seq model for end-to-end dependency- and span-based SRL (IJCAI2021). | | Emerging |
| 20 | GuillaumeDD/dialign<br>Automatic and generic measures of verbal alignment in dyadic dialogue based... | | Emerging |
| 21 | Babelscape/CroCoAlign<br>A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System... | | Emerging |
| 22 | ku-nlp/JKUSea<br>Utility tool for aligning sentences of texts written in 2 different languages. | | Emerging |
| 23 | thunlp/DictSKB<br>Code and data of the paper "Automatic Construction of Sememe Knowledge Bases... | | Emerging |
| 24 | qiyuw/WSPAlign<br>WSPAlign: Word Alignment Pre-training via Large-Scale Weakly Supervised Span... | | Emerging |
| 25 | doc-analysis/XFUND<br>XFUND: A Multilingual Form Understanding Benchmark | | Emerging |
| 26 | LaVi-Lab/CLEVA<br>[EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform" | | Emerging |
| 27 | tschomacker/aligned-narrative-documents<br>A collection of scripts to create a Document-aligned corpus of German... | | Emerging |
| 28 | scofield7419/LAGCN-SRL<br>Codes for the AAAI 2021 paper: Encoder-Decoder Based Unified Semantic Role... | | Emerging |
| 29 | tyjiangU/fido<br>Code for the paper "Exploiting Definitions for Frame Identification" | | Emerging |
| 30 | amazon-science/real-world-noisy-benchmarks-for-natural-language-understanding<br>Benchmark test sets for real-world noise phenomena in goal-directed... | | Emerging |
| 31 | thespectrewithin/joint_align<br>Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple... | | Emerging |
| 32 | orzhan/rusimscore<br>Code for paper "RuSimScore: unsupervised scoring function for Russian... | | Emerging |
| 33 | UKPLab/acl2024-ircoder<br>Data creation, training and eval scripts for the IRCoder paper | | Emerging |
| 34 | strubell/preprocess-conll05<br>Scripts for preprocessing the CoNLL-2005 SRL dataset. | | Emerging |
| 35 | luciusssss/MiLiC-Eval<br>[ACL'25 Findings] MiLiC-Eval: Benchmarking Multilingual LLMs for China's... | | Emerging |
| 36 | p-lambda/swords<br>The Stanford Word Substitution (Swords) Benchmark | | Emerging |
| 37 | SapienzaNLP/dsrl<br>Code for "Semantic Role Labeling meets Definition Modeling: using natural... | | Experimental |
| 38 | rggdmonk/hadal<br>A simple and efficient tool for mining and aligning sentences with pre-trained models. | | Experimental |
| 39 | google/BEGIN-dataset<br>A benchmark dataset for evaluating dialog system and natural language... | | Experimental |
| 40 | allenai/multicite<br>MultiCite code and data. Models are available on Huggingface. | | Experimental |
| 41 | Tixierae/WECD<br>Code and data for the paper: 'Word Embeddings for the Construction Domain' | | Experimental |
| 42 | v-hirak/explaining-MT-difficulty<br>Dataset of diverse typological language properties as part of "Assessing the... | | Experimental |
| 43 | ryokamoi/wice<br>This repository contains the dataset and code for "WiCE: Real-World... | | Experimental |
| 44 | longxudou/multispider<br>MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing | | Experimental |
| 45 | lyutyuh/structured-span-selector<br>A Structured Span Selector (NAACL 2022). A structured span selector with a... | | Experimental |
| 46 | liutianlin0121/decoding-time-realignment<br>Implementation of "Decoding-time Realignment of Language Models", ICML 2024. | | Experimental |
| 47 | jacklxc/CORWA<br>CORWA: A Citation-Oriented Related Work Annotation Dataset, NAACL 2022 | | Experimental |
| 48 | ShiZhengyan/IngredientParsing<br>Dataset and pytorch codes for the paper titled "Attention-based Ingredient... | | Experimental |
| 49 | cvjena/chiasmus-detector<br>Code for paper "Data-Driven Detection of General Chiasmi Using Lexical and... | | Experimental |
| 50 | Sam120204/Pluralistic-Alignment-for-Healthcare<br>Code of our paper - "Pluralistic Alignment for Healthcare: A Role-Driven... | | Experimental |
| 51 | guilhermevarela/deep_srlbr<br>SRL task using PropBank 1.1 | | Experimental |
| 52 | garfieldpigljy/CrowdWSA2019<br>Crowdsourced Word Sequence Aggregation 2019 | | Experimental |
| 53 | yumoxu/detnet<br>Code and dataset for TACL 19: Weakly Supervised Domain Detection. | | Experimental |
| 54 | Botfuel/benchmark-nlp<br>NLP benchmark test sentences and full results | | Experimental |
| 55 | samchengcs/IKEA-Dataset<br>A dataset for multimodal machine translation | | Experimental |
| 56 | tsar-workshop/tsar-2025-shared-task<br>Code and data for TSAR 2025 Shared Task | | Experimental |
| 57 | ZurichNLP/ConLoan<br>A Contrastive Multilingual Dataset for Evaluating Loanwords - ACL2025 | | Experimental |
| 58 | nikolayVv/MultiParaphrase<br>Comparing and evaluating monolingual paraphrasing of English, German, Czech,... | | Experimental |
| 59 | pranav-ust/cognates<br>ACL SRW paper: Alignment Analysis of Sequential Segmentation of Lexicons to... | | Experimental |
| 60 | DominiqueMercier/ImpactCite<br>ImpactCite: A XLNet-based Solution Enabling Qualitative CitationImpact... | | Experimental |
| 61 | SapienzaNLP/conception<br>Code and experiments for the COLING2020 paper "Conception:... | | Experimental |
| 62 | kukas/word-alignment-visualization<br>Word Alignment Visualization is a Python package for visualizing word... | | Experimental |
| 63 | sileod/metaeval<br>Collection of tasks for meta-learning and extreme multitask learning | | Experimental |
| 64 | SapienzaNLP/srl-pas-probing<br>Probing for Predicate Argument Structures in Pretrained Language Models (ACL 2022). | | Experimental |
| 65 | gling07/Text2DRS<br>System Text2Drs takes English narrative as an input and outputs a discourse... | | Experimental |
| 66 | maxkagamine/word-alignment-demo<br>Demonstration of AI/neural word alignment of English & Japanese text using... | | Experimental |
| 67 | SapienzaNLP/united-srl<br>A unified dataset for span- and dependency-based multilingual and... | | Experimental |
| 68 | qiyuw/WSPAlign.InferEval<br>Inference library and evaluation script for WSPAlign... | | Experimental |
| 69 | ghomasHudson/muld<br>The Multitask Long Document Benchmark | | Experimental |
| 70 | SapienzaNLP/usea<br>Universal Semantic Annotator (LREC 2022) | | Experimental |
| 71 | mbanon/benchmarks<br>Several benchmarks on sentence splitting and language identification | | Experimental |
| 72 | SapienzaNLP/exploring-srl<br>Repository for the paper "Exploring Non-Verbal Predicates in Semantic Role... | | Experimental |
| 73 | hexuandeng/HExp4UDS<br>Implementation of the paper “Holistic Exploration on Universal... | | Experimental |
| 74 | SapienzaNLP/unify-srl<br>Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic... | | Experimental |
| 75 | okalai-ai/moimoe<br>Typology-Guided Adaption in Multilingual Models | | Experimental |
| 76 | joshstephenson/SEAS<br>Tools for extracting and aligning sentences from subtitle language pairs... | | Experimental |
| 77 | DorinK/Principal-Parts-Detection<br>Multilingual dataset for principal parts detection in inflectional... | | Experimental |
| 78 | hmosousa/professor_heideltime<br>Create a multilingual corpus weakly labeled with HeidelTime. | | Experimental |
| 79 | agneknie/com4520DarwinProject<br>Adjacent code related to the paper prepared for Joint Workshop on Multiword... | | Experimental |
| 80 | bMagicLAB/human-alignment-pl-en-codeswitch<br>Human-in-the-Loop alignment dataset for Polish-English code-switching... | | Experimental |
| 81 | Toavinarandrianarivo/Scene2Chapter-NLP-Aligner<br>📖 Align movie scripts with novel chapters seamlessly using advanced NLP... | | Experimental |
| 82 | Youggls/ACROSS-ACL23<br>Official code repo for paper: ACROSS: An Alignment-based Framework for... | | Experimental |
| 83 | multilingual-dataset-survey/multilingual-dataset-survey.github.io<br>The website implementation of Findings of EMNLP 2022, "Beyond Counting... | | Experimental |
| 84 | xiaomeng-zhu/LIEDER<br>Repository for the ACL 2024 paper "LIEDER: Linguistically-Informed... | | Experimental |
| 85 | heyjoonkim/APA<br>Pytorch implementation of "Aligning Language Models to Explicitly Handle... | | Experimental |
| 86 | kinit-sk/multiclaim<br>MultiClaim dataset repository | | Experimental |
| 87 | seinecle/umibench<br>Testbench for sentiment and factuality in texts. | | Experimental |
| 88 | INTERACT-LLM/alignment-drift-llms<br>Dataset and analysis code for BEA2025 paper @ ACL: "Alignment Drift in... | | Experimental |
| 89 | squirridge/omod<br>orthographic mapping ondemand dataset | | Experimental |
| 90 | NUS-IDS/CW-CURE<br>This is the official data repository for the following CIKM 2022 paper from... | | Experimental |
| 91 | MrShininnnnn/CECW<br>This repository is for the Colorful Extended Cleanup World (CECW) dataset, a... | | Experimental |
| 92 | da03/Epanadiplosis_Benchmark<br>Benchmarking the performance of various language models in generating... | | Experimental |
| 93 | zahra-parvizian/PersianLexicalSimplifier<br>Persian text simplification using lexical simplification | | Experimental |
| 94 | BasRizk/DatasetAligner<br>Generating variant of TV-shows based labelled data-set in language B from... | | Experimental |
| 95 | oooranz/MonoAlign<br>Unsupervised monolingual word aligner | | Experimental |
| 96 | minnesotanlp/taddex<br>Code and dataset for Martin et al's paper "Complex Mathematical Symbol... | | Experimental |
| 97 | ocramz/nlp-data-superglue<br>Dataset parsers from the SuperGLUE benchmark https://super.gluebenchmark.com/tasks/ | | Experimental |