Text Alignment Systems NLP Tools

Tools for aligning texts across languages, documents, or modalities (word-level, sentence-level, or document-level). Includes cross-lingual alignment, monolingual alignment, and narrative/script synchronization. Does NOT include general translation, similarity matching without explicit alignment output, or semantic parsing.

There are 97 text alignment tools tracked. The highest-rated is luheng/deep_srl, scoring 49/100 with 334 stars.

Get all 97 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=text-alignment-systems&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
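
The same request can be made from a script. The following is a minimal Python sketch, not an official client: it reuses only the endpoint and query parameters shown in the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described here, the decoded JSON is returned as-is. Note that the example request sets limit=20; whether a larger limit returns all 97 projects in one call is an assumption to verify against the API.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the query URL with the same parameters as the curl command."""
    params = {
        "domain": "nlp",
        "subcategory": "text-alignment-systems",
        "limit": limit,
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON response.

    The response shape is an unknown here, so the decoded object is
    returned without any assumptions about its keys.
    """
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` performs one unauthenticated request, which counts against the 100 requests/day allowance, so it is worth caching the result locally rather than re-fetching on every run.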

#	Tool	Score	Tier	Stars	Language
1	luheng/deep_srl Code and pre-trained model for: Deep Semantic Role Labeling: What Works and...	49	Emerging	334	Python
2	sileod/tasksource Datasets collection and preprocessings framework for NLP extreme multitask learning	48	Emerging	193	Python
3	loomchild/maligna Bilingual sentence aligner	46	Emerging	29	AL
4	CK-Explorer/DuoSubs Semantic subtitle aligner and merger for bilingual subtitle syncing.	41	Emerging	7	Python
5	coastalcph/lex-glue LexGLUE: A Benchmark Dataset for Legal Language Understanding in English	40	Emerging	244	Python
6	ChineseGLUE/ChineseGLUE Language Understanding Evaluation benchmark for Chinese: datasets,...	40	Emerging	1,786	Python
7	gkiril/benchie Comprehensive evaluation framework for Open Information Extraction.	40	Emerging	40	Python
8	PhilipMay/stsb-multi-mt Machine translated multilingual STS benchmark dataset.	40	Emerging	33	Python
9	naver-ai/korean-safety-benchmarks Official datasets and pytorch implementation repository of SQuARe and KoSBi...	39	Emerging	249	Python
10	scofield7419/HeSyFu Code for the ACL2021 paper: Better Combine Them Together! Integrating...	38	Emerging	14	Python
11	IINemo/isanlp_srl_framebank SRL parser for Russian based on FrameBank corpus	37	Emerging	27	Jupyter Notebook
12	vecto-ai/word-benchmarks Benchmarks for intrinsic word embeddings evaluation.	36	Emerging	66	—
13	TalSchuster/CrossLingualContextualEmb Cross-Lingual Alignment of Contextual Word Embeddings	36	Emerging	99	Python
14	ardoco/benchmark A benchmark repository for TLR between (textual) Software Architecture...	36	Emerging	3	Python
15	ubisoft/ubisoft-laforge-binaryalign BinaryAlign: Word Alignment as Binary Sequence Labeling	35	Emerging	11	Python
16	UKPLab/eacl2026-abcd-link Repository for reproducing results from ABCD-Link	35	Emerging	2	Python
17	Babelscape/ID10M Data and code for the paper "ID10M: Idiom Identification in 10 Languages"...	35	Emerging	8	Python
18	cdli-gh/Semantic-Role-Labeler A semantic role labeling system for the Sumerian language. A Google Summer...	35	Emerging	16	HTML
19	SapienzaNLP/gsrl GSRL is a seq2seq model for end-to-end dependency- and span-based SRL (IJCAI2021).	34	Emerging	18	Python
20	GuillaumeDD/dialign Automatic and generic measures of verbal alignment in dyadic dialogue based...	34	Emerging	13	Scala
21	Babelscape/CroCoAlign A Cross-Lingual, Context-Aware and Fully-Neural Sentence Alignment System...	34	Emerging	10	Python
22	ku-nlp/JKUSea Utilitary tool aligning sentences of texts written in 2 different languages.	33	Emerging	8	Perl
23	thunlp/DictSKB Code and data of the paper "Automatic Construction of Sememe Knowledge Bases...	33	Emerging	4	Python
24	qiyuw/WSPAlign WSPAlign: Word Alignment Pre-training via Large-Scale Weakly Supervised Span...	32	Emerging	12	Python
25	doc-analysis/XFUND XFUND: A Multilingual Form Understanding Benchmark	32	Emerging	217	—
26	LaVi-Lab/CLEVA [EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform"	32	Emerging	64	Shell
27	tschomacker/aligned-narrative-documents A collection of scripts to create a Document-aligned corpus of German...	31	Emerging	4	Python
28	scofield7419/LAGCN-SRL Codes for the AAAI 2021 paper: Encoder-Decoder Based Unified Semantic Role...	31	Emerging	4	Python
29	tyjiangU/fido Code for the paper "Exploiting Definitions for Frame Identification"	31	Emerging	3	Python
30	amazon-science/real-world-noisy-benchmarks-for-natural-language-understanding Benchmark test sets for real-world noise phenomena in goal-directed...	31	Emerging	3	—
31	thespectrewithin/joint_align Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple...	31	Emerging	52	Python
32	orzhan/rusimscore Code for paper "RuSimScore: unsupervised scoring function for Russian...	31	Emerging	3	Python
33	UKPLab/acl2024-ircoder Data creation, training and eval scripts for the IRCoder paper	30	Emerging	20	Python
34	strubell/preprocess-conll05 Scripts for preprocessing the CoNLL-2005 SRL dataset.	30	Emerging	24	Shell
35	luciusssss/MiLiC-Eval [ACL'25 Findings] MiLiC-Eval: Benchmarking Multilingual LLMs for China's...	30	Emerging	5	Python
36	p-lambda/swords The Stanford Word Substitution (Swords) Benchmark	30	Emerging	33	Python
37	SapienzaNLP/dsrl Code for "Semantic Role Labeling meets Definition Modeling: using natural...	29	Experimental	7	Perl
38	rggdmonk/hadal A simple and efficient tool for mining and aligning sentences with pre-trained models.	29	Experimental	6	Python
39	google/BEGIN-dataset A benchmark dataset for evaluating dialog system and natural language...	29	Experimental	39	—
40	allenai/multicite MultiCite code and data. Models are available on Huggingface.	28	Experimental	33	Python
41	Tixierae/WECD Code and data for the paper: 'Word Embeddings for the Construction Domain'	28	Experimental	6	Python
42	v-hirak/explaining-MT-difficulty Dataset of diverse typological language properties as part of "Assessing the...	27	Experimental	1	—
43	ryokamoi/wice This repository contains the dataset and code for "WiCE: Real-World...	27	Experimental	42	Python
44	longxudou/multispider MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing	27	Experimental	9	Python
45	lyutyuh/structured-span-selector A Structured Span Selector (NAACL 2022). A structured span selector with a...	26	Experimental	21	Python
46	liutianlin0121/decoding-time-realignment Implementation of "Decoding-time Realignment of Language Models", ICML 2024.	25	Experimental	21	Jupyter Notebook
47	jacklxc/CORWA CORWA: A Citation-Oriented Related Work Annotation Dataset, NAACL 2022	25	Experimental	17	Jupyter Notebook
48	ShiZhengyan/IngredientParsing Dataset and pytorch codes for the paper titled "Attention-based Ingredient...	25	Experimental	8	Python
49	cvjena/chiasmus-detector Code for paper "Data-Driven Detection of General Chiasmi Using Lexical and...	24	Experimental	2	Python
50	Sam120204/Pluralistic-Alignment-for-Healthcare Code of our paper - "Pluralistic Alignment for Healthcare: A Role-Driven...	24	Experimental	3	Python
51	guilhermevarela/deep_srlbr SRL task using PropBank 1.1	23	Experimental	3	Jupyter Notebook
52	garfieldpigljy/CrowdWSA2019 Crowdsourced Word Sequence Aggregation 2019	23	Experimental	4	Jupyter Notebook
53	yumoxu/detnet Code and dataset for TACL 19: Weakly Supervised Domain Detection.	22	Experimental	19	Python
54	Botfuel/benchmark-nlp NLP benchmark test sentences and full results	21	Experimental	13	—
55	samchengcs/IKEA-Dataset A dataset for multimodal machine translation	21	Experimental	13	—
56	tsar-workshop/tsar-2025-shared-task Code and data for TSAR 2025 Shared Task	21	Experimental	2	Python
57	ZurichNLP/ConLoan A Contrastive Multilingual Dataset for Evaluating Loanwords - ACL2025	20	Experimental	2	Python
58	nikolayVv/MultiParaphrase Comparing and evaluating monolingual paraphrasing of English, German, Czech,...	20	Experimental	6	Jupyter Notebook
59	pranav-ust/cognates ACL SRW paper: Alignment Analysis of Sequential Segmentation of Lexicons to...	20	Experimental	5	Jupyter Notebook
60	DominiqueMercier/ImpactCite ImpactCite: A XLNet-based Solution Enabling Qualitative CitationImpact...	20	Experimental	5	Jupyter Notebook
61	SapienzaNLP/conception Code and experiments for the COLING2020 paper "Conception:...	20	Experimental	11	Java
62	kukas/word-alignment-visualization Word Alignment Visualization is a Python package for visualizing word...	20	Experimental	7	Jupyter Notebook
63	sileod/metaeval Collection of tasks for meta-learning and extreme multitask learning	20	Experimental	5	Python
64	SapienzaNLP/srl-pas-probing Probing for Predicate Argument Structures in Pretrained Language Models (ACL 2022).	20	Experimental	6	Python
65	gling07/Text2DRS System Text2Drs takes English narrative as an input and outputs a discourse...	20	Experimental	8	Assembly
66	maxkagamine/word-alignment-demo Demonstration of AI/neural word alignment of English & Japanese text using...	19	Experimental	4	Python
67	SapienzaNLP/united-srl A unified dataset for span- and dependency-based multilingual and...	19	Experimental	3	—
68	qiyuw/WSPAlign.InferEval Inference library and evaluation script for WSPAlign...	19	Experimental	4	Python
69	ghomasHudson/muld The Multitask Long Document Benchmark	19	Experimental	42	Python
70	SapienzaNLP/usea Universal Semantic Annotator (LREC 2022)	19	Experimental	18	—
71	mbanon/benchmarks Several benchmarks on sentence splitting and language identification	19	Experimental	3	Mathematica
72	SapienzaNLP/exploring-srl Repository for the paper "Exploring Non-Verbal Predicates in Semantic Role...	19	Experimental	3	—
73	hexuandeng/HExp4UDS Implementation of the paper “Holistic Exploration on Universal...	19	Experimental	4	Python
74	SapienzaNLP/unify-srl Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic...	19	Experimental	17	Python
75	okalai-ai/moimoe Typology-Guided Adaption in Multilingual Models	19	Experimental	2	HTML
76	joshstephenson/SEAS Tools for extracting and aligning sentences from subtitle language pairs...	19	Experimental	1	Python
77	DorinK/Principal-Parts-Detection Multilingual dataset for principal parts detection in inflectional...	18	Experimental	1	—
78	hmosousa/professor_heideltime Create a multilingual corpus weakly labeled with HeidelTime.	17	Experimental	1	Python
79	agneknie/com4520DarwinProject Adjacent code related to the paper prepared for Joint Workshop on Multiword...	17	Experimental	1	Jupyter Notebook
80	bMagicLAB/human-alignment-pl-en-codeswitch Human-in-the-Loop alignment dataset for Polish-English code-switching...	15	Experimental	—	—
81	Toavinarandrianarivo/Scene2Chapter-NLP-Aligner 📖 Align movie scripts with novel chapters seamlessly using advanced NLP...	14	Experimental	—	Python
82	Youggls/ACROSS-ACL23 Official code repo for paper: ACROSS: An Alignment-based Framework for...	13	Experimental	12	—
83	multilingual-dataset-survey/multilingual-dataset-survey.github.io The website implementation of Findings of EMNLP 2022, "Beyond Counting...	13	Experimental	—	JavaScript
84	xiaomeng-zhu/LIEDER Repository for the ACL 2024 paper "LIEDER: Linguistically-Informed...	12	Experimental	5	R
85	heyjoonkim/APA Pytorch implementation of "Aligning Language Models to Explicitly Handle...	12	Experimental	5	Python
86	kinit-sk/multiclaim MultiClaim dataset repository	12	Experimental	—	Python
87	seinecle/umibench Testbench for sentiment and factuality in texts.	11	Experimental	3	Roff
88	INTERACT-LLM/alignment-drift-llms Dataset and analysis code for BEA2025 paper @ ACL: "Alignment Drift in...	11	Experimental	—	HTML
89	squirridge/omod orthographic mapping ondemand dataset	11	Experimental	1	—
90	NUS-IDS/CW-CURE This is the official data repository for the following CIKM 2022 paper from...	11	Experimental	3	—
91	MrShininnnnn/CECW This repository is for the Colorful Extended Cleanup World (CECW) dataset, a...	11	Experimental	3	Jupyter Notebook
92	da03/Epanadiplosis_Benchmark Benchmarking the performance of various language models in generating...	11	Experimental	3	Python
93	zahra-parvizian/PersianLexicalSimplifier Persian text simplification using lexical simplification	11	Experimental	—	Jupyter Notebook
94	BasRizk/DatasetAligner Generating variant of TV-shows based labelled data-set in language B from...	10	Experimental	2	Python
95	oooranz/MonoAlign Unsupervised monolingual word aligner	10	Experimental	2	Python
96	minnesotanlp/taddex Code and dataset for Martin et al's paper "Complex Mathematical Symbol...	10	Experimental	2	Python
97	ocramz/nlp-data-superglue Dataset parsers from the SuperGLUE benchmark https://super.gluebenchmark.com/tasks/	10	Experimental	2	Haskell