Structured Data Inference NLP Tools

Datasets and benchmarks for NLI, table understanding, text-to-SQL, and instruction-following tasks involving structured or semi-structured data. This category does not include general sentiment analysis, classification tasks without structured reasoning components, or commonsense knowledge resources without explicit inference evaluation.

There are 78 structured data inference tools tracked. The highest-rated is ymcui/cmrc2018 at 49/100 with 451 stars.

Get the projects as JSON (78 are tracked; the example request sets limit=20)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=structured-data-inference&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
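
The same request can be made from Python. The following is a minimal sketch using only the standard library; the endpoint and the three query parameters (domain, subcategory, limit) are copied from the curl example above, while the helper names build_url and fetch_projects are hypothetical. The response schema is not described on this page, so the sketch returns the decoded JSON as-is rather than assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="structured-data-inference", limit=20):
    """Assemble the query URL from the three parameters seen in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Send a GET request and decode the body as JSON.

    The response schema is not documented in this listing (assumption: the
    body is valid JSON), so the decoded object is returned unchanged.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access.
print(build_url())
```

Calling fetch_projects(build_url()) performs the actual request and counts against the daily request limit. Whether a larger limit value returns all 78 projects in one call is not stated here and would need to be checked against the API.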

#	Tool	Score	Tier	Stars	Language
1	ymcui/cmrc2018 A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018)	49	Emerging	451	Python
2	princeton-nlp/DensePhrases [ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021:...	45	Emerging	606	Python
3	thunlp/MultiRD Code and data of the AAAI-20 paper "Multi-channel Reverse Dictionary Model"	45	Emerging	111	Python
4	IndexFziQ/KMRC-Papers A list of recent papers regarding knowledge-based machine reading comprehension.	42	Emerging	42	—
5	danqi/rc-cnn-dailymail CNN/Daily Mail Reading Comprehension Task	40	Emerging	292	Python
6	intfloat/SimKGC ACL 2022, SimKGC: Simple Contrastive Knowledge Graph Completion with...	39	Emerging	213	Python
7	declare-lab/CIDER This repository contains the dataset and the pytorch implementations of the...	39	Emerging	27	Python
8	ShiZhengyan/StepGame [AAAI 2022] Dataset and pytorch codes for the paper titled "StepGame: A New...	39	Emerging	32	Python
9	zjunlp/MKG_Analogy [ICLR 2023] Multimodal Analogical Reasoning over Knowledge Graphs	39	Emerging	132	Python
10	maastrichtlawtech/gdsr 🕸️ A graph-augmented dense statute retriever. (EACL 2023)	39	Emerging	25	Python
11	shmsw25/AmbigQA An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous...	38	Emerging	121	Python
12	IndexFziQ/MSMARCO-MRC-Analysis Analysis on the MS-MARCO leaderboard regarding the machine reading...	37	Emerging	21	—
13	GeekDream-x/IDOL Repo for paper "IDOL: Indicator-oriented Logic Pre-training for Logical...	37	Emerging	22	Python
14	utahnlp/knowledge_infotabs Repository containing code for the NAACL 2021 paper (Incorporating External...	37	Emerging	17	Python
15	yuweihao/reclor Code for "ReClor: A Reading Comprehension Dataset Requiring Logical...	36	Emerging	83	Python
16	XingLuxi/KMRC-Research-Archive 🗂 Research about Knowledge-based Machine Reading Comprehension	35	Emerging	24	—
17	phanxuanphucnd/Active-learning-in-NLP Active learning in NLP	35	Emerging	14	Python
18	FeiWang96/GTR [SIGIR 2021] Retrieving Complex Tables with Multi-Granular Graph...	34	Emerging	48	Python
19	webis-de/acl22-revisiting-uncertainty-based-query-strategies-for-active-learning-with-transformers Revisiting Uncertainty-based Query Strategies for Active Learning with Transformers	34	Emerging	4	Python
20	anshitag/memit_csk Source repository for Editing Common Sense in Transformers (EMNLP 2023)	34	Emerging	6	Python
21	amazon-science/pizza-semantic-parsing-dataset The PIZZA dataset continues the exploration of task-oriented parsing by...	34	Emerging	20	Python
22	marceljahnke/negative-cache PyTorch Implementation of the Paper "Efficient Training of Retrieval Models...	33	Emerging	7	Python
23	amazon-science/wqa-multi-sentence-inference This repository contains code used for our Multi Sentence Inference NAACL'22 paper.	32	Emerging	12	Python
24	ymcui/expmrc ExpMRC: Explainability Evaluation for Machine Reading Comprehension	32	Emerging	62	Python
25	sherlcok314159/ChineseMRC-Data A collection of the extractive MRC datasets available so far for Chinese	32	Emerging	122	—
26	thunlp/CokeBERT CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced...	32	Emerging	31	Python
27	acidAnn/semeval2022_task7_starter_kit 💡 Starter kit for SemEval 2022 Task 7: Identifying Plausible...	32	Emerging	4	Python
28	humanlab/rare-class-AL AL for rare class strategies compared in the paper "Transfer and Active...	31	Emerging	4	Python
29	ict-bigdatalab/CorpusBrain CIKM 2022: CorpusBrain: Pre-train a Generative Retrieval Model for...	31	Emerging	34	Python
30	USSiamaboat/polytuplet-loss A Reverse Approach to Training Reading Comprehension and Logical Reasoning Models	31	Emerging	3	Python
31	ai-systems/tg2022task_premise_retrieval TextGraphs Shared Task on Natural Language Premise Selection	31	Emerging	4	Python
32	Jordy-VL/uncertainty-bench Code repository for Benchmarking Scalable Predictive Uncertainty in Text...	31	Emerging	4	Jupyter Notebook
33	Dibyakanti/AutoTNLI-code This repository contains the official code for the paper: Realistic Data...	30	Emerging	6	HTML
34	psunlpgroup/XSemPLR Data and code for ACL 2023 paper XSemPLR: Cross-Lingual Semantic Parsing in...	29	Experimental	9	Shell
35	testzer0/AmbiQT Code and Assets for "Benchmarking and Improving Text-to-SQL Generation Under...	29	Experimental	9	Python
36	pietrolesci/anchoral This is the official PyTorch implementation for our NAACL 2024 paper:...	28	Experimental	22	Python
37	ZeinabAghahadi/Syllogistic-Commonsense-Reasoning Deductive Commonsense Reasoning	28	Experimental	8	Jupyter Notebook
38	krystalan/Multi-hopRC 📔 Notes for Multi-hop Reading Comprehension...	28	Experimental	90	—
39	minnesotanlp/infoVerse Jaehyung Kim et al.'s ACL 2023 paper on "infoVerse: A Universal Framework for...	27	Experimental	16	Python
40	Pzoom522/xANLG Data and code for "Understanding Linearity of Cross-Lingual Word Embedding...	27	Experimental	12	Python
41	cognitiveailab/tg2021task Participant Kit for the TextGraphs-15 Shared Task on Explanation Regeneration	27	Experimental	19	Python
42	INK-USC/RiddleSense RiddleSense: Reasoning about Riddle Questions Featuring Linguistic...	27	Experimental	13	Python
43	phosseini/GisPy GisPy: A Tool for Measuring Gist Inference Score in Text...	27	Experimental	13	Assembly
44	THU-KEG/COPEN The official code and dataset for EMNLP 2022 paper "COPEN: Probing...	26	Experimental	21	Python
45	MultimodalGeo/GeoText-1652 An official repo for ECCV 2024 Towards Natural Language-Guided Drones:...	26	Experimental	114	Python
46	ZhengZixiang/MRCPapers Worth-reading paper list and other awesome resources on Machine Reading...	25	Experimental	27	—
47	mariomeissner/AmbiNLI This is the code for the paper "Embracing Ambiguity: Shifting the Training...	24	Experimental	5	Jupyter Notebook
48	MSR-LIT/Splash Release of SPLASH: Dataset for semantic parse correction with natural...	24	Experimental	42	—
49	yul091/UnBED Codebase for the ACL 2023 paper: "Uncertainty-Aware Bootstrap Learning for...	24	Experimental	5	Python
50	rycolab/evidence-probing Code and data for the ACL 2022 paper "Probing as Quantifying Inductive Bias".	23	Experimental	3	Python
51	semeval-2026-kclarity/clarity Code release for KCLarity at SemEval-2026 Task 6: Encoder and Zero-Shot...	23	Experimental	2	Python
52	Advancing-Machine-Human-Reasoning-Lab/transformer-psychometrics Code to reproduce experiments in our *SEM 2021 Paper	22	Experimental	2	Python
53	Raising-hrx/MetGen An implementation for MetGen: A Module-Based Entailment Tree Generation...	21	Experimental	13	Python
54	maastrichtlawtech/fusion 🔗 Hybrid retrieval in the legal domain	21	Experimental	10	Python
55	salesforce/FewXC Official code and data release for Efficiently Aligned Cross-Lingual...	21	Experimental	3	Python
56	megagonlabs/xatu 🕊️ Code and Data for XATU: A Fine-grained Instruction-based Benchmark for...	20	Experimental	6	Python
57	nlp-waseda/dcsg-ja Dialogue Commonsense Graph in Japanese	20	Experimental	6	—
58	megagonlabs/ambignlg 🐶 Data for AmbigNLG: Addressing Task Ambiguity in Instruction for NLG...	20	Experimental	6	Python
59	naver/ms-marco-shift A Fine-Grained Analysis of Distribution Shifts in MSMARCO (MS-Shift)....	20	Experimental	6	Jupyter Notebook
60	fajri91/discourse_probing Discourse Probing of Pretrained Language Models. In Proceedings of NAACL 2021.	20	Experimental	10	Jupyter Notebook
61	Nativeatom/FRoG Fuzzy reasoning of Generalized Quantifiers (EMNLP 2024)	20	Experimental	8	Python
62	XInfoTabS/dataset The Official dataset for "XINFOTABS: Evaluating Multilingual Tabular Natural...	19	Experimental	3	Python
63	INK-USC/ER-Test Code for ER-Test, accepted to the Findings of EMNLP 2022	19	Experimental	3	Python
64	amazon-science/resource-constrained-naturalized-semantic-parsing This repository is made public for reproducibility of our recent work on...	19	Experimental	3	—
65	zhengyima/Anchors Source code of CIKM2021 Paper 'Pre-training for Ad-hoc Retrieval: Hyperlink...	19	Experimental	16	Python
66	LaVi-Lab/C2LEVA [Findings of ACL 2025] "C2LEVA: Toward Comprehensive and Contamination-Free...	19	Experimental	2	—
67	gianluigilopardo/anchors_text_theory Code for the paper "A Sea of Words: An In-Depth Analysis of Anchors for Text...	19	Experimental	14	Python
68	IndexFziQ/IIE-NLP-Eyas-SemEval2021 Code of IIE-NLP-Eyas Team for ReCAM (Task 4) @SemEval2021...	18	Experimental	2	Python
69	Nativeatom/PRESQUE The repository for "Pragmatic Reasoning Unlocks Quantifier Semantics for...	18	Experimental	2	Python
70	HKUST-KnowComp/atomic-conceptualization Code and data for the paper Acquiring and Modelling Abstract Commonsense...	18	Experimental	23	Python
71	dyan-dy/Baidu-LIC2021-MRC Models and code for the Baidu AI LIC 2021 MRC tasks, based on PaddleNLP	17	Experimental	1	Python
72	collapseindex/ci-curation CI-Guided Data Curation: Using prediction instability to detect label noise....	12	Experimental	1	Jupyter Notebook
73	RishiHazra/Actively-reducing-redundancies-in-Active-Learning-for-Sequence-Tagging Active Learning for sequence tagging	12	Experimental	8	Python
74	Lizhecheng02/DRS [ACL 2025] Repository for our paper "DRS: Deep Question Reformulation With...	12	Experimental	6	Python
75	Info-Sync/InfoSync Implementation of the semi-structured inference model in our ACL 2023 paper:...	11	Experimental	3	HTML
76	putmanmodel/putman-model-paper Preprint + pseudocode for the PUTMAN Model (relational meaning graphs,...	11	Experimental	—	—
77	rbhubert/recall Tool for the recovery of relevant information through classification in an...	10	Experimental	2	Python
78	trailerAI/KoDPR Korean Dense Passage Retrieval (KoDPR)	10	Experimental	2	Python
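
For readers who copy the table rather than call the API, the rows can be parsed as tab-separated text. The sketch below assumes the columns are separated by tabs, as on this page, and that the Tool column holds the repository name followed by a space and its description. The function name parse_rows and the output field names are hypothetical. The four sample rows are copied from the table above; the dash placeholder for a missing value is written as the escape \u2014.

```python
import csv
import io

MISSING = "\u2014"  # dash placeholder the table uses for absent values

# Four rows copied from the listing, with the header line.
SAMPLE = (
    "#\tTool\tScore\tTier\tStars\tLanguage\n"
    "1\tymcui/cmrc2018 A Span-Extraction Dataset for Chinese Machine Reading "
    "Comprehension (CMRC 2018)\t49\tEmerging\t451\tPython\n"
    "33\tDibyakanti/AutoTNLI-code This repository contains the official code "
    "for the paper : Realistic Data...\t30\tEmerging\t6\tHTML\n"
    "34\tpsunlpgroup/XSemPLR Data and code for ACL 2023 paper XSemPLR: "
    "Cross-Lingual Semantic Parsing in...\t29\tExperimental\t9\tShell\n"
    "76\tputmanmodel/putman-model-paper Preprint + pseudocode for the PUTMAN "
    "Model (relational meaning graphs,...\t11\tExperimental\t\u2014\t\u2014\n"
)


def parse_rows(text):
    """Turn the tab-separated listing into a list of dictionaries."""
    # QUOTE_NONE keeps double quotes inside descriptions as literal characters.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for record in reader:
        # Assumption: the repository name never contains a space.
        repo, _, description = record["Tool"].partition(" ")
        rows.append({
            "rank": int(record["#"]),
            "repo": repo,
            "description": description,
            "score": int(record["Score"]),
            "tier": record["Tier"],
            "stars": None if record["Stars"] == MISSING else int(record["Stars"]),
            "language": None if record["Language"] == MISSING else record["Language"],
        })
    return rows


rows = parse_rows(SAMPLE)

# Lowest score seen per tier in the sample rows.
lowest = {}
for row in rows:
    lowest[row["tier"]] = min(row["score"], lowest.get(row["tier"], row["score"]))
print(lowest)
```

In the full table the tier label changes from Emerging to Experimental between scores 30 and 29. The page does not state the tier cutoffs, so treat that boundary as an observation from this listing only.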

Comparisons in this category

KMRC-Papers vs. KMRC-Research-Archive (scores 42 vs. 35)