Structured Data Inference NLP Tools

Datasets and benchmarks for NLI, table understanding, text-to-SQL, and instruction-following tasks involving structured or semi-structured data. Does NOT include general sentiment analysis, classification tasks without structured reasoning components, or commonsense knowledge resources without explicit inference evaluation.

There are 78 structured data inference tools tracked. The highest-rated is ymcui/cmrc2018 at 49/100 with 451 stars.

Get all 78 projects as JSON (note: the example request below passes limit=20)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=structured-data-inference&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
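The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented here, the code only decodes the JSON body without assuming any field names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="structured-data-inference", limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Fetch and decode the JSON response (hypothetical helper).

    The shape of the returned object is an assumption left to the caller:
    inspect it before relying on specific keys.
    """
    with urllib.request.urlopen(build_url(limit=limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` performs the keyless request, which counts against the 100 requests/day allowance. Whether a larger `limit` returns all 78 entries in one call is not stated on this page, so treat that as something to verify.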

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | ymcui/cmrc2018 | A Span-Extraction Dataset for Chinese Machine Reading Comprehension (CMRC 2018) | 49 | Emerging |
| 2 | princeton-nlp/DensePhrases | [ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021:... | 45 | Emerging |
| 3 | thunlp/MultiRD | Code and data of the AAAI-20 paper "Multi-channel Reverse Dictionary Model" | 45 | Emerging |
| 4 | IndexFziQ/KMRC-Papers | A list of recent papers regarding knowledge-based machine reading comprehension. | 42 | Emerging |
| 5 | danqi/rc-cnn-dailymail | CNN/Daily Mail Reading Comprehension Task | 40 | Emerging |
| 6 | intfloat/SimKGC | ACL 2022, SimKGC: Simple Contrastive Knowledge Graph Completion with... | 39 | Emerging |
| 7 | declare-lab/CIDER | This repository contains the dataset and the pytorch implementations of the... | 39 | Emerging |
| 8 | ShiZhengyan/StepGame | [AAAI 2022] Dataset and pytorch codes for the paper titled "StepGame: A New... | 39 | Emerging |
| 9 | zjunlp/MKG_Analogy | [ICLR 2023] Multimodal Analogical Reasoning over Knowledge Graphs | 39 | Emerging |
| 10 | maastrichtlawtech/gdsr | 🕸️ A graph-augmented dense statute retriever. (EACL 2023) | 39 | Emerging |
| 11 | shmsw25/AmbigQA | An original implementation of EMNLP 2020, "AmbigQA: Answering Ambiguous... | 38 | Emerging |
| 12 | IndexFziQ/MSMARCO-MRC-Analysis | Analysis on the MS-MARCO leaderboard regarding the machine reading... | 37 | Emerging |
| 13 | GeekDream-x/IDOL | Repo for paper "IDOL: Indicator-oriented Logic Pre-training for Logical... | 37 | Emerging |
| 14 | utahnlp/knowledge_infotabs | Repository containing code for the NAACL 2021 paper (Incorporating External... | 37 | Emerging |
| 15 | yuweihao/reclor | Code for "ReClor: A Reading Comprehension Dataset Requiring Logical... | 36 | Emerging |
| 16 | XingLuxi/KMRC-Research-Archive | 🗂 Research about Knowledge-based Machine Reading Comprehension | 35 | Emerging |
| 17 | phanxuanphucnd/Active-learning-in-NLP | Active learning in NLP | 35 | Emerging |
| 18 | FeiWang96/GTR | [SIGIR 2021] Retrieving Complex Tables with Multi-Granular Graph... | 34 | Emerging |
| 19 | webis-de/acl22-revisiting-uncertainty-based-query-strategies-for-active-learning-with-transformers | Revisiting Uncertainty-based Query Strategies for Active Learning with Transformers | 34 | Emerging |
| 20 | anshitag/memit_csk | Source repository for Editing Common Sense in Transformers (EMNLP 2023) | 34 | Emerging |
| 21 | amazon-science/pizza-semantic-parsing-dataset | The PIZZA dataset continues the exploration of task-oriented parsing by... | 34 | Emerging |
| 22 | marceljahnke/negative-cache | PyTorch Implementation of the Paper "Efficient Training of Retrieval Models... | 33 | Emerging |
| 23 | amazon-science/wqa-multi-sentence-inference | This repository contains code used for our Multi Sentence Inference NAACL'22 paper. | 32 | Emerging |
| 24 | ymcui/expmrc | ExpMRC: Explainability Evaluation for Machine Reading Comprehension | 32 | Emerging |
| 25 | sherlcok314159/ChineseMRC-Data | A collection of the extractive MRC datasets available so far in the Chinese domain | 32 | Emerging |
| 26 | thunlp/CokeBERT | CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced... | 32 | Emerging |
| 27 | acidAnn/semeval2022_task7_starter_kit | 💡 Starter kit for SemEval 2022 Task 7: Identifying Plausible... | 32 | Emerging |
| 28 | humanlab/rare-class-AL | AL for rare class strategies compared in the paper "Transfer and Active... | 31 | Emerging |
| 29 | ict-bigdatalab/CorpusBrain | CIKM 2022: CorpusBrain: Pre-train a Generative Retrieval Model for... | 31 | Emerging |
| 30 | USSiamaboat/polytuplet-loss | A Reverse Approach to Training Reading Comprehension and Logical Reasoning Models | 31 | Emerging |
| 31 | ai-systems/tg2022task_premise_retrieval | TextGraphs Shared Task on Natural Language Premise Selection | 31 | Emerging |
| 32 | Jordy-VL/uncertainty-bench | Code repository for Benchmarking Scalable Predictive Uncertainty in Text... | 31 | Emerging |
| 33 | Dibyakanti/AutoTNLI-code | This repository contains the official code for the paper: Realistic Data... | 30 | Emerging |
| 34 | psunlpgroup/XSemPLR | Data and code for ACL 2023 paper XSemPLR: Cross-Lingual Semantic Parsing in... | 29 | Experimental |
| 35 | testzer0/AmbiQT | Code and Assets for "Benchmarking and Improving Text-to-SQL Generation Under... | 29 | Experimental |
| 36 | pietrolesci/anchoral | This is the official PyTorch implementation for our NAACL 2024 paper:... | 28 | Experimental |
| 37 | ZeinabAghahadi/Syllogistic-Commonsense-Reasoning | Deductive Commonsense Reasoning | 28 | Experimental |
| 38 | krystalan/Multi-hopRC | 📔 notes for Multi-hop Reading Comprehension... | 28 | Experimental |
| 39 | minnesotanlp/infoVerse | Jaehyung Kim et al.'s ACL 2023 paper on "infoVerse: A Universal Framework for... | 27 | Experimental |
| 40 | Pzoom522/xANLG | Data and code for "Understanding Linearity of Cross-Lingual Word Embedding... | 27 | Experimental |
| 41 | cognitiveailab/tg2021task | Participant Kit for the TextGraphs-15 Shared Task on Explanation Regeneration | 27 | Experimental |
| 42 | INK-USC/RiddleSense | RiddleSense: Reasoning about Riddle Questions Featuring Linguistic... | 27 | Experimental |
| 43 | phosseini/GisPy | GisPy: A Tool for Measuring Gist Inference Score in Text... | 27 | Experimental |
| 44 | THU-KEG/COPEN | The official code and dataset for EMNLP 2022 paper "COPEN: Probing... | 26 | Experimental |
| 45 | MultimodalGeo/GeoText-1652 | An official repo for ECCV 2024 Towards Natural Language-Guided Drones:... | 26 | Experimental |
| 46 | ZhengZixiang/MRCPapers | Worth-reading paper list and other awesome resources on Machine Reading... | 25 | Experimental |
| 47 | mariomeissner/AmbiNLI | This is the code for the paper "Embracing Ambiguity: Shifting the Training... | 24 | Experimental |
| 48 | MSR-LIT/Splash | Release of SPLASH: Dataset for semantic parse correction with natural... | 24 | Experimental |
| 49 | yul091/UnBED | Codebase for the ACL 2023 paper: "Uncertainty-Aware Bootstrap Learning for... | 24 | Experimental |
| 50 | rycolab/evidence-probing | Code and data for the ACL 2022 paper "Probing as Quantifying Inductive Bias". | 23 | Experimental |
| 51 | semeval-2026-kclarity/clarity | Code release for KCLarity at SemEval-2026 Task 6: Encoder and Zero-Shot... | 23 | Experimental |
| 52 | Advancing-Machine-Human-Reasoning-Lab/transformer-psychometrics | Code to reproduce experiments in our *SEM 2021 Paper | 22 | Experimental |
| 53 | Raising-hrx/MetGen | An implementation for MetGen: A Module-Based Entailment Tree Generation... | 21 | Experimental |
| 54 | maastrichtlawtech/fusion | 🔗 Hybrid retrieval in the legal domain | 21 | Experimental |
| 55 | salesforce/FewXC | Official code and data release for Efficiently Aligned Cross-Lingual... | 21 | Experimental |
| 56 | megagonlabs/xatu | 🕊️ Code and Data for XATU: A Fine-grained Instruction-based Benchmark for... | 20 | Experimental |
| 57 | nlp-waseda/dcsg-ja | Dialogue Commonsense Graph in Japanese | 20 | Experimental |
| 58 | megagonlabs/ambignlg | 🐶 Data for AmbigNLG: Addressing Task Ambiguity in Instruction for NLG... | 20 | Experimental |
| 59 | naver/ms-marco-shift | A Fine-Grained Analysis of Distribution Shifts in MSMARCO (MS-Shift).... | 20 | Experimental |
| 60 | fajri91/discourse_probing | Discourse Probing of Pretrained Language Models. In Proceedings of NAACL 2021. | 20 | Experimental |
| 61 | Nativeatom/FRoG | Fuzzy reasoning of Generalized Quantifiers (EMNLP 2024) | 20 | Experimental |
| 62 | XInfoTabS/dataset | The Official dataset for "XINFOTABS: Evaluating Multilingual Tabular Natural... | 19 | Experimental |
| 63 | INK-USC/ER-Test | Code for ER-Test, accepted to the Findings of EMNLP 2022 | 19 | Experimental |
| 64 | amazon-science/resource-constrained-naturalized-semantic-parsing | This repository is made public for reproducibility of our recent work on... | 19 | Experimental |
| 65 | zhengyima/Anchors | Source code of CIKM2021 Paper 'Pre-training for Ad-hoc Retrieval: Hyperlink... | 19 | Experimental |
| 66 | LaVi-Lab/C2LEVA | [Findings of ACL 2025] "C2LEVA: Toward Comprehensive and Contamination-Free... | 19 | Experimental |
| 67 | gianluigilopardo/anchors_text_theory | Code for the paper "A Sea of Words: An In-Depth Analysis of Anchors for Text... | 19 | Experimental |
| 68 | IndexFziQ/IIE-NLP-Eyas-SemEval2021 | Code of IIE-NLP-Eyas Team for ReCAM (Task 4) @SemEval2021... | 18 | Experimental |
| 69 | Nativeatom/PRESQUE | The repository for "Pragmatic Reasoning Unlocks Quantifier Semantics for... | 18 | Experimental |
| 70 | HKUST-KnowComp/atomic-conceptualization | Code and data for the paper Acquiring and Modelling Abstract Commonsense... | 18 | Experimental |
| 71 | dyan-dy/Baidu-LIC2021-MRC | models and codes for baiduAI LIC 2021 MRC tasks, based on paddlenlp | 17 | Experimental |
| 72 | collapseindex/ci-curation | CI-Guided Data Curation: Using prediction instability to detect label noise.... | 12 | Experimental |
| 73 | RishiHazra/Actively-reducing-redundancies-in-Active-Learning-for-Sequence-Tagging | Active Learning for sequence tagging | 12 | Experimental |
| 74 | Lizhecheng02/DRS | [ACL 2025] Repository for our paper "DRS: Deep Question Reformulation With... | 12 | Experimental |
| 75 | Info-Sync/InfoSync | Implementation of the semi-structured inference model in our ACL 2023 paper:... | 11 | Experimental |
| 76 | putmanmodel/putman-model-paper | Preprint + pseudocode for the PUTMAN Model (relational meaning graphs,... | 11 | Experimental |
| 77 | rbhubert/recall | Tool for the recovery of relevant information through classification in an... | 10 | Experimental |
| 78 | trailerAI/KoDPR | Korean Dense Passage Retrieval (KoDPR) | 10 | Experimental |