LLM Domain Datasets NLP Tools

There are 11 LLM domain dataset tools tracked. The highest-rated is williamliujl/CMExam at 41/100 with 84 stars.

Get all 11 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=llm-domain-datasets&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
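The same request can be made from a script. The following is a minimal Python sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, and only the endpoint and the three query parameters (`domain`, `subcategory`, `limit`) come from the curl command above. The response schema is not documented here, so the sketch returns the decoded JSON untouched rather than assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="llm-domain-datasets", limit=20):
    """Assemble the query URL from the three parameters seen in the curl call.

    Hypothetical helper: the defaults mirror the documented request.
    """
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the decoded JSON body as-is.

    Nothing about the shape of the payload is assumed, because the
    response schema is not described in this listing.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` should return whatever the curl command returns, already parsed. Keeping URL construction separate from the network call makes it easy to change `limit` or `subcategory` and to check the URL without spending any of the daily request quota.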

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | williamliujl/CMExam | A Chinese National Medical Licensing Examination dataset and large language... | 41 | Emerging |
| 2 | zjunlp/IEPile | [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus | 39 | Emerging |
| 3 | StefanHeng/ProgGen | Code for paper "ProgGen: Generating Named Entity Recognition Datasets... | 37 | Emerging |
| 4 | Yinghao-Li/GnO-IE | Code for "A Simple but Effective Approach to Improve Structured Language... | 32 | Emerging |
| 5 | MaheshJakkala/naamapadam-multilingual-ner | Benchmarking NER on Naamapadam across 7 Indic languages. EDA + model... | 22 | Experimental |
| 6 | yaoyiran/BLI-Reading-List | A 2024 Reading List for Bilingual Lexicon Induction (BLI) / Word... | 21 | Experimental |
| 7 | Maryam-Nasseri/SFA-Lexical-Complexity | Supplementary materials for the journal article Structural Factor Analysis... | 21 | Experimental |
| 8 | ryang1119/OOMB | Repo for "Can Large Language Models be Effective Online Opinion Miners?"... | 17 | Experimental |
| 9 | ryang1119/ATOSS | Repo for "Make Compound Sentences Simple to Analyze: Learning to Split... | 15 | Experimental |
| 10 | abhishekmaity/BhashaMind | Low-Resource Bengali LLM for Summarization and Classification | 13 | Experimental |
| 11 | MariaSahakyan/peer_review_project | This repository contains the custom code and anonymized data used to produce... | 13 | Experimental |