NLP Resource Collections ML Frameworks

Curated lists, datasets, and reference materials for Natural Language Processing across languages and domains. Does NOT include implementations of NLP models, tutorials, or frameworks—only aggregated resources and paper collections.

17 NLP resource collection projects are tracked. Two score above 50 (Established tier). The highest-rated is jonathanwvd/awesome-industrial-datasets at 55/100 with 359 stars.

Get all 17 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=nlp-resource-collections&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
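The same request can be made from a script. The following is a minimal sketch using only the Python standard library; it reuses the endpoint and query parameters from the curl command above. The helper names `build_url` and `fetch_projects` are hypothetical, and because the shape of the JSON response is not documented on this page, the decoded payload is returned unmodified rather than parsed into fields.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="nlp-resource-collections",
              limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and decode the body as JSON.

    The response schema is an unknown here, so the decoded object is
    returned as-is (hypothetical helper, not part of any official client).
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs the request without a key, so it counts against the 100 requests/day limit mentioned above.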

| # | Framework | Description | Score | Tier |
|---|-----------|-------------|-------|------|
| 1 | jonathanwvd/awesome-industrial-datasets | A curated collection of public industrial datasets. | 55 | Established |
| 2 | leomaurodesenv/game-datasets | 🎮 A curated list of awesome game datasets, and tools to... | 53 | Established |
| 3 | jsbroks/awesome-dataset-tools | 🔧 A curated list of awesome dataset tools | 48 | Emerging |
| 4 | NTMC-Community/awesome-neural-models-for-semantic-match | A curated list of papers dedicated to neural text (semantic) matching. | 48 | Emerging |
| 5 | haiker2011/awesome-nlp-sentiment-analysis | 📖 A collection of NLP datasets, papers, and open-source implementations, especially for sentiment analysis, emotion cause identification, and opinion target and opinion word extraction. | 47 | Emerging |
| 6 | maastrichtlawtech/awesome-legal-nlp | 📖 A curated list of LegalNLP resources from all around the web. | 45 | Emerging |
| 7 | ml4code/ml4code.github.io | Website for "A Survey of Machine Learning for Big Code and Naturalness" | 41 | Emerging |
| 8 | Jamie-Cui/paper-pulse | Automatically fetch, filter, and summarize research papers from arXiv & IACR... | 38 | Emerging |
| 9 | Huffon/NLP101 | NLP 101: a resource repository for Deep Learning and Natural Language Processing | 37 | Emerging |
| 10 | coteries/cedille-ai | ✒️ Cedille is a large French language model (6B), released under an... | 36 | Emerging |
| 11 | vandroogenbroeckmarc/doi2bib | Tool to convert a DOI to a BiBTeX entry (mainly "adapted" for the computer... | 33 | Emerging |
| 12 | enochkan/awesome-gans-and-deepfakes | A curated list of GAN & Deepfake papers and repositories. | 33 | Emerging |
| 13 | MEgooneh/awesome-Iran-datasets | Iranian/Persian Datasets. دیتاست‌های فارسی و ایرانی | 33 | Emerging |
| 14 | tushartushar/ML4SCA | Machine Learning for Source Code Analysis | 31 | Emerging |
| 15 | bdqnghi/awesome-ai4code | A collection of recent papers, benchmarks and datasets of AI4Code domain. | 24 | Experimental |
| 16 | sciknoworg/ald-ale-orkg-review | The repository contains code to automate extraction of review tables from... | 21 | Experimental |
| 17 | nlx-group/study-of-commonsense-reasoning | Code and data for Masters Dissertation "A Study of Commonsense Reasoning... | 12 | Experimental |
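
As a quick cross-check of the summary figures, the scores and tiers in the listing above can be tallied directly. This is a small sketch: the `ROWS` data is copied by hand from the listing, and the variable names are illustrative. Only the "above 50" cutoff for the Established tier is stated on this page; the boundary between Emerging and Experimental is not, so the tiers are counted as labeled rather than derived from scores.

```python
from collections import Counter

# (score, tier) pairs copied by hand from the listing above, in rank order.
ROWS = [
    (55, "Established"), (53, "Established"),
    (48, "Emerging"), (48, "Emerging"), (47, "Emerging"), (45, "Emerging"),
    (41, "Emerging"), (38, "Emerging"), (37, "Emerging"), (36, "Emerging"),
    (33, "Emerging"), (33, "Emerging"), (33, "Emerging"), (31, "Emerging"),
    (24, "Experimental"), (21, "Experimental"), (12, "Experimental"),
]

# Count projects per tier using the labels as given.
tier_counts = Counter(tier for _, tier in ROWS)

# Count projects scoring above 50, the one threshold the page states.
above_50 = sum(1 for score, _ in ROWS if score > 50)

print(len(ROWS), "projects;", above_50, "above 50;", dict(tier_counts))
```

The tally gives 17 projects, of which 2 score above 50, with 12 labeled Emerging and 3 labeled Experimental.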