LLM Domain Datasets Transformer Models

There are 37 LLM domain dataset projects tracked. The highest-rated is mlabonne/llm-datasets, with a score of 47/100 and 4,319 stars.

Get the projects as JSON. The request below uses limit=20, so it returns the first 20 of the 37 projects.

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-domain-datasets&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
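A minimal shell sketch of the same request, with the query parameters pulled out so they can be changed. The function names build_url and fetch_projects are hypothetical, and the assumption that raising limit (for example to 37) returns the whole list is inferred from the parameter name rather than stated on this page. No API-key handling is included, because the page does not say how a key is sent.

```shell
#!/bin/sh
# Sketch: wrap the curl request above so its query parameters can be changed.
# Assumption: `limit` caps how many projects are returned, so limit=37 should
# return the full list. This is inferred from the parameter name, not documented here.
BASE_URL="https://pt-edge.onrender.com/api/v1/datasets/quality"

# build_url DOMAIN SUBCATEGORY LIMIT -> prints the full request URL
build_url() {
  printf '%s?domain=%s&subcategory=%s&limit=%s\n' "$BASE_URL" "$1" "$2" "$3"
}

# fetch_projects DOMAIN SUBCATEGORY LIMIT -> prints the JSON response body
fetch_projects() {
  # -s hides the progress meter; -f makes curl exit non-zero on HTTP 4xx/5xx
  # responses instead of printing the error page as if it were data.
  curl -sf "$(build_url "$1" "$2" "$3")"
}
```

For example, fetch_projects transformers llm-domain-datasets 37 > llm-domain-datasets.json saves the response to a file. The parameter values are inserted without URL-encoding, which is fine for slugs like these but not for arbitrary text.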

Rank / Model (with description) / Score (out of 100) / Tier
1 mlabonne/llm-datasets

Curated list of datasets and tools for post-training.

47
Emerging
2 malteos/llm-datasets

A collection of datasets for language model pretraining including scripts...

45
Emerging
3 magpie-align/magpie

[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs...

43
Emerging
4 jd-coderepos/llms4subjects

The official SemEval 2025 Task 5 - LLMs4Subjects - Shared Task Dataset repository

42
Emerging
5 willxxy/ECG-Bench

A Unified Framework for Benchmarking Generative Electrocardiogram-Language...

41
Emerging
6 geobrain-ai/geogalactica

Code and datasets for paper "GeoGalactica: A Scientific Large Language Model...

40
Emerging
7 seedatnabeel/CLLM

Curated LLM (ICML 2024)

36
Emerging
8 shahriargolchin/time-travel-in-llms

The official repository for the paper entitled "Time Travel in LLMs: Tracing...

36
Emerging
9 marcobombieri/do-LLM-dream-of-ontologies

Repository containing code and dataset of the paper "Do LLM Dream Of Ontologies?"

35
Emerging
10 HaoAreYuDong/MachineLearningLM

Scaling In-context Learning from Few-shot to 1,024-shot on Tabular ML

34
Emerging
11 KRR-Oxford/LLMap-Prelim

A preliminary investigation for ontology alignment (OM) with large language...

33
Emerging
12 paulalesius/llmath

Large Language Math - The Mathematics of LLM Foundational Models - For Beginners

32
Emerging
13 dsdanielpark/open-llm-datasets

Repository for organizing datasets and papers used in Open LLM.

32
Emerging
14 asimsinan/LLM-Research

A collection of LLM-related papers, theses, tools, datasets, courses, open...

30
Emerging
15 sodascience/social_science_inferences_with_llms

Addressing LLM-related measurement error in social science modeling research.

30
Emerging
16 Nkluge-correa/Model-Library

The Model Library is a project that maps the risks associated with modern...

29
Experimental
17 OSU-NLP-Group/LLM-IOAA

Code and data for the paper "Large Language Models Achieve Gold Medal...

28
Experimental
18 nercone-dev/zeta-llm-dataset

Public Datasets for Zeta-Tool

25
Experimental
19 mahadi-nahid/TabSQLify

[NAACL 2024] TabSQLify: Enhancing Reasoning Capabilities of LLMs Through...

25
Experimental
20 artpli/CodeIE

[ACL 23] CodeIE: Large Code Generation Models are Better Few-Shot...

24
Experimental
21 sciknoworg/LLMs4OL-Challenge

LLMs4OL Challenge @ ISWC

22
Experimental
22 liyaooi/TAMO

TAMO: reimagine Table representation as an independent Modality for LLMs

22
Experimental
23 rmovva/LLM-publication-patterns-public

[NAACL 2024] Topics, Authors, and Institutions in Large Language Model...

22
Experimental
24 LHHegland/if-llm-behavior-ontology

Instruction-Following LLM Behavior Ontology (IF-LLM-BO) is a lightweight...

22
Experimental
25 lankamar/pragmatic-llm-alignment

Research on pragmatic alignment of LLMs and an Agent Framework...

21
Experimental
26 vicgalle/distilled-self-critique

Distilled Self-Critique refines the outputs of an LLM with only synthetic data

21
Experimental
27 zabir-nabil/bangla-multilingual-llm-eval

Evaluation of Open and Closed-Source Multi-lingual LLMs for Low-Resource...

20
Experimental
28 HES-XPLAIN/mlxplain

An open platform for accelerating the development of eXplainable AI systems

20
Experimental
29 xwang297/metamate-dataset

MetaMate: Large Language Model to the Rescue of Automated Data Extraction...

19
Experimental
30 sefeoglu/llm-examples

LLM examples for state-of-the-art problems in knowledge graphs

19
Experimental
31 alemoraru/exceed-project-overview

Reproduction package for a framework that uses LLMs to generate tailored,...

18
Experimental
32 mahadi-nahid/NormTab

[EMNLP 2024] NormTab: Improving Symbolic Reasoning in LLMs Through Tabular...

18
Experimental
33 Uniquenetra/ml-based-ontology-matching

A project to enhance ontology matching accuracy using Large Language Models...

18
Experimental
34 AbhijitKumarJ/Meta_Abstraction

Meta Abstracting data to utilize emergent patterns

17
Experimental
35 ngc7292/query_of_cc

This project provides the dataset and model checkpoints for the paper "Query of CC:...

11
Experimental
36 oaimli/ModularMetaReview

[ACL 2025 Findings] Decomposed Opinion Summarization with Verified...

11
Experimental
37 deepbiolab/llm-paper-research

This repository contains implementations and illustrative code related to...

11
Experimental
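The tiers in the list appear to follow a simple score cutoff: every project scoring 30 or above is labelled Emerging, and every project scoring 29 or below is labelled Experimental. That threshold is inferred from this page, not documented by it. The shell sketch below encodes the assumed rule (tier_for_score and check_tiers are hypothetical names) and checks it against a few rows copied from the list.

```shell
#!/bin/sh
# Sketch: reproduce the published tier labels from the scores alone.
# Assumption (inferred from this listing, not documented by the source):
# score >= 30 -> "Emerging", score <= 29 -> "Experimental".
tier_for_score() {
  if [ "$1" -ge 30 ]; then
    echo "Emerging"
  else
    echo "Experimental"
  fi
}

# A few rows copied from the list: name, score, tier as published.
rows='mlabonne/llm-datasets 47 Emerging
sodascience/social_science_inferences_with_llms 30 Emerging
Nkluge-correa/Model-Library 29 Experimental
deepbiolab/llm-paper-research 11 Experimental'

# Print the number of rows whose published tier disagrees with the assumed rule;
# each disagreement is also reported on stderr.
check_tiers() {
  printf '%s\n' "$rows" | {
    mismatches=0
    while read -r name score tier; do
      if [ "$(tier_for_score "$score")" != "$tier" ]; then
        echo "mismatch: $name scores $score but is listed as $tier" >&2
        mismatches=$((mismatches + 1))
      fi
    done
    echo "$mismatches"
  }
}
```

Applied to all 37 rows above, the assumed rule matches every published tier: 15 Emerging and 22 Experimental.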