LLM Domain Datasets Transformer Models

37 LLM domain dataset projects are tracked. The highest-rated is mlabonne/llm-datasets, which scores 47/100 and has 4,319 stars.

Get the projects as JSON (note that the example request sets limit=20, not 37)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-domain-datasets&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
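The same request can be made from Python. This is a minimal sketch using only the standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented here, the code simply returns whatever JSON the endpoint sends back without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers", subcategory="llm-domain-datasets", limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10.0):
    """Perform the GET request and decode the JSON body.

    The structure of the returned object is an unknown here, so it is
    passed back as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs one request, which counts against the daily quota. Whether a larger `limit` value returns all 37 projects in one response is an assumption that would need to be checked against the API itself.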

#	Model (repository and description)	Score (/100)	Tier	Stars	Language
1	mlabonne/llm-datasets Curated list of datasets and tools for post-training.	47	Emerging	4,319	—
2	malteos/llm-datasets A collection of datasets for language model pretraining including scripts...	45	Emerging	64	Python
3	magpie-align/magpie [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs...	43	Emerging	834	Python
4	jd-coderepos/llms4subjects The official SemEval 2025 Task 5 - LLMs4Subjects - Shared Task Dataset repository	42	Emerging	7	—
5	willxxy/ECG-Bench A Unified Framework for Benchmarking Generative Electrocardiogram-Language...	41	Emerging	42	Python
6	geobrain-ai/geogalactica Code and datasets for paper "GeoGalactica: A Scientific Large Language Model...	40	Emerging	40	Python
7	seedatnabeel/CLLM Curated LLM (ICML 2024)	36	Emerging	14	Jupyter Notebook
8	shahriargolchin/time-travel-in-llms The official repository for the paper entitled "Time Travel in LLMs: Tracing...	36	Emerging	12	Python
9	marcobombieri/do-LLM-dream-of-ontologies Repository containing code and dataset of the paper "Do LLM Dream Of Ontologies?"	35	Emerging	1	Python
10	HaoAreYuDong/MachineLearningLM Scaling In-context Learning from Few-shot to 1,024-shot on Tabular ML	34	Emerging	59	Python
11	KRR-Oxford/LLMap-Prelim A preliminary investigation for ontology alignment (OM) with large language...	33	Emerging	5	Python
12	paulalesius/llmath Large Language Math - The Mathematics of LLM Foundational Models - For Beginners	32	Emerging	4	CSS
13	dsdanielpark/open-llm-datasets Repository for organizing datasets and papers used in Open LLM.	32	Emerging	101	—
14	asimsinan/LLM-Research A collection of LLM related papers, thesis, tools, datasets, courses, open...	30	Emerging	62	Python
15	sodascience/social_science_inferences_with_llms Addressing LLM-related measurement error in social science modeling research.	30	Emerging	10	—
16	Nkluge-correa/Model-Library The Model Library is a project that maps the risks associated with modern...	29	Experimental	1	Python
17	OSU-NLP-Group/LLM-IOAA Code and data for the paper "Large Language Models Achieve Gold Medal...	28	Experimental	17	TeX
18	nercone-dev/zeta-llm-dataset Public Datasets for Zeta-Tool	25	Experimental	3	Python
19	mahadi-nahid/TabSQLify [NAACL 2024] TabSQLify: Enhancing Reasoning Capabilities of LLMs Through...	25	Experimental	17	Python
20	artpli/CodeIE [ACL 23] CodeIE: Large Code Generation Models are Better Few-Shot...	24	Experimental	40	Python
21	sciknoworg/LLMs4OL-Challenge LLMs4OL Challenge @ ISWC	22	Experimental	7	—
22	liyaooi/TAMO TAMO: reimagine Table representation as an independent Modality for LLMs	22	Experimental	8	Python
23	rmovva/LLM-publication-patterns-public [NAACL 2024] Topics, Authors, and Institutions in Large Language Model...	22	Experimental	17	Jupyter Notebook
24	LHHegland/if-llm-behavior-ontology Instruction-Following LLM Behavior Ontology (IF-LLM-BO) is a lightweight...	22	Experimental	—	—
25	lankamar/pragmatic-llm-alignment Research on pragmatic alignment of LLMs and an agent framework...	21	Experimental	—	Python
26	vicgalle/distilled-self-critique Distilled Self-Critique refines the outputs of an LLM with only synthetic data	21	Experimental	11	Jupyter Notebook
27	zabir-nabil/bangla-multilingual-llm-eval Evaluation of Open and Closed-Source Multi-lingual LLMs for Low-Resource...	20	Experimental	5	Jupyter Notebook
28	HES-XPLAIN/mlxplain An open platform for accelerating the development of eXplainable AI systems	20	Experimental	5	Jupyter Notebook
29	xwang297/metamate-dataset MetaMate: Large Language Model to the Rescue of Automated Data Extraction...	19	Experimental	4	—
30	sefeoglu/llm-examples LLM examples for the state of the art problems in knowledge graphs	19	Experimental	3	Jupyter Notebook
31	alemoraru/exceed-project-overview Reproduction package for a framework that uses LLMs to generate tailored,...	18	Experimental	1	—
32	mahadi-nahid/NormTab [EMNLP 2024] NormTab: Improving Symbolic Reasoning in LLMs Through Tabular...	18	Experimental	6	Python
33	Uniquenetra/ml-based-ontology-matching A project to enhance ontology matching accuracy using Large Language Models...	18	Experimental	2	Jupyter Notebook
34	AbhijitKumarJ/Meta_Abstraction Meta Abstracting data to utilize emergent patterns	17	Experimental	1	HTML
35	ngc7292/query_of_cc This project is dataset and model checkpoints for the paper "Query of CC:...	11	Experimental	4	—
36	oaimli/ModularMetaReview [ACL 2025 Findings] Decomposed Opinion Summarization with Verified...	11	Experimental	—	Python
37	deepbiolab/llm-paper-research This repository contains implementations and illustrative code related to...	11	Experimental	—	—
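Each row in the table packs the repository name and its description into a single cell and uses "—" where a value is missing. The sketch below turns one such row into a typed record, assuming the columns are tab-separated as in this listing. `Row` and `parse_row` are hypothetical names for illustration, not part of any published client.

```python
from dataclasses import dataclass
from typing import Optional

MISSING = "—"  # placeholder the listing uses for absent stars or language


@dataclass
class Row:
    rank: int
    repo: str
    description: str
    score: int
    tier: str
    stars: Optional[int]
    language: Optional[str]


def parse_row(line: str) -> Row:
    """Parse one tab-separated table row into a Row record."""
    rank, model, score, tier, stars, language = line.rstrip("\n").split("\t")
    # The repository slug never contains a space, so the first space
    # separates it from the free-text description.
    repo, _, description = model.partition(" ")
    return Row(
        rank=int(rank),
        repo=repo,
        description=description,
        score=int(score),
        tier=tier,
        stars=None if stars == MISSING else int(stars.replace(",", "")),
        language=None if language == MISSING else language,
    )
```

For example, passing the first row of the table gives a record whose `repo` is `mlabonne/llm-datasets`, whose `stars` is the integer 4319, and whose `language` is `None`.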