LLM Domain Datasets Transformer Models
There are 37 LLM domain dataset projects tracked. The highest-rated is mlabonne/llm-datasets, at 47/100 with 4,319 stars.
Get all 37 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-domain-datasets&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
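The same request can be made from a script. The following is a minimal Python sketch, assuming only the endpoint and query parameters shown in the curl command above. The helper names `build_url` and `fetch_quality` are hypothetical, and the shape of the JSON response is not documented here, so the parsed result is returned without interpretation. Note that the curl example passes `limit=20`; whether the API accepts a larger value (for example 37, to cover every project) is an assumption that would need checking.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers", subcategory="llm-domain-datasets", limit=20):
    """Build the request URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30):
    """Send a keyless GET request and return the parsed JSON body.

    The response schema is not described on this page, so the caller
    receives whatever structure the API returns.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality(build_url())` performs the network request. Keyless access is limited to 100 requests/day, so it is worth caching the result locally rather than refetching.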
| # | Model | Score | Tier |
|---|---|---|---|
| 1 | mlabonne/llm-datasets: Curated list of datasets and tools for post-training. | | Emerging |
| 2 | malteos/llm-datasets: A collection of datasets for language model pretraining including scripts... | | Emerging |
| 3 | magpie-align/magpie: [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs... | | Emerging |
| 4 | jd-coderepos/llms4subjects: The official SemEval 2025 Task 5 - LLMs4Subjects - Shared Task Dataset repository | | Emerging |
| 5 | willxxy/ECG-Bench: A Unified Framework for Benchmarking Generative Electrocardiogram-Language... | | Emerging |
| 6 | geobrain-ai/geogalactica: Code and datasets for paper "GeoGalactica: A Scientific Large Language Model... | | Emerging |
| 7 | seedatnabeel/CLLM: Curated LLM (ICML 2024) | | Emerging |
| 8 | shahriargolchin/time-travel-in-llms: The official repository for the paper entitled "Time Travel in LLMs: Tracing... | | Emerging |
| 9 | marcobombieri/do-LLM-dream-of-ontologies: Repository containing code and dataset of the paper "Do LLM Dream Of Ontologies?" | | Emerging |
| 10 | HaoAreYuDong/MachineLearningLM: Scaling In-context Learning from Few-shot to 1,024-shot on Tabular ML | | Emerging |
| 11 | KRR-Oxford/LLMap-Prelim: A preliminary investigation for ontology alignment (OM) with large language... | | Emerging |
| 12 | paulalesius/llmath: Large Language Math - The Mathematics of LLM Foundational Models - For Beginners | | Emerging |
| 13 | dsdanielpark/open-llm-datasets: Repository for organizing datasets and papers used in Open LLM. | | Emerging |
| 14 | asimsinan/LLM-Research: A collection of LLM related papers, thesis, tools, datasets, courses, open... | | Emerging |
| 15 | sodascience/social_science_inferences_with_llms: Addressing LLM-related measurement error in social science modeling research. | | Emerging |
| 16 | Nkluge-correa/Model-Library: The Model Library is a project that maps the risks associated with modern... | | Experimental |
| 17 | OSU-NLP-Group/LLM-IOAA: Code and data for the paper "Large Language Models Achieve Gold Medal... | | Experimental |
| 18 | nercone-dev/zeta-llm-dataset: Public Datasets for Zeta-Tool | | Experimental |
| 19 | mahadi-nahid/TabSQLify: [NAACL 2024] TabSQLify: Enhancing Reasoning Capabilities of LLMs Through... | | Experimental |
| 20 | artpli/CodeIE: [ACL 23] CodeIE: Large Code Generation Models are Better Few-Shot... | | Experimental |
| 21 | sciknoworg/LLMs4OL-Challenge: LLMs4OL Challenge @ ISWC | | Experimental |
| 22 | liyaooi/TAMO: TAMO: reimagine Table representation as an independent Modality for LLMs | | Experimental |
| 23 | rmovva/LLM-publication-patterns-public: [NAACL 2024] Topics, Authors, and Institutions in Large Language Model... | | Experimental |
| 24 | LHHegland/if-llm-behavior-ontology: Instruction-Following LLM Behavior Ontology (IF-LLM-BO) is a lightweight... | | Experimental |
| 25 | lankamar/pragmatic-llm-alignment: Research on pragmatic alignment of LLMs and Agent Framework... | | Experimental |
| 26 | vicgalle/distilled-self-critique: distilled Self-Critique refines the outputs of a LLM with only synthetic data | | Experimental |
| 27 | zabir-nabil/bangla-multilingual-llm-eval: Evaluation of Open and Closed-Source Multi-lingual LLMs for Low-Resource... | | Experimental |
| 28 | HES-XPLAIN/mlxplain: An open platform for accelerating the development of eXplainable AI systems | | Experimental |
| 29 | xwang297/metamate-dataset: MetaMate: Large Language Model to the Rescue of Automated Data Extraction... | | Experimental |
| 30 | sefeoglu/llm-examples: LLM examples for the state of the art problems in knowledge graphs | | Experimental |
| 31 | alemoraru/exceed-project-overview: Reproduction package for a framework that uses LLMs to generate tailored,... | | Experimental |
| 32 | mahadi-nahid/NormTab: [EMNLP 2024] NormTab: Improving Symbolic Reasoning in LLMs Through Tabular... | | Experimental |
| 33 | Uniquenetra/ml-based-ontology-matching: A project to enhance ontology matching accuracy using Large Language Models... | | Experimental |
| 34 | AbhijitKumarJ/Meta_Abstraction: Meta Abstracting data to utilize emergent patterns | | Experimental |
| 35 | ngc7292/query_of_cc: This project is dataset and model checkpoints for the paper "Query of CC:... | | Experimental |
| 36 | oaimli/ModularMetaReview: [ACL 2025 Findings] Decomposed Opinion Summarization with Verified... | | Experimental |
| 37 | deepbiolab/llm-paper-research: This repository contains implementations and illustrative code related to... | | Experimental |