joelbarmettlerUZH/ConceptFormer

Towards Finding the Essence of Everything in Large Language Models

Score: 37 / 100 (Emerging)

This project is for AI researchers or data scientists working on understanding how large language models (LLMs) connect to real-world knowledge. It helps create specialized datasets like T-Rex Star and Tri-Rex by extracting entities from text and linking them to knowledge graphs like Wikidata, and then generating synthetic sentences using local LLMs. The output is structured datasets and trained model configurations that can be used to pretrain and evaluate LLMs on knowledge-intensive tasks.

Use this if you are a researcher focused on the interpretability or knowledge representation of large language models and need to generate complex, knowledge-graph-infused datasets for experimentation and training.

Not ideal if you are looking for a plug-and-play solution for general LLM fine-tuning or do not have significant computational resources (e.g., multiple high-end GPUs and hundreds of GBs of RAM).

Tags: LLM Research · Knowledge Graph Generation · Dataset Creation · AI Model Pretraining · Natural Language Processing
No License · No Package · No Dependents
Maintenance: 10 / 25
Adoption: 5 / 25
Maturity: 8 / 25
Community: 14 / 25


Stars: 13
Forks: 3
Language: Python
License: none
Last pushed: Feb 01, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/joelbarmettlerUZH/ConceptFormer"

The API is open to everyone (100 requests/day, no key needed); a free key raises the limit to 1,000 requests/day.
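The curl call above can also be wrapped in a small script. A minimal sketch in Python, assuming the `owner/repo` URL pattern shown in the example generalizes to other repositories (the `transformers` path segment is copied verbatim from the curl command; whether it varies per ecosystem is an assumption):

```python
from urllib.parse import quote

# Base of the quality endpoint, taken directly from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a GitHub owner/repo pair.

    quote() percent-encodes any characters that are unsafe in a URL path.
    """
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

print(quality_url("joelbarmettlerUZH", "ConceptFormer"))
# https://pt-edge.onrender.com/api/v1/quality/transformers/joelbarmettlerUZH/ConceptFormer
```

Fetching the URL (e.g. with `urllib.request.urlopen` or `requests.get`) should return the same data as the curl call, subject to the daily rate limit noted above.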