joelbarmettlerUZH/ConceptFormer
Towards Finding the Essence of Everything in Large Language Models
This project is aimed at AI researchers and data scientists studying how large language models (LLMs) connect to real-world knowledge. It builds specialized datasets such as T-Rex Star and Tri-Rex by extracting entities from text, linking them to knowledge graphs such as Wikidata, and then generating synthetic sentences with local LLMs. The output is a set of structured datasets and trained model configurations for pretraining and evaluating LLMs on knowledge-intensive tasks.
Use this if you are a researcher focused on the interpretability or knowledge representation of large language models and need to generate complex, knowledge-graph-infused datasets for experimentation and training.
Not ideal if you are looking for a plug-and-play solution for general LLM fine-tuning or do not have significant computational resources (e.g., multiple high-end GPUs and hundreds of GBs of RAM).
Stars: 13
Forks: 3
Language: Python
License: —
Category: —
Last pushed: Feb 01, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/joelbarmettlerUZH/ConceptFormer"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
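The same endpoint can be queried from Python with the standard library. The URL pattern below simply mirrors the curl example; the response schema is not documented here, so this sketch only builds and prints the request URL, with the actual fetch left as a commented-out step.

```python
import urllib.request

# Mirrors the curl example above; the final two path segments are the
# repository owner and name.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the API URL for a given GitHub repository."""
    return f"{BASE}/{owner}/{repo}"

url = quality_url("joelbarmettlerUZH", "ConceptFormer")
print(url)

# To actually fetch the data (requires network access):
# with urllib.request.urlopen(url) as resp:
#     payload = resp.read().decode("utf-8")
```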
Higher-rated alternatives
rasbt/LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
facebookresearch/LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
FareedKhan-dev/train-llm-from-scratch
A straightforward method for training your LLM, from downloading data to generating text.
kmeng01/rome
Locating and editing factual associations in GPT (NeurIPS 2022)
datawhalechina/llms-from-scratch-cn
Build a large language model from scratch with only basic Python; step-by-step construction of GLM4/Llama3/RWKV6 for a deep understanding of how large models work.