AKSW/LLM-KG-Bench
LLM-KG-Bench is a framework and task collection for automated benchmarking of Large Language Models (LLMs) on Knowledge Graph (KG) related tasks.
This framework helps you objectively compare how well different LLMs perform on tasks involving knowledge graphs. You provide the LLMs you want to test and the specific knowledge graph tasks, and it outputs detailed performance metrics. It is aimed at researchers, engineers, and developers who work with knowledge graphs and need to select or evaluate LLMs for their applications.
Use this if you need to systematically benchmark and compare various LLMs on their ability to create, comprehend, and interact with knowledge graphs.
Not ideal if you are looking for a simple, no-code tool to analyze or visualize an existing knowledge graph rather than to benchmark LLMs.
Stars: 56
Forks: 5
Language: Python
License: MPL-2.0
Category:
Last pushed: Mar 10, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/AKSW/LLM-KG-Bench"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
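For scripted access, the sketch below shows one way to call the same endpoint from Python using only the standard library. It is a minimal sketch under stated assumptions: the endpoint is assumed to return JSON, and the API-key header name ("X-API-Key") and response schema are illustrative guesses, not documented behavior.

# Minimal sketch: fetch the quality data for this repository.
# Assumptions: JSON response; "X-API-Key" header for the optional key.
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/AKSW/LLM-KG-Bench"
API_KEY = None  # set to your key for the higher rate limit (header name assumed)

req = urllib.request.Request(URL)
if API_KEY:
    req.add_header("X-API-Key", API_KEY)  # assumed header name, not documented here

with urllib.request.urlopen(req, timeout=10) as resp:
    data = json.loads(resp.read().decode("utf-8"))

# Print whatever the endpoint returns; the exact schema is not documented here.
print(json.dumps(data, indent=2))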
Higher-rated alternatives
sierra-research/tau2-bench
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
bigcode-project/bigcodebench
[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI
scicode-bench/SciCode
A benchmark that challenges language models to code solutions for scientific problems
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)