zhuohaoyu/KIEval
[ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
This tool helps AI researchers and developers assess how well large language models (LLMs) understand and apply domain-specific knowledge. Starting from a conventional LLM benchmark question, it generates a multi-round, knowledge-focused dialogue to test whether the model genuinely comprehends the subject or is merely recalling pre-trained answers. The result is a more robust assessment of the LLM's actual knowledge application, even when benchmark data may be contaminated.
No commits in the last 6 months.
Use this if you need to reliably evaluate the deep comprehension and real-world applicability of large language models on knowledge-intensive tasks, beyond simple memorization.
Not ideal if you are looking for a quick, high-level performance score that doesn't account for data contamination or interactive knowledge application.
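To make the interactive mechanism concrete, below is a minimal conceptual sketch in Python of a multi-round, knowledge-grounded evaluation loop. This is not KIEval's actual code; every name in it (Turn, DialogueResult, run_interactive_eval, the candidate/interactor/judge callables, the averaging rubric) is an illustrative assumption about how such an interaction could be structured.

# Conceptual sketch only; NOT KIEval's implementation.
# All names and the scoring scheme below are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    question: str
    answer: str
    score: float  # assumed per-round judgment of factual depth, 0.0-1.0


@dataclass
class DialogueResult:
    seed_question: str
    turns: List[Turn] = field(default_factory=list)

    @property
    def overall_score(self) -> float:
        # Simple average over rounds; the real aggregation may differ.
        return sum(t.score for t in self.turns) / len(self.turns) if self.turns else 0.0


def run_interactive_eval(
    seed_question: str,
    candidate: Callable[[str], str],        # model under test: prompt -> answer
    interactor: Callable[[str, str], str],  # evaluator: (question, answer) -> follow-up question
    judge: Callable[[str, str], float],     # evaluator: (question, answer) -> per-round score
    rounds: int = 3,
) -> DialogueResult:
    """Start from a benchmark question, then probe with knowledge-grounded follow-ups."""
    result = DialogueResult(seed_question)
    question = seed_question
    for _ in range(rounds):
        answer = candidate(question)
        result.turns.append(Turn(question, answer, judge(question, answer)))
        question = interactor(question, answer)  # deeper follow-up based on the reply
    return result

In KIEval itself, the follow-up and scoring roles are played by an LLM-powered interactor/evaluator as described in the ACL'24 paper; consult the repository for the actual configuration and scoring details.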
Stars
39
Forks
2
Language
Python
License
—
Category
—
Last pushed
Jul 19, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zhuohaoyu/KIEval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
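If you prefer Python over curl, here is a minimal sketch using the requests library. The endpoint URL is the one shown above; the response schema is not documented here, so the example simply pretty-prints whatever JSON the API returns.

# Sketch of fetching this listing's data; response fields are not documented
# here, so nothing beyond valid JSON is assumed about the payload.
import json
import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zhuohaoyu/KIEval"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))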
Higher-rated alternatives
cvs-health/langfair
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
BetterForAll/HonestyMeter
HonestyMeter: An NLP-powered framework for evaluating objectivity and bias in media content,...
bws82/biasclear
Structural bias detection and correction engine built on Persistent Influence Theory (PIT)
KID-22/LLM-IR-Bias-Fairness-Survey
This is the repo for the survey of Bias and Fairness in IR with LLMs.
Hanpx20/SafeSwitch
Official code repository for the paper "Internal Activation as the Polar Star for Steering...