ChanLiang/CONNER

[EMNLP 2023] Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators

Score: 21 / 100 · Experimental

This project provides a systematic way to evaluate how well large language models (LLMs) generate knowledge. Given text produced by an LLM, it scores the output across several dimensions, such as factuality, relevance, and helpfulness. It is designed for researchers and practitioners who develop or deploy LLMs and need to rigorously assess the quality of the generated knowledge.

No commits in the last 6 months.

Use this if you are developing or using Large Language Models and need a comprehensive, multi-faceted evaluation of the knowledge they generate beyond just factual accuracy.

Not ideal if you are looking for a simple, single-metric 'pass/fail' evaluation or if you are not working directly with LLM outputs.

Large Language Model Evaluation · AI Model Quality Assurance · Generative AI Research · Natural Language Processing · Benchmarking
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 6 / 25
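The overall score of 21 / 100 appears to be the sum of the four category scores: 0 + 7 + 8 + 6 = 21.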

Stars: 33
Forks: 2
Language: Python
License: none
Last pushed: Jan 22, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ChanLiang/CONNER"

Open to everyone: 100 requests/day with no API key. Get a free key for 1,000 requests/day.
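For scripted access, a minimal Python sketch built on the curl endpoint above; it assumes only the requests library and makes no assumptions about the response schema beyond it being JSON:

import requests

# Public, keyless endpoint from the curl example above
# (100 requests/day without a key, per the rate-limit note above).
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/ChanLiang/CONNER"

response = requests.get(url, timeout=10)
response.raise_for_status()

# The response schema is not documented here, so print the raw JSON
# payload rather than assuming specific field names.
print(response.json())

Inspect the printed payload to discover the actual field names before building anything on top of them.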