ChanLiang/CONNER

[EMNLP 2023] Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators

Score: 21 / 100 · Experimental

This project provides a systematic way to evaluate how well large language models (LLMs) generate knowledge. Given text produced by an LLM, it scores the output across several dimensions, such as factuality, relevance, and helpfulness. It is designed for researchers and practitioners who develop or deploy LLMs and need to rigorously assess the quality of the generated knowledge.

No commits in the last 6 months.

Use this if you are developing or using Large Language Models and need a comprehensive, multi-faceted evaluation of the knowledge they generate beyond just factual accuracy.

Not ideal if you are looking for a simple, single-metric 'pass/fail' evaluation or if you are not working directly with LLM outputs.

Large Language Model Evaluation · AI Model Quality Assurance · Generative AI Research · Natural Language Processing · Benchmarking
No License · Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 6 / 25
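The overall score of 21 / 100 appears to be the sum of the four category scores: 0 + 7 + 8 + 6 = 21.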

Stars: 33
Forks: 2
Language: Python
License: none
Last pushed: Jan 22, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ChanLiang/CONNER"

Open to everyone: 100 requests/day with no API key. Get a free key for 1,000 requests/day.
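For scripted access, a minimal Python sketch built on the curl endpoint above; it assumes only the requests library and makes no assumptions about the response schema beyond it being JSON:

import requests

# Public, keyless endpoint from the curl example above
# (100 requests/day without a key, per the rate-limit note above).
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/ChanLiang/CONNER"

response = requests.get(url, timeout=10)
response.raise_for_status()

# The response schema is not documented here, so print the raw JSON
# payload rather than assuming specific field names.
print(response.json())

Inspect the printed payload to discover the actual field names before building anything on top of them.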