allenai/CommonGen-Eval
Evaluating LLMs with CommonGen-Lite
This project evaluates how well large language models (LLMs) generate sentences from a given set of concepts: given a list of nouns and verbs, a model must produce a natural, everyday sentence that uses all of them. It is intended for researchers and developers who need to benchmark an LLM's constrained text generation ability.
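As a concrete illustration, a CommonGen-style item pairs a concept set with reference sentences, and a candidate sentence is judged on whether it covers every concept. The field names and the coverage check in this sketch are assumptions for illustration, not the repository's actual schema or metric:

# Hypothetical CommonGen-style item; field names are illustrative,
# not the repository's actual data schema.
item = {
    "concept_set": ["dog", "frisbee", "catch", "throw"],
    "references": ["A dog leaps to catch a frisbee thrown by its owner."],
}

# Naive substring coverage check for illustration; real scoring also
# has to handle inflected forms more carefully.
candidate = "The dog catches the frisbee after the boy throws it."
covered = all(concept in candidate for concept in item["concept_set"])
print(covered)  # True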
No commits in the last 6 months.
Use this if you are developing or fine-tuning an LLM and need to objectively measure its proficiency in generating coherent sentences that incorporate specific, provided concepts.
Not ideal if you are an end-user simply looking to generate text, as this tool focuses on comparative evaluation rather than direct text generation for content creation.
Stars: 95
Forks: 3
Language: Python
License: Apache-2.0
Category: llm-tools
Last pushed: Mar 21, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/allenai/CommonGen-Eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
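For programmatic access, a minimal Python sketch is shown below; the structure of the JSON payload is an assumption, so inspect the actual response before relying on specific fields:

import requests

# Fetch the quality stats for this repo; no API key is needed at the free tier.
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/allenai/CommonGen-Eval"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()  # payload field names are not documented here
print(data)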
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents