allenai/CommonGen-Eval
Evaluating LLMs with CommonGen-Lite
This project evaluates how well large language models (LLMs) generate sentences from a given set of concepts: given a list of nouns and verbs, a model must produce a natural, everyday sentence that uses all of them. It is intended for researchers and developers who need to benchmark an LLM's constrained text generation ability.
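As a concrete illustration, a CommonGen-style item pairs a concept set with reference sentences, and a candidate sentence is judged on whether it covers every concept. The field names and the coverage check in this sketch are assumptions for illustration, not the repository's actual schema or metric:

# Hypothetical CommonGen-style item; field names are illustrative,
# not the repository's actual data schema.
item = {
    "concept_set": ["dog", "frisbee", "catch", "throw"],
    "references": ["A dog leaps to catch a frisbee thrown by its owner."],
}

# Naive substring coverage check for illustration; real scoring also
# has to handle inflected forms more carefully.
candidate = "The dog catches the frisbee after the boy throws it."
covered = all(concept in candidate for concept in item["concept_set"])
print(covered)  # True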
No commits in the last 6 months.
Use this if you are developing or fine-tuning an LLM and need to objectively measure its proficiency in generating coherent sentences that incorporate specific, provided concepts.
Not ideal if you are an end-user simply looking to generate text, as this tool focuses on comparative evaluation rather than direct text generation for content creation.
Stars: 95
Forks: 3
Language: Python
License: Apache-2.0
Category: llm-tools
Last pushed: Mar 21, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/allenai/CommonGen-Eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
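For programmatic access, a minimal Python sketch is shown below; the structure of the JSON payload is an assumption, so inspect the actual response before relying on specific fields:

import requests

# Fetch the quality stats for this repo; no API key is needed at the free tier.
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/allenai/CommonGen-Eval"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()  # payload field names are not documented here
print(data)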
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents