krystalan/chatgpt_as_nlg_evaluator

Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study

Quality score: 19 / 100 (Experimental)

This project helps researchers and developers assess the quality of text generated by large language models, focusing on summarization and story-generation tasks. Given human-written or model-generated text, it outputs quantitative correlation scores indicating how well ChatGPT's evaluations align with human judgments. It is aimed at Natural Language Generation (NLG) researchers and practitioners who develop or compare text generation models.

No commits in the last 6 months.

Use this if you need to understand how reliable ChatGPT is for automatically scoring aspects like coherence, relevance, or fluency of generated text, without extensive human evaluation.

Not ideal if you are looking for a tool that generates text, or if you need human-level evaluation with nuanced qualitative feedback rather than quantitative correlation scores.

Tags: natural-language-generation, text-summarization, story-generation, model-evaluation, AI-research
Badges: No License · Stale (6 months) · No Package · No Dependents
Score breakdown:
- Maintenance: 0 / 25
- Adoption: 8 / 25
- Maturity: 8 / 25
- Community: 3 / 25


Repository stats:
- Stars: 43
- Forks: 1
- Language: Python
- License: None
- Last pushed: Mar 08, 2023
- Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/krystalan/chatgpt_as_nlg_evaluator"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
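The same request can be made from Python. A minimal sketch, assuming only the endpoint pattern shown in the curl example above; the JSON response shape is not documented here, so the script simply prints the raw payload:

```python
import json
import urllib.request

# Endpoint base taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the quality-score endpoint URL for an owner/repo pair."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality report; assumes the endpoint returns JSON."""
    with urllib.request.urlopen(build_quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Network call -- counts against the 100 requests/day anonymous limit.
    report = fetch_quality("krystalan", "chatgpt_as_nlg_evaluator")
    print(json.dumps(report, indent=2))
```

For higher throughput, a free API key raises the limit to 1,000 requests/day; how the key is passed (header or query parameter) is not shown on this page.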