deshwalmahesh/PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without a custom rubric or reference answer, in absolute or relative mode, and more. It also collects available tools, methods, repos, and code for hallucination detection, LLM evaluation, grading, and related tasks.
This tool helps you objectively assess the quality of responses generated by your Large Language Models (LLMs), or even human-written answers. You provide a question and a response, and it returns a quality score from 1 to 5. It's ideal for anyone who needs to verify the accuracy and helpfulness of AI-generated or human-written content in customer support, content creation, or knowledge management.
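The core flow is simple enough to sketch. The snippet below illustrates absolute grading with a judge model loaded through Hugging Face transformers; the model ID, prompt wording, and rubric are placeholder assumptions, not PHUDGE's actual interface, so check the repo's notebooks for the exact prompt format.

# Illustrative sketch only: model ID, prompt, and rubric are assumptions.
# PHUDGE fine-tunes Phi-3 as a judge; any causal LM judge follows the same pattern.
from transformers import pipeline

judge = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")  # placeholder model

question = "Explain what a confusion matrix is."
answer = "A confusion matrix summarises a classifier's predictions against the true labels."
rubric = "Score 1-5 for factual accuracy and completeness."

prompt = (
    "You are an impartial judge. Using the rubric, grade the answer on a 1-5 scale "
    "and end your feedback with 'Score: <n>'.\n"
    f"Rubric: {rubric}\nQuestion: {question}\nAnswer: {answer}\n"
)

out = judge(prompt, max_new_tokens=256, return_full_text=False)[0]["generated_text"]
print(out)  # feedback ending in e.g. "Score: 4"; parse the trailing integer as the grade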
No commits in the last 6 months.
Use this if you need a scalable and robust way to grade LLM or human responses, especially when you want to use custom scoring criteria or don't have a perfect reference answer available.
Not ideal if you are looking for a simple, out-of-the-box solution that doesn't require any technical setup or if you only need basic, qualitative feedback without numerical grading.
Stars: 52
Forks: 7
Language: Jupyter Notebook
License: —
Category: —
Last pushed: Jul 10, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/deshwalmahesh/PHUDGE"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
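A minimal Python equivalent of the curl call above, assuming the endpoint returns JSON; the fields mentioned in the comment are guesses, not a documented schema.

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/deshwalmahesh/PHUDGE"
resp = requests.get(url, timeout=30)
resp.raise_for_status()
data = resp.json()
print(data)  # expected to include repo metadata such as stars, forks, and last-pushed date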
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents