deshwalmahesh/PHUDGE

Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without a custom rubric or reference answer, in absolute or relative mode, and more. It also lists available tools, methods, repos, and code for hallucination detection, LLM evaluation, and grading.

Score: 29 / 100 (Experimental)

This tool helps you objectively assess the quality of responses generated by your Large Language Models (LLMs), or even human-written answers. You provide a question and a response, and it returns a quality score from 1 to 5. It's well suited to anyone who needs to verify the accuracy and helpfulness of AI-generated content or human agents in customer support, content creation, or knowledge management.
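To give a sense of the judge-style workflow, here is a minimal Python sketch of absolute 1-to-5 grading with a Phi-3 model via Hugging Face transformers. The base model ID, prompt wording, and score parsing are illustrative assumptions, not the repo's documented interface; see the PHUDGE repo for its actual prompts and fine-tuned weights.

# Hypothetical sketch of absolute (1-5) grading with a Phi-3-based judge.
# Model ID, prompt wording, and parsing are assumptions for illustration.
from transformers import pipeline

judge = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")

prompt = (
    "Grade the response to the question on a scale of 1 to 5 "
    "and reply with the score only.\n"
    "Question: What causes tides?\n"
    "Response: Tides are caused mainly by the Moon's gravity.\n"
    "Score:"
)
out = judge(prompt, max_new_tokens=4, return_full_text=False)
print(out[0]["generated_text"].strip())  # e.g. "4"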

No commits in the last 6 months.

Use this if you need a scalable and robust way to grade LLM or human responses, especially when you want to use custom scoring criteria or don't have a perfect reference answer available.

Not ideal if you are looking for a simple, out-of-the-box solution that doesn't require any technical setup or if you only need basic, qualitative feedback without numerical grading.

LLM-evaluation AI-content-moderation customer-service-QA chatbot-performance content-quality-assessment
No License · Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 8 / 25
Community 13 / 25


Stars: 52
Forks: 7
Language: Jupyter Notebook
License: None
Last pushed: Jul 10, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/deshwalmahesh/PHUDGE"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
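If you prefer Python over curl, a minimal sketch using requests against the documented endpoint follows. The API-key header name in the comment is an assumption, since the page does not specify how a key is passed.

# Fetch the quality data for this repo as JSON.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/deshwalmahesh/PHUDGE"
headers = {}  # e.g. {"X-Api-Key": "<your key>"} for 1,000/day (header name assumed)
resp = requests.get(url, headers=headers, timeout=10)
resp.raise_for_status()
print(resp.json())  # quality scores and repo stats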