gmelli/llm-judge
A robust Python library for evaluating content using Large Language Models as judges
Overall score: 22 / 100
Flags: Experimental · No Package · No Dependents
Maintenance: 13 / 25
Adoption: 0 / 25
Maturity: 9 / 25
Community: 0 / 25
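(These four category scores appear to account for the headline figure: 13 + 0 + 9 + 0 = 22 out of 100.)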
Stars: —
Forks: —
Language: Python
License: MIT
Category: transformers
Last pushed: Mar 22, 2026
Commits (last 30 days): 0
Get this data via API:
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/gmelli/llm-judge"
Open to everyone: 100 requests/day with no API key. A free key raises the limit to 1,000 requests/day.
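For script use, here is a minimal Python sketch of the same request, assuming the endpoint returns a JSON body; the fetch_quality helper name is mine, and since this page does not show how an API key is attached, the sketch sticks to the keyless 100-requests/day tier.

import requests

# Quality endpoint shown in the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/gmelli/llm-judge"

def fetch_quality(url: str = URL) -> dict:
    """Fetch the quality record for a repo (assumes a JSON response)."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # raise on 4xx/5xx, e.g. when rate-limited
    return resp.json()

if __name__ == "__main__":
    print(fetch_quality())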
Higher-rated alternatives:
allenai/RL4LMs (score 45): A modular RL library to fine-tune language models to human preferences
emredeveloper/Mem-LLM (score 44): Mem-LLM is a Python library for building memory-enabled AI assistants that run entirely on local...
cloudguruab/modsysML (score 41): Human reinforcement learning (RLHF) framework for AI models. Evaluate and compare LLM outputs,...
ManasVardhan/bench-my-llm (score 38): 🏎️ Dead-simple LLM benchmarking CLI - latency, cost, and quality metrics
modal-labs/stopwatch (score 36): A tool for benchmarking LLMs on Modal