spenceryonce/LLMeval
Evaluate and compare large language models (LLMs) for chatbot applications, using various LLMs as evaluators, and manage prompt templates and binary preferences.
When you are building a chatbot, it can be hard to tell which of several large language models (LLMs) performs best. This tool helps you systematically evaluate chatbot responses against specific goals and compare how different LLMs stack up: you provide your desired chatbot objectives and the responses from each candidate model, and it helps you determine which one is most effective. It is aimed at AI product managers, developers, and researchers who are building and refining AI-powered conversational agents.
Use this if you need to rigorously test and compare multiple LLMs to identify the best one for a specific chatbot application based on predefined objectives.
Not ideal if you're looking for a low-code platform for building chatbots, or if you only need to evaluate a single LLM without comparison.
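To make the core idea concrete, here is a minimal sketch of the LLM-as-judge, binary-preference pattern the tool is built around. This is not LLMeval's actual API: the judge model name, prompt wording, objective, and sample responses below are illustrative assumptions, and the sketch uses the OpenAI Python client (it assumes the package is installed and OPENAI_API_KEY is set).

# Sketch of binary preference judging with an LLM, assuming the OpenAI
# Python client; model name, prompt, and objective are illustrative only.
from openai import OpenAI

client = OpenAI()

def judge(objective: str, response_a: str, response_b: str) -> str:
    """Ask a judge model which response better meets the chatbot objective."""
    prompt = (
        f"Chatbot objective: {objective}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        "Which response better satisfies the objective? Answer with 'A' or 'B' only."
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return reply.choices[0].message.content.strip()

# Example: compare two candidate models' answers to the same user question.
preference = judge(
    objective="Resolve billing questions politely and concisely.",
    response_a="Your invoice is attached. Let me know if anything looks off!",
    response_b="I cannot help with that.",
)
print("Preferred response:", preference)

Repeating this comparison over many prompts, and with several different judge models, is what turns a handful of pairwise preferences into a usable model ranking.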
Stars: 11
Forks: 1
Language: Python
License: —
Category:
Last pushed: Oct 16, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/spenceryonce/LLMeval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
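The same endpoint can also be queried from Python. This is a small sketch assuming the requests package is installed and that the endpoint returns JSON; authentication for the higher rate limit is omitted because the key mechanism isn't documented in this listing.

# Fetch the listing data from the public API; assumes `requests` is installed
# and the endpoint returns JSON. No API key is used here.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/spenceryonce/LLMeval"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()
print(data)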
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas
Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval
The robust European language model benchmark.
Giskard-AI/giskard-oss
🐢 Open-Source Evaluation & Testing library for LLM Agents