spenceryonce/LLMeval

Evaluate and compare large language models (LLMs) for chatbot applications, using various LLMs as evaluators, and manage prompt templates and binary preferences.

Score: 26 / 100 (Experimental)

When building a chatbot, comparing different large language models (LLMs) to see which performs best can be challenging. This tool helps you systematically evaluate chatbot responses against specific goals and compare how various LLMs stack up. You supply your chatbot's objectives and the responses produced by different LLMs; the tool then uses other LLMs as evaluators to record binary preferences and determine which model is most effective. It is designed for AI product managers, developers, and researchers who are building and refining AI-powered conversational agents.

Use this if you need to rigorously test and compare multiple LLMs to identify the best one for a specific chatbot application based on predefined objectives.

Not ideal if you're looking for a low-code platform for building chatbots, or if you only need to evaluate a single LLM without comparison.
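At its core this is the LLM-as-judge pattern with binary preferences: a judge model is shown two candidate responses and asked which better meets a stated objective. The sketch below is purely illustrative and does not reflect LLMeval's actual API; call_judge_llm, JUDGE_TEMPLATE, binary_preference, and compare_models are hypothetical names, and call_judge_llm would be wired to whatever LLM client you use.

from itertools import combinations

# Illustrative sketch only -- not LLMeval's actual code or API.
# Shows the general "LLM as judge with binary preferences" idea.

def call_judge_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to a judge LLM and return its raw reply."""
    raise NotImplementedError("Wire this up to your LLM provider of choice.")

JUDGE_TEMPLATE = """You are evaluating two chatbot responses against this objective:
{objective}

Response A:
{response_a}

Response B:
{response_b}

Answer with a single letter, A or B, naming the better response."""

def binary_preference(objective: str, response_a: str, response_b: str) -> str:
    """Ask the judge LLM which of two responses better meets the objective."""
    prompt = JUDGE_TEMPLATE.format(
        objective=objective, response_a=response_a, response_b=response_b
    )
    verdict = call_judge_llm(prompt).strip().upper()
    return "A" if verdict.startswith("A") else "B"

def compare_models(objective: str, outputs: dict[str, str]) -> dict[str, int]:
    """Round-robin pairwise comparison: count wins per model."""
    wins = {name: 0 for name in outputs}
    for (name_a, out_a), (name_b, out_b) in combinations(outputs.items(), 2):
        winner = name_a if binary_preference(objective, out_a, out_b) == "A" else name_b
        wins[winner] += 1
    return wins

A round-robin of pairwise comparisons like compare_models yields a simple win count per model, which is one common way to turn binary preferences into a ranking.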

chatbot-development AI-evaluation conversational-AI LLM-selection AI-product-management
No License · No Package · No Dependents
Maintenance: 6 / 25
Adoption: 5 / 25
Maturity: 8 / 25
Community: 7 / 25


Stars: 11
Forks: 1
Language: Python
License: None
Last pushed: Oct 16, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/spenceryonce/LLMeval"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
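The same endpoint can also be called from Python. A minimal sketch using the requests library follows; the shape of the JSON payload is not documented here, so inspect the response rather than assuming field names.

import requests

# Public endpoint for this repo's quality data (same URL as the curl example above).
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/spenceryonce/LLMeval"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()          # fail loudly on 4xx/5xx
data = resp.json()               # payload structure is whatever the API returns
print(data)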