ccarvalho-eng/aludel
LLM Evaluation Workbench
This tool helps developers and product managers evaluate and compare how different large language models (LLMs) respond to specific prompts. You provide a prompt with variables, choose models from providers like OpenAI, Anthropic, or Ollama, and it shows how each model performs on metrics like quality, speed, and cost. It's for anyone building applications with LLMs who needs to choose the best model for a task or track performance over time.
Use this if you need to systematically test and compare different LLMs to find the best one for your application or track how changes to your prompts affect model performance.
Not ideal if you're a casual user just trying out a single LLM or if your primary need is general-purpose LLM experimentation without detailed performance tracking.
Stars
9
Forks
—
Language
JavaScript
License
Apache-2.0
Category
Last pushed
Mar 28, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/ccarvalho-eng/aludel"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
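If you prefer calling the endpoint from code rather than curl, here is a minimal Node.js (18+) sketch using the built-in fetch. It only fetches the URL shown above and prints the raw JSON, since the response schema isn't documented on this page; treat the error handling and output format as illustrative, not as the service's official client.

// Minimal sketch: fetch this repo's quality data with Node 18+ (global fetch).
// The endpoint URL comes from the curl example above; the response shape is
// not documented here, so we just print whatever JSON comes back.
const url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/ccarvalho-eng/aludel";

async function main() {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  const data = await res.json();
  console.log(JSON.stringify(data, null, 2)); // pretty-print the raw payload
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});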
Higher-rated alternatives
open-compass/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral,...
IBM/unitxt
🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the...
lean-dojo/LeanDojo
Tool for data extraction and interacting with Lean programmatically.
GoodStartLabs/AI_Diplomacy
Frontier Models playing the board game Diplomacy.
google/litmus
Litmus is a comprehensive LLM testing and evaluation tool designed for GenAI Application...