prompt-evaluator and eval-data
About prompt-evaluator
syamsasi99/prompt-evaluator
prompt-evaluator is an open-source toolkit for evaluating, testing, and comparing LLM prompts. It provides a GUI-driven workflow for running prompt tests, tracking token usage, visualizing results, and checking reliability across models from providers such as OpenAI, Anthropic (Claude), and Google (Gemini).
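Since the repo's own interface isn't shown here, the following is a minimal sketch of the kind of test loop such a toolkit automates: send the same prompts to a model, record token usage per run, and capture the outputs for comparison. It assumes the OpenAI Python SDK with an API key in the environment; the model name and prompts are placeholders, and this is not prompt-evaluator's actual API.

```python
# Minimal sketch of a prompt-test loop with token tracking.
# Assumes the OpenAI Python SDK (openai>=1.0) and OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder prompts; a real run would load these from a test suite.
prompts = [
    "Summarize the following text in one sentence: ...",
    "Rewrite this sentence in a formal tone: ...",
]

for prompt in prompts:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    usage = resp.usage
    # Token counts are what a cost/usage tracker would aggregate per prompt.
    print(f"prompt tokens: {usage.prompt_tokens}, "
          f"completion tokens: {usage.completion_tokens}")
    print(resp.choices[0].message.content)
```

A GUI-driven tool layers result visualization and cross-model comparison on top of a loop like this; the underlying unit of work is the same prompt-response-usage record.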
About eval-data
paradite/eval-data
Prompts and evaluation data for LLMs on real-world coding and writing tasks
The repository collects prompts and expected outputs for evaluating how well large language models (LLMs) perform on real-world coding and writing tasks. Each entry pairs a specific task scenario (such as writing a Next.js todo app or explaining Kanji) with benchmark data for assessing an LLM's generated code or text. It is aimed at AI researchers, prompt engineers, and product managers who are developing or integrating LLM-powered applications.
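As a rough illustration of how prompt/expected-output pairs like these might be consumed, here is a sketch that loads benchmark cases from JSON files and scores a model's output by keyword overlap. The file layout, field names ("prompt", "expected_output", "id"), and the run_model stub are all assumptions for the sake of the example, not eval-data's actual schema; real evaluations would typically use rubric-based or LLM-graded scoring.

```python
# Sketch of consuming benchmark cases and scoring generated output.
# File layout and field names are assumed, not taken from eval-data.
import json
from pathlib import Path


def load_cases(data_dir: str) -> list[dict]:
    """Load benchmark cases from JSON files in a directory (assumed layout)."""
    return [json.loads(p.read_text()) for p in Path(data_dir).glob("*.json")]


def run_model(prompt: str) -> str:
    """Placeholder for an actual LLM call (OpenAI, Claude, Gemini, etc.)."""
    return "..."


def score(generated: str, expected: str) -> float:
    """Crude keyword-overlap score between generated and expected text."""
    expected_terms = set(expected.lower().split())
    hits = sum(1 for term in expected_terms if term in generated.lower())
    return hits / len(expected_terms) if expected_terms else 0.0


cases = load_cases("eval-data/cases")  # hypothetical directory
for case in cases:
    generated = run_model(case["prompt"])
    print(f"{case.get('id', '?')}: {score(generated, case['expected_output']):.2f}")
```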