paradite/eval-data
Prompts and evaluation data for LLMs on real world coding and writing tasks
This repository provides a collection of prompts and expected outputs for evaluating how well large language models (LLMs) perform on real-world coding and writing tasks. Each entry pairs a specific task scenario (such as writing a Next.js todo app or explaining Kanji) with benchmark data for assessing an LLM's generated code or text. It is aimed at AI researchers, prompt engineers, and product managers who are developing or integrating LLM-powered applications.
No commits in the last 6 months.
Use this if you need pre-defined, diverse datasets to systematically test and compare the performance of different LLMs or prompt strategies on practical development and content creation challenges.
Not ideal if you are looking for a tool that generates new code or content directly rather than one that evaluates an LLM's output.
Stars: 17
Forks: 3
Language: TypeScript
License: —
Category:
Last pushed: Sep 13, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/paradite/eval-data"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
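For programmatic access, a minimal TypeScript sketch along the lines below should work against the endpoint shown above. The response shape and the API-key header name are assumptions (the listing does not document them), so treat this as illustrative only.

const ENDPOINT =
  "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/paradite/eval-data";

// Fetch the repo metadata from the public endpoint (100 requests/day without a key).
// NOTE: the "x-api-key" header name is a guess; check the API docs for the real one.
async function fetchRepoData(apiKey?: string): Promise<unknown> {
  const headers: Record<string, string> = {};
  if (apiKey) {
    headers["x-api-key"] = apiKey; // hypothetical header for the 1,000/day keyed tier
  }
  const res = await fetch(ENDPOINT, { headers });
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  return res.json();
}

// Example usage: print the returned JSON (requires a runtime with global fetch, e.g. Node 18+).
fetchRepoData()
  .then((data) => console.log(JSON.stringify(data, null, 2)))
  .catch((err) => console.error(err));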
Higher-rated alternatives
microsoft/promptbench
A unified evaluation framework for large language models
uptrain-ai/uptrain
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications....
levitation-opensource/Manipulative-Expression-Recognition
MER is a software that identifies and highlights manipulative communication in text from human...
microsoftarchive/promptbench
A unified evaluation framework for large language models
gabe-mousa/Apolien
AI Safety Evaluation Library