ai-twinkle/Eval

Twinkle Eval: An Efficient and Accurate AI Evaluation Tool

Score: 44 / 100 (Emerging)

This tool helps AI practitioners and researchers objectively compare and analyze the performance of different Large Language Models (LLMs). You provide your LLM API details and datasets (CSV, JSON, etc., containing questions and answers), and it generates comprehensive reports on model accuracy, stability, and inference speed. It's designed for anyone who needs to rigorously evaluate LLMs on benchmarks such as MMLU or TMMLU+.
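As a rough illustration of the kind of question-and-answer data such a tool consumes, the Python sketch below builds a tiny dataset and writes it to JSON. The field names (question, choices, answer) are hypothetical placeholders, not Twinkle Eval's documented schema; check the repository for the exact format it expects.

import json

# Hypothetical question-and-answer records; field names are illustrative only,
# not Twinkle Eval's documented schema.
dataset = [
    {
        "question": "What is the capital of France?",
        "choices": ["Berlin", "Madrid", "Paris", "Rome"],
        "answer": "Paris",
    },
]

# Write the records to a JSON file that an evaluation run could point at.
with open("sample_dataset.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, ensure_ascii=False, indent=2)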

No commits in the last 6 months.

Use this if you need an efficient and accurate way to benchmark various Large Language Models (LLMs) against specific datasets to understand their performance and stability.

Not ideal if you only need to run a single, quick test on an LLM without deep analysis or comparative benchmarking.

Tags: LLM evaluation, AI model benchmarking, natural language processing, AI research, model comparison
Badges: Stale (6m), No Package, No Dependents
Maintenance 2 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 17 / 25


Stars: 89
Forks: 16
Language: Python
License: MIT
Last pushed: Aug 14, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/ai-twinkle/Eval"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
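For scripted access, a minimal Python sketch using the requests library is shown below. It assumes the endpoint returns JSON; the response schema is not documented here, so the result is simply pretty-printed.

import json
import requests

# Public quality endpoint for ai-twinkle/Eval (no key needed, up to 100 requests/day).
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/ai-twinkle/Eval"

response = requests.get(url, timeout=30)
response.raise_for_status()

# Pretty-print whatever JSON the endpoint returns.
print(json.dumps(response.json(), indent=2))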