LLM Comparison Evaluation Transformer Models

There are 10 LLM comparison evaluation models tracked. The highest-rated is UBC-MDS/fixml, which scores 33/100 and has 4 stars.

Get all 10 projects as JSON:

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-comparison-evaluation&limit=20"

The API is open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
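The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described here, the decoded JSON is returned as-is rather than unpacked into specific fields.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the query URL with the same parameters as the curl command."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(domain: str, subcategory: str, limit: int = 20):
    """Fetch the listing and decode the JSON body.

    Assumption: the endpoint returns JSON. Its exact structure is not
    documented in this listing, so the payload is returned unchanged.
    """
    url = build_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)
```

Calling `fetch_projects("transformers", "llm-comparison-evaluation")` would issue the same keyless request as the curl command, so it counts against the 100 requests/day limit.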

| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | UBC-MDS/fixml | LLM Tool for effective test evaluation of ML projects with curated... | 33 | Emerging |
| 2 | AstraBert/DebateLLM-Championship | 5 LLMs, 1vs1 matches to produce the most convincing argumentation in favor... | 24 | Experimental |
| 3 | brains-on-code/IterativeRefactoringLLM | Replication package, supplementary materials, and analysis pipeline for our... | 21 | Experimental |
| 4 | JosephTLucas/llm_test | A suite of tests to verify bias, safety, trust, and security concerns for LLMs. | 20 | Experimental |
| 5 | ash-jyc/db84llm | College policy debate as a verbal reasoning benchmark for LLMs | 17 | Experimental |
| 6 | RodillasJavier/debate-fallacy-detector | Logical Fallacy Detection in Presidential Debates using a Random Forest... | 17 | Experimental |
| 7 | iSEngLab/LLM4UT_Empirical | [ISSTA 2025] A Large-scale Empirical Study on Fine-tuning Large Language... | 13 | Experimental |
| 8 | danpozmanter/llm-comparative-eval | Compare how llm models stack up | 13 | Experimental |
| 9 | iSEngLab/RetriGen | [2025 TOSEM] Improving Deep Assertion Generation via Fine-Tuning... | 12 | Experimental |
| 10 | iSEngLab/LLM4AG | [2025 TOSEM] Exploring Automated Assertion Generation via Large Language Models | 12 | Experimental |