LLM Comparison Evaluation Transformer Models

This category tracks 10 LLM comparison evaluation models. The highest-rated is UBC-MDS/fixml, with a score of 33/100 and 4 stars.

Get all 10 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-comparison-evaluation&limit=20"
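
The same request can also be made from Python. The following is a minimal sketch using only the standard library. The function names `build_quality_url` and `fetch_quality` are hypothetical, and because the shape of the JSON response is not described on this page, the code returns the parsed payload as-is without assuming any field names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example; nothing else about the API is assumed.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain="transformers",
                      subcategory="llm-comparison-evaluation",
                      limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30):
    """Fetch the URL and return the parsed JSON payload unchanged."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality(build_quality_url())` performs the request. Each call counts toward the daily request limit, so it may be worth saving the result locally rather than fetching repeatedly.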

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.

#	Model	Score	Tier	Stars	Language
1	UBC-MDS/fixml LLM Tool for effective test evaluation of ML projects with curated...	33	Emerging	4	Python
2	AstraBert/DebateLLM-Championship 5 LLMs, 1vs1 matches to produce the most convincing argumentation in favor...	24	Experimental	4	Jupyter Notebook
3	brains-on-code/IterativeRefactoringLLM Replication package, supplementary materials, and analysis pipeline for our...	21	Experimental	—	Java
4	JosephTLucas/llm_test A suite of tests to verify bias, safety, trust, and security concerns for LLMs.	20	Experimental	7	Python
5	ash-jyc/db84llm College policy debate as a verbal reasoning benchmark for LLMs	17	Experimental	1	Jupyter Notebook
6	RodillasJavier/debate-fallacy-detector Logical Fallacy Detection in Presidential Debates using a Random Forest...	17	Experimental	—	Jupyter Notebook
7	iSEngLab/LLM4UT_Empirical [ISSTA 2025] A Large-scale Empirical Study on Fine-tuning Large Language...	13	Experimental	13	Python
8	danpozmanter/llm-comparative-eval Compare how llm models stack up	13	Experimental	—	Rust
9	iSEngLab/RetriGen [2025 TOSEM] Improving Deep Assertion Generation via Fine-Tuning...	12	Experimental	6	Python
10	iSEngLab/LLM4AG [2025 TOSEM] Exploring Automated Assertion Generation via Large Language Models	12	Experimental	8	Python