LLM Evaluation Platforms Transformer Models

There are 14 LLM evaluation platform models tracked. The highest-rated is radlab-dev-group/llm-router at 41/100 with 5 stars.

Get all 14 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-evaluation-platforms&limit=20"
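The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented here, so the code only decodes the response and makes no assumption about its fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="llm-evaluation-platforms",
              limit=20):
    """Build the query URL (hypothetical helper, not part of the API)."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Send a GET request and decode the JSON body.

    The response structure is an unknown here, so the decoded
    object is returned as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example (performs a network request and counts against the daily quota):
# projects = fetch_projects(build_url())
# print(json.dumps(projects, indent=2))
```

With the default arguments, `build_url()` reproduces the URL used in the curl command. Because each call counts against the daily request limit, it may be worth saving the decoded result locally instead of refetching it.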

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.

#	Model	Score	Tier	Stars	Language
1	radlab-dev-group/llm-router LLM Router is a service that can be deployed on‑premises or in the cloud. It...	41	Emerging	5	Python
2	yonahgraphics/openevalkit Production-grade Python framework for evaluating LLM and agentic systems...	33	Emerging	3	Python
3	Aryan-202/cookbooks An intelligent optimization engine that dynamically adjusts LLM selection,...	26	Experimental	—	Jupyter Notebook
4	squishai/squish 🤖🗜️⚡️ Compress local LLMs once, run them forever at sub-second load times....	26	Experimental	2	Python
5	wesleyscholl/squish 🤖🗜️⚡️ Compress local LLMs once, run them forever at sub-second load times....	22	Experimental	1	Python
6	Yu-amd/Multiverse Lightweight model inference playground	21	Experimental	—	Python
7	adityonugrohoid/ollama-multi-llm-server Multi-model inference API and playground powered by Ollama. Serve, switch,...	21	Experimental	—	Python
8	sylym/subtext LLM-Based Steganography Framework | A covert information transmission framework based on LLM probability distributions	21	Experimental	2	Python
9	awaescher/Olmolo Ollama Model Loader: Keeping Ollama models warm	18	Experimental	1	C#
10	ghr8635/LLM-based-Agent-for-Driver-Sleepiness-Detection-and-Mitigation-in-Automotive-Systems An AI-driven automotive agent utilizing Large Language Models (LLMs) and...	17	Experimental	3	Python
11	charanpool/llm-cogs-optmizer Intelligent middleware that reduces LLM COGS by routing queries between...	13	Experimental	—	Python
12	Deepakkasyapa11/LLMops-Computed-Grid-Training Production-centric LLMOps framework designed to bridge the gap between AI...	13	Experimental	—	Python
13	CrackedResearcher/LLMVerify Verify outputs generated by LLMs backed with real time data	11	Experimental	—	Python
14	brettdidonato/BSD_Evals LLM evaluation framework	11	Experimental	—	Jupyter Notebook