Alsace08/Chain-of-Embedding
[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"
This tool helps you evaluate how well a large language model (LLM) understands and answers questions without needing to manually check its written output. You provide a question and a known correct answer, and it analyzes the model's internal processing to produce a "CoE score" and visualizations. This is ideal for researchers and engineers who build and test LLMs, especially for tasks requiring factual accuracy like math or precise question answering.
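To make the idea concrete, here is a minimal illustrative sketch (not the paper's exact formulation) of scoring a "chain of embeddings", i.e. the trajectory of per-layer hidden states. It combines a per-step magnitude feature with a directional-agreement feature; the function name, shapes, and weighting are assumptions for illustration only.

```python
import numpy as np

def coe_like_score(hidden_states):
    """Toy trajectory score over a chain of per-layer embeddings.

    hidden_states: array of shape (num_layers, hidden_dim), one vector
    per layer. Illustrative only -- the paper defines its own
    magnitude/angle features for the CoE score.
    """
    steps = np.diff(hidden_states, axis=0)        # (L-1, d) layer-to-layer deltas
    mags = np.linalg.norm(steps, axis=1)          # per-step magnitude
    unit = steps / np.maximum(mags[:, None], 1e-12)
    # cosine similarity between consecutive step directions
    cos = np.sum(unit[:-1] * unit[1:], axis=1)
    mag_feat = mags.mean() / np.maximum(mags.max(), 1e-12)  # in (0, 1]
    ang_feat = cos.mean()                                   # in [-1, 1]
    return float(mag_feat + ang_feat)

# Stand-in for real hidden states; with Hugging Face models you would
# obtain these by passing output_hidden_states=True to the forward call.
rng = np.random.default_rng(0)
chain = rng.normal(size=(12, 64))                 # e.g. 12 layers, 64-dim states
score = coe_like_score(chain)
```

With real models, higher directional consistency across layers would be one signal a score like this could pick up; consult the repository for the actual feature definitions.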
No commits in the last 6 months.
Use this if you need a novel way to automatically assess an LLM's understanding and correctness, particularly when you have a dataset with clear, verifiable answers.
Not ideal if you are looking for a tool to evaluate an LLM's creativity, writing style, or subjective performance on open-ended prompts.
Stars: 95
Forks: 8
Language: Python
License: Apache-2.0
Last pushed: Dec 19, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Alsace08/Chain-of-Embedding"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
sapientinc/HRM
Hierarchical Reasoning Model Official Release
tigerchen52/query_level_uncertainty
query-level uncertainty in LLMs
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models