Alsace08/Chain-of-Embedding
[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"
This tool helps you evaluate how well a large language model (LLM) understands and answers questions without needing to manually check its written output. You provide a question and a known correct answer, and it analyzes the model's internal processing to produce a "CoE score" and visualizations. This is ideal for researchers and engineers who build and test LLMs, especially for tasks requiring factual accuracy like math or precise question answering.
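To make the idea concrete, here is a minimal illustrative sketch (not the paper's exact formulation) of scoring a "chain of embeddings", i.e. the trajectory of per-layer hidden states. It combines a per-step magnitude feature with a directional-agreement feature; the function name, shapes, and weighting are assumptions for illustration only.

```python
import numpy as np

def coe_like_score(hidden_states):
    """Toy trajectory score over a chain of per-layer embeddings.

    hidden_states: array of shape (num_layers, hidden_dim), one vector
    per layer. Illustrative only -- the paper defines its own
    magnitude/angle features for the CoE score.
    """
    steps = np.diff(hidden_states, axis=0)        # (L-1, d) layer-to-layer deltas
    mags = np.linalg.norm(steps, axis=1)          # per-step magnitude
    unit = steps / np.maximum(mags[:, None], 1e-12)
    # cosine similarity between consecutive step directions
    cos = np.sum(unit[:-1] * unit[1:], axis=1)
    mag_feat = mags.mean() / np.maximum(mags.max(), 1e-12)  # in (0, 1]
    ang_feat = cos.mean()                                   # in [-1, 1]
    return float(mag_feat + ang_feat)

# Stand-in for real hidden states; with Hugging Face models you would
# obtain these by passing output_hidden_states=True to the forward call.
rng = np.random.default_rng(0)
chain = rng.normal(size=(12, 64))                 # e.g. 12 layers, 64-dim states
score = coe_like_score(chain)
```

With real models, higher directional consistency across layers would be one signal a score like this could pick up; consult the repository for the actual feature definitions.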
No commits in the last 6 months.
Use this if you need a novel way to automatically assess an LLM's understanding and correctness, particularly when you have a dataset with clear, verifiable answers.
Not ideal if you are looking for a tool to evaluate an LLM's creativity, writing style, or subjective performance on open-ended prompts.
Stars: 95
Forks: 8
Language: Python
License: Apache-2.0
Last pushed: Dec 19, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Alsace08/Chain-of-Embedding"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
cvs-health/uqlm
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM...
PRIME-RL/TTRL
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
sapientinc/HRM
Hierarchical Reasoning Model Official Release
tigerchen52/query_level_uncertainty
query-level uncertainty in LLMs
reasoning-survey/Awesome-Reasoning-Foundation-Models
✨✨Latest Papers and Benchmarks in Reasoning with Foundation Models