Alsace08/Chain-of-Embedding

[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"

Score: 36 / 100 (Emerging)

This tool helps you evaluate how well a large language model (LLM) understands and answers questions without needing to manually check its written output. You provide a question and a known correct answer, and it analyzes the model's internal processing to produce a "CoE score" and visualizations. This is ideal for researchers and engineers who build and test LLMs, especially for tasks requiring factual accuracy like math or precise question answering.
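
The core idea, per the paper, is to track how the model's hidden representation evolves layer by layer (the "chain of embedding") and score that trajectory. Below is a minimal sketch of collecting such a trajectory with Hugging Face transformers; the model choice and the step-norm statistic at the end are illustrative assumptions, not the repo's actual CoE scoring code.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Collect the per-layer hidden-state trajectory ("chain of embedding")
# for a prompt. The real CoE score is computed from this trajectory;
# the step-norm statistic below is only an illustrative stand-in.

model_name = "gpt2"  # hypothetical choice; substitute the model under test
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "What is 17 * 23?"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of (num_layers + 1) tensors,
# each of shape (batch, seq_len, hidden_dim).
# Keep the last token's embedding at every layer.
chain = torch.stack([h[0, -1, :] for h in out.hidden_states])

# Illustrative trajectory statistic: mean step size between
# consecutive layers (NOT the paper's CoE formula).
steps = (chain[1:] - chain[:-1]).norm(dim=-1)
print(f"{len(chain)} layer embeddings, mean step norm {steps.mean():.3f}")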

No commits in the last 6 months.

Use this if you need a novel way to automatically assess an LLM's understanding and correctness, particularly when you have a dataset with clear, verifiable answers.

Not ideal if you are looking for a tool to evaluate an LLM's creativity, writing style, or subjective performance on open-ended prompts.

Tags: LLM-evaluation, AI-model-testing, natural-language-processing, model-interpretability
Status: Stale (6 months), no package published, no known dependents

Score breakdown:
Maintenance: 0 / 25
Adoption: 9 / 25
Maturity: 16 / 25
Community: 11 / 25

Stars: 95
Forks: 8
Language: Python
License: Apache-2.0
Last pushed: Dec 19, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Alsace08/Chain-of-Embedding"

Open to everyone: 100 requests/day with no key needed. A free API key raises the limit to 1,000 requests/day.
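
For scripted use, the same endpoint can be called from Python. A minimal sketch with requests is below; the response field name ("score") is an assumption about the JSON schema, not documented behavior.

import requests

# Fetch the quality report for this repo and print the overall score.
# The "score" key is an assumed field name; inspect the full JSON
# response to confirm the actual schema.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/Alsace08/Chain-of-Embedding"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()
print(data.get("score"))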