waltonfuture/Diff-eRank

[NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models

Quality score: 31 / 100 (Emerging)

This project offers a new way to evaluate Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs). It compares the internal representations of a trained LLM against those of an untrained (randomly initialized) version of the same model, producing a "Diff-eRank" score that quantifies how efficiently the model has learned to discard redundant information. It's for researchers, data scientists, and AI evaluators who need to assess the quality and efficiency of LLMs and MLLMs.
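As a rough illustration of the idea, the effective rank (eRank) of a representation matrix can be computed as the exponential of the Shannon entropy of its normalized covariance eigenvalues, with Diff-eRank taken as the drop in eRank from the untrained to the trained model. The sketch below is an assumption based on that description, not the repository's actual API; function names and the sign convention are illustrative:

```python
import numpy as np

def erank(reps: np.ndarray) -> float:
    """Effective rank of a (num_tokens x hidden_dim) representation matrix:
    exp of the Shannon entropy of the normalized covariance eigenvalues."""
    centered = reps - reps.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / reps.shape[0]
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # guard tiny negatives
    p = eig / eig.sum()                                # eigenvalue distribution
    p = p[p > 0]
    entropy = -(p * np.log(p)).sum()
    return float(np.exp(entropy))

def diff_erank(untrained_reps: np.ndarray, trained_reps: np.ndarray) -> float:
    """Diff-eRank sketch: how much effective rank training removed
    (higher = more redundancy discarded)."""
    return erank(untrained_reps) - erank(trained_reps)

# Synthetic stand-ins: near-isotropic "untrained" features vs. a version
# with a heavily skewed spectrum, mimicking a trained model's compression.
rng = np.random.default_rng(0)
raw = rng.standard_normal((256, 64))
compressed = (raw @ rng.standard_normal((64, 64))) * np.linspace(1.0, 0.01, 64)
print(diff_erank(raw, compressed))
```

In practice the representation matrices would come from a model's hidden states over a text corpus (e.g. the last-layer hidden states of a Transformers model); the synthetic matrices above only demonstrate the arithmetic.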

No commits in the last 6 months.

Use this if you need an alternative, information-theory-based metric to quantify how well an LLM or MLLM processes and compresses information during training, especially when traditional metrics like loss and accuracy don't fully capture what you need.

Not ideal if you are looking for metrics related to the model's external performance on specific tasks, like response quality or factual accuracy, rather than its internal representational efficiency.

LLM-evaluation AI-model-assessment natural-language-processing-research machine-learning-engineering multi-modal-AI
Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 5 / 25


Stars: 57
Forks: 2
Language: Python
License: Apache-2.0
Last pushed: May 28, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/waltonfuture/Diff-eRank"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.