marcusm117/IdentityChain

[ICLR 2024] Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain

Score: 36 / 100 (Emerging)

This framework helps AI researchers and developers evaluate how consistently a Code Large Language Model (LLM) translates natural language descriptions into code and code back into natural language. It takes a Code LLM and an evaluation dataset, then reports in detail where the model fails to stay self-consistent across these two tasks. The typical end-user is an AI researcher or machine learning engineer working on code generation and summarization models.

Available on PyPI.

Use this if you are developing or fine-tuning Code LLMs and need a rigorous method to assess their consistency and pinpoint specific errors in their ability to generate code from descriptions or summarize code accurately.

Not ideal if you are looking for a general-purpose code testing tool or a simple accuracy metric for a pre-trained model without needing deep self-consistency analysis.
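At a high level, the self-consistency being measured is a round trip: start from a natural language description, have the model generate code, have it summarize that code back into natural language, and check whether the meaning survived. The sketch below illustrates that loop only conceptually and is not IdentityChain's actual API; the callables nl_to_code, code_to_nl, and same_semantics are hypothetical stand-ins for the model calls and a semantic-equivalence check (for example, shared unit tests).

def round_trip_consistency(description, nl_to_code, code_to_nl, same_semantics, steps=3):
    # Chain NL -> code -> NL repeatedly; report the first step where meaning drifts.
    current = description
    for step in range(steps):
        code = nl_to_code(current)      # natural-language-to-code task
        current = code_to_nl(code)      # code-to-natural-language task
        if not same_semantics(description, current):
            return {"consistent": False, "failed_at": step}
    return {"consistent": True, "failed_at": None}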

Tags: Code LLM evaluation, Natural Language to Code, Code to Natural Language, AI model debugging, Model self-consistency
Dependents: none
Maintenance: 6 / 25
Adoption: 5 / 25
Maturity: 25 / 25
Community: 0 / 25


Stars: 10
Forks:
Language: Python
License: Apache-2.0
Last pushed: Nov 24, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/marcusm117/IdentityChain"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
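The same endpoint can also be queried programmatically. This is a minimal Python sketch using only the standard library, assuming the endpoint returns JSON; the exact response fields are not documented on this page, so it simply prints whatever comes back.

import json
import urllib.request

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/marcusm117/IdentityChain"
with urllib.request.urlopen(url) as response:   # same URL as the curl example above
    data = json.load(response)
print(json.dumps(data, indent=2))               # pretty-print the returned quality data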