marcusm117/IdentityChain
[ICLR 2024] Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain
This framework helps AI researchers and developers evaluate how consistently a Code Large Language Model (Code LLM) can translate between natural language descriptions and code in both directions. Given a Code LLM and an evaluation dataset, it reports where the model fails to maintain self-consistency across these tasks. The typical end-user is an AI researcher or machine learning engineer working on code generation and summarization models.
Available on PyPI.
Use this if you are developing or fine-tuning Code LLMs and need a rigorous method to assess their consistency and pinpoint specific errors in their ability to generate code from descriptions or summarize code accurately.
Not ideal if you are looking for a general-purpose code testing tool or a simple accuracy metric for a pre-trained model without needing deep self-consistency analysis.
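The core self-consistency idea can be sketched roughly as follows. This is an illustrative toy, not IdentityChain's actual API: the model stand-ins, the `f` naming convention, and the behavioral-equality check are all assumptions for the sketch; the real framework plugs in genuine Code LLM calls and its own evaluation criteria.

```python
from typing import Callable

def identity_chain_consistent(
    nl_to_code: Callable[[str], str],
    code_to_nl: Callable[[str], str],
    spec: str,
    test_inputs: list,
    chain_length: int = 2,
) -> bool:
    """Run an NL -> code -> NL -> code chain and check that every
    generated program behaves identically on the test inputs."""
    programs = []
    nl = spec
    for _ in range(chain_length):
        src = nl_to_code(nl)   # generate code from the current description
        programs.append(src)
        nl = code_to_nl(src)   # summarize the code back to natural language

    def run(src: str, x):
        env: dict = {}
        exec(src, env)         # assumption: each program defines a function `f`
        return env["f"](x)

    first = programs[0]
    return all(
        run(p, x) == run(first, x)
        for p in programs[1:]
        for x in test_inputs
    )

# Toy stand-ins for a Code LLM; real use would call a model instead.
toy_nl_to_code = lambda nl: "def f(x):\n    return x * 2"
toy_code_to_nl = lambda src: "double the input"

print(identity_chain_consistent(
    toy_nl_to_code, toy_code_to_nl, "double the input", [1, 2, 3]))
```

A model that drifts (e.g. generates `x * 3` on the second pass) would fail this check, which is the kind of inconsistency the framework is designed to surface.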
Stars: 10
Forks: —
Language: Python
License: Apache-2.0
Category: —
Last pushed: Nov 24, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/marcusm117/IdentityChain"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
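The same call can be made from Python with the standard library. This is a minimal sketch: the endpoint URL comes from the curl command above, but the `Authorization: Bearer` header name for keyed access is an assumption, and the response schema is not documented here, so the sketch only builds the request rather than parsing a result.

```python
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def build_request(owner: str, repo: str, api_key: str = "") -> urllib.request.Request:
    """Build a GET request for a repo's quality data.

    Anonymous access is limited to 100 requests/day; a free key raises
    that to 1,000/day (the header name below is an assumption).
    """
    req = urllib.request.Request(f"{BASE}/{owner}/{repo}")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    return req

req = build_request("marcusm117", "IdentityChain")
print(req.full_url)
# To actually fetch: urllib.request.urlopen(req).read()
```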
Higher-rated alternatives
filipnaudot/llmSHAP
llmSHAP: a multi-threaded explainability framework using Shapley values for LLM-based outputs.
microsoft/automated-brain-explanations
Generating and validating natural-language explanations for the brain.
CAS-SIAT-XinHai/CPsyCoun
[ACL 2024] CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework...
wesg52/universal-neurons
Universal Neurons in GPT2 Language Models
ICTMCG/LLM-for-misinformation-research
Paper list of misinformation research using (multi-modal) large language models, i.e., (M)LLMs.