marcusm117/IdentityChain

[ICLR 2024] Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain

Score: 36 / 100 (Emerging)

This framework helps AI researchers and developers evaluate how consistently a Code Large Language Model (LLM) translates natural language descriptions into code and code back into natural language. It takes a Code LLM and an evaluation dataset, then reports in detail where the model fails to stay self-consistent across these two tasks. The typical end-user is an AI researcher or machine learning engineer working on code generation and summarization models.

Available on PyPI.

Use this if you are developing or fine-tuning Code LLMs and need a rigorous method to assess their consistency and pinpoint specific errors in their ability to generate code from descriptions or summarize code accurately.

Not ideal if you are looking for a general-purpose code testing tool or a simple accuracy metric for a pre-trained model without needing deep self-consistency analysis.
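At a high level, the self-consistency being measured is a round trip: start from a natural language description, have the model generate code, have it summarize that code back into natural language, and check whether the meaning survived. The sketch below illustrates that loop only conceptually and is not IdentityChain's actual API; the callables nl_to_code, code_to_nl, and same_semantics are hypothetical stand-ins for the model calls and a semantic-equivalence check (for example, shared unit tests).

def round_trip_consistency(description, nl_to_code, code_to_nl, same_semantics, steps=3):
    # Chain NL -> code -> NL repeatedly; report the first step where meaning drifts.
    current = description
    for step in range(steps):
        code = nl_to_code(current)      # natural-language-to-code task
        current = code_to_nl(code)      # code-to-natural-language task
        if not same_semantics(description, current):
            return {"consistent": False, "failed_at": step}
    return {"consistent": True, "failed_at": None}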

Tags: Code LLM evaluation, Natural Language to Code, Code to Natural Language, AI model debugging, Model self-consistency
Dependents: none
Maintenance: 6 / 25
Adoption: 5 / 25
Maturity: 25 / 25
Community: 0 / 25


Stars: 10
Forks:
Language: Python
License: Apache-2.0
Last pushed: Nov 24, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/marcusm117/IdentityChain"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
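The same endpoint can also be queried programmatically. This is a minimal Python sketch using only the standard library, assuming the endpoint returns JSON; the exact response fields are not documented on this page, so it simply prints whatever comes back.

import json
import urllib.request

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/marcusm117/IdentityChain"
with urllib.request.urlopen(url) as response:   # same URL as the curl example above
    data = json.load(response)
print(json.dumps(data, indent=2))               # pretty-print the returned quality data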