amazon-science/recode
Releasing code for "ReCode: Robustness Evaluation of Code Generation Models"
This project helps evaluate how well code-generating AI models perform when presented with slightly altered inputs. It takes your existing code generation models and datasets, applies subtle changes to docstrings, function names, or code syntax, and then measures the model's ability to still produce correct code. This tool is for AI researchers and engineers who develop or deploy code generation models and need to understand their reliability.
No commits in the last 6 months.
Use this if you need to thoroughly test the practical robustness of your code generation models against common, subtle variations in input.
Not ideal if you are looking to test general code quality, functional correctness, or performance of your models under normal, unperturbed conditions.
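To make the idea of "subtle changes" concrete, here is an illustrative sketch of one kind of docstring perturbation such a benchmark might apply (this is a hypothetical example, not ReCode's actual transformation code): swapping two adjacent characters in a word, mimicking a natural typo while leaving the task semantics intact.

```python
import random

def perturb_docstring(docstring: str, seed: int = 0) -> str:
    """Illustrative typo-style perturbation (not ReCode's implementation):
    swap two adjacent characters inside one sufficiently long word."""
    rng = random.Random(seed)
    words = docstring.split()
    # Only perturb words long enough for an interior character swap.
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return docstring
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # interior position, keeps first/last chars
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

original = "Return the sum of two integers."
print(perturb_docstring(original))
```

A robust model should generate the same correct code for the perturbed docstring as for the original; the benchmark measures how often that holds across many such perturbation types.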
Stars: 58
Forks: 6
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 20, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/amazon-science/recode"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
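The endpoint can also be called from other tooling. Below is a minimal Python sketch that builds the URL programmatically; `quality_url` is a hypothetical helper, and treating the path segments as ecosystem/owner/repo is an assumption inferred from the single example above.

```python
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    # Hypothetical helper: assembles the quality-API URL shown above,
    # percent-encoding each path segment to be safe.
    return f"{BASE}/{quote(ecosystem, safe='')}/{quote(owner, safe='')}/{quote(repo, safe='')}"

print(quality_url("transformers", "amazon-science", "recode"))
```

The resulting URL can then be fetched with any HTTP client (e.g. `urllib.request.urlopen`), subject to the rate limits noted above.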
Higher-rated alternatives
ExtensityAI/symbolicai
A neurosymbolic perspective on LLMs
TIGER-AI-Lab/MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding...
deep-symbolic-mathematics/LLM-SR
[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation...
microsoft/interwhen
A framework for verifiable reasoning with language models.
zhudotexe/fanoutqa
Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language...