amazon-science/recode
Releasing code for "ReCode: Robustness Evaluation of Code Generation Models"
This project helps evaluate how well code-generating AI models perform when presented with slightly altered inputs. It takes your existing code generation models and datasets, applies subtle changes to docstrings, function names, or code syntax, and then measures the model's ability to still produce correct code. This tool is for AI researchers and engineers who develop or deploy code generation models and need to understand their reliability.
No commits in the last 6 months.
Use this if you need to thoroughly test the practical robustness of your code generation models against common, subtle variations in input.
Not ideal if you are looking to test general code quality, functional correctness, or performance of your models under normal, unperturbed conditions.
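To make the idea of "subtle changes" concrete, here is an illustrative sketch of one kind of docstring perturbation such a benchmark might apply (this is a hypothetical example, not ReCode's actual transformation code): swapping two adjacent characters in a word, mimicking a natural typo while leaving the task semantics intact.

```python
import random

def perturb_docstring(docstring: str, seed: int = 0) -> str:
    """Illustrative typo-style perturbation (not ReCode's implementation):
    swap two adjacent characters inside one sufficiently long word."""
    rng = random.Random(seed)
    words = docstring.split()
    # Only perturb words long enough for an interior character swap.
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return docstring
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # interior position, keeps first/last chars
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

original = "Return the sum of two integers."
print(perturb_docstring(original))
```

A robust model should generate the same correct code for the perturbed docstring as for the original; the benchmark measures how often that holds across many such perturbation types.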
Stars: 58
Forks: 6
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 20, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/amazon-science/recode"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
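The endpoint can also be called from other tooling. Below is a minimal Python sketch that builds the URL programmatically; `quality_url` is a hypothetical helper, and treating the path segments as ecosystem/owner/repo is an assumption inferred from the single example above.

```python
from urllib.parse import quote

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    # Hypothetical helper: assembles the quality-API URL shown above,
    # percent-encoding each path segment to be safe.
    return f"{BASE}/{quote(ecosystem, safe='')}/{quote(owner, safe='')}/{quote(repo, safe='')}"

print(quality_url("transformers", "amazon-science", "recode"))
```

The resulting URL can then be fetched with any HTTP client (e.g. `urllib.request.urlopen`), subject to the rate limits noted above.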
Higher-rated alternatives
ExtensityAI/symbolicai
A neurosymbolic perspective on LLMs
TIGER-AI-Lab/MMLU-Pro
The code and data for "MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding...
deep-symbolic-mathematics/LLM-SR
[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation...
microsoft/interwhen
A framework for verifiable reasoning with language models.
zhudotexe/fanoutqa
Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language...