amazon-science/llm-code-preference
Training and Benchmarking LLMs for Code Preference.
This project helps AI researchers train and evaluate models that judge code quality: given a pair of code snippets, a model decides which one is better on criteria such as correctness, efficiency, and security. Researchers improving AI models for software development would use it to refine their code-generating systems.
No commits in the last 6 months.
Use this if you are an AI researcher developing or evaluating large language models that generate code and need to assess their outputs based on various quality criteria.
Not ideal if you are a software developer looking for a tool to automatically fix or improve your application code, as this is a research tool for model training and evaluation.
Stars
38
Forks
2
Language
Python
License
—
Category
—
Last pushed
Nov 15, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/amazon-science/llm-code-preference"
Open to everyone: 100 requests/day with no key required. Get a free key for 1,000 requests/day.
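The same endpoint can be called from Python with the standard library. A minimal sketch, assuming the endpoint returns a JSON body (the response schema is not documented here, so the parsed `dict` is an assumption):

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(repo: str) -> str:
    """Build the API URL for a repository's quality record."""
    return f"{API_BASE}/{repo}"

def fetch_quality(repo: str) -> dict:
    """Fetch the quality record; no key needed up to 100 requests/day."""
    with urllib.request.urlopen(quality_url(repo)) as resp:
        return json.load(resp)

url = quality_url("amazon-science/llm-code-preference")
# fetch_quality("amazon-science/llm-code-preference")  # performs a live HTTP request
```

The live call is left commented out so the snippet runs offline; uncomment it to retrieve the actual record.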
Higher-rated alternatives
oripress/AlgoTune
AlgoTune is a NeurIPS 2025 benchmark made up of 154 math, physics, and computer science...
xjywhu/Awesome-Multimodal-LLM-for-Code
Multimodal Large Language Models for Code Generation under Multimodal Scenarios
jie-jw-wu/human-eval-comm
HumanEvalComm: Evaluating Communication Skill of Code LLM and LLM Agent
juyongjiang/CodeUp
CodeUp: A Multilingual Code Generation Llama-X Model with Parameter-Efficient Instruction-Tuning
JHansiduYapa/Fine-Tuning-a-Small-Language-Model-for-Cypher-Query-Generation
This project fine-tunes Unsloth's Gemma-3 4B IT (4-bit) model to translate natural language into...