amazon-science/llm-code-preference
Training and Benchmarking LLMs for Code Preference.
This project helps AI researchers train and evaluate models that judge code quality: given a pair of code snippets, a model decides which one is better on criteria such as correctness, efficiency, and security. Researchers improving AI models for software development would use it to refine their code-generating systems.
No commits in the last 6 months.
Use this if you are an AI researcher developing or evaluating large language models that generate code and need to assess their outputs based on various quality criteria.
Not ideal if you are a software developer looking for a tool to automatically fix or improve your application code, as this is a research tool for model training and evaluation.
Stars
38
Forks
2
Language
Python
License
—
Category
—
Last pushed
Nov 15, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/amazon-science/llm-code-preference"
Open to everyone: 100 requests/day with no key required. Get a free key for 1,000 requests/day.
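The same endpoint can be called from Python with the standard library. A minimal sketch, assuming the endpoint returns a JSON body (the response schema is not documented here, so the parsed `dict` is an assumption):

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(repo: str) -> str:
    """Build the API URL for a repository's quality record."""
    return f"{API_BASE}/{repo}"

def fetch_quality(repo: str) -> dict:
    """Fetch the quality record; no key needed up to 100 requests/day."""
    with urllib.request.urlopen(quality_url(repo)) as resp:
        return json.load(resp)

url = quality_url("amazon-science/llm-code-preference")
# fetch_quality("amazon-science/llm-code-preference")  # performs a live HTTP request
```

The live call is left commented out so the snippet runs offline; uncomment it to retrieve the actual record.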
Higher-rated alternatives
oripress/AlgoTune
AlgoTune is a NeurIPS 2025 benchmark made up of 154 math, physics, and computer science...
xjywhu/Awesome-Multimodal-LLM-for-Code
Multimodal Large Language Models for Code Generation under Multimodal Scenarios
jie-jw-wu/human-eval-comm
HumanEvalComm: Evaluating Communication Skill of Code LLM and LLM Agent
juyongjiang/CodeUp
CodeUp: A Multilingual Code Generation Llama-X Model with Parameter-Efficient Instruction-Tuning
JHansiduYapa/Fine-Tuning-a-Small-Language-Model-for-Cypher-Query-Generation
This project fine-tunes Unsloth's Gemma-3 4B IT (4-bit) model to translate natural language into...