Code Model Training AI Coding Tools

Tools and frameworks for pre-training, fine-tuning, and optimizing language models specifically for code generation and programming tasks. Does NOT include inference-only tools, deployment platforms, or general LLM training frameworks.

There are 68 code model training tools tracked. Two score above 50 (the Established tier). The highest-rated is k4black/codebleu, at 56/100 with 130 stars.
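The tier labels in the table below appear to follow simple score cutoffs. The sketch below is a guess at that mapping, not the site's published rule: the "above 50" cutoff for Established comes from the summary above, while the Emerging floor of 30 is only inferred from the listed rows (a score of 30 is labelled Emerging and 29 Experimental). The function name `tier_for_score` is hypothetical, and how a score of exactly 50 is treated is unknown.

```python
def tier_for_score(score: int) -> str:
    """Map a 0-100 quality score to a tier label.

    Assumed cutoffs, inferred from the listing rather than documented:
    above 50 is Established, 30 through 50 is Emerging, below 30 is Experimental.
    """
    if score > 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Spot-check against rows that appear in the table.
for score in (56, 48, 30, 29, 10):
    print(score, tier_for_score(score))
```

If the site publishes different boundaries, only the two comparisons need to change.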

Get all 68 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ai-coding&subcategory=code-model-training&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
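For scripted access, the same request can be made from Python with only the standard library. This is a minimal sketch: the endpoint and the three query parameters are copied from the curl command above, while the helper names `build_url` and `fetch_projects` are hypothetical. The shape of the JSON response is not documented here, so the code returns the decoded payload without assuming any field names, and it does not assume that `limit` values larger than 20 are accepted.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str = "ai-coding",
              subcategory: str = "code-model-training",
              limit: int = 20) -> str:
    """Build the query URL using the parameters shown in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit: int = 20, timeout: float = 30.0):
    """Fetch and decode the JSON payload; the response schema is not assumed."""
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.load(response)


# Example (performs a network request and counts against the daily quota):
# data = fetch_projects()
# print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes the request easy to inspect or log before spending one of the daily requests.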

#	Tool	Score	Tier	Stars	Language
1	k4black/codebleu Pip compatible CodeBLEU metric implementation available for linux/macos/win	56	Established	130	Python
2	LiveCodeBench/LiveCodeBench Official repository for the paper "LiveCodeBench: Holistic and Contamination...	53	Established	818	Python
3	EdinburghNLP/code-docstring-corpus Preprocessed Python functions and docstrings for automated code...	48	Emerging	211	Python
4	hendrycks/apps APPS: Automated Programming Progress Standard (NeurIPS 2021)	46	Emerging	520	Python
5	solis-team/Hydra [FSE 2026] Do Not Treat Code as Natural Language: Implications for...	44	Emerging	5	Python
6	alxschwrz/codex_py2cpp Converts python code into c++ by using OpenAI CODEX.	43	Emerging	505	Python
7	AS-SiliconMind/SiliconMind-V1 Inference Engine for SiliconMind-V1 Verilog Coding Models	40	Emerging	16	Python
8	tongye98/Awesome-Code-Benchmark A comprehensive code domain benchmark review of LLM researches.	40	Emerging	208	—
9	reddy-lab-code-research/PPOCoder Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation...	40	Emerging	117	Python
10	bharathsudharsan/OTA-TinyML Code for IEEE Internet Computing Journal paper 'OTA-TinyML: Over the Air...	39	Emerging	29	C++
11	logpai/LogBench A benchmark for logging statement generation.	38	Emerging	26	Python
12	s2e-lab/Code-Smell-Code-Generation Source code for "An Empirical Study of Code Smells in Transformer-based Code...	37	Emerging	11	Python
13	zorazrw/odex [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation	36	Emerging	49	Python
14	vl2g/floco Flow Chart Image-to-Code Generation	35	Emerging	36	Python
15	code-gen/cscg Code Generation as a Dual Task of Code Summarization.	35	Emerging	30	Jupyter Notebook
16	CloudIDEaaS-zz/hydra Hydra is an app generation product. Hydra aims to reduce the "concept to...	35	Emerging	5	JavaScript
17	99EnriqueD/verilog_autocompletion Code implementation for "A Deep Learning Framework for Verilog...	35	Emerging	8	Jupyter Notebook
18	s2e-lab/SecurityEval Repository for "SecurityEval Dataset: Mining Vulnerability Examples to...	34	Emerging	85	Python
19	devashish-gupta/Geode A zero-shot geospatial question answering agent with precise spatiotemporal...	34	Emerging	8	Python
20	matlab-deep-learning/Deep_Learning_Poker_Player_using_MATLAB_and_Raspberry_Pi This example shows how to use automatic code generation to deploy a deep...	33	Emerging	6	MATLAB
21	Gen-Verse/CURE [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via...	33	Emerging	159	Python
22	madaan/pie-perf Training language models to make programs faster	32	Emerging	98	Jupyter Notebook
23	formula-code/fc-eval Evaluation harness for FormulaCode	31	Emerging	4	Python
24	WebPAI/Interaction2Code [ASE 2025] Benchmarking MLLM-based Interactive Webpage Code Generation from...	31	Emerging	53	Python
25	Pavansomisetty21/Automated-Code-Generation-and-Execution-Agent-using-LangChain-and-Cohere-LLM In this we implement an agent which generates and executes code using cohere...	30	Emerging	2	Jupyter Notebook
26	Rudra5417/Code-Generator-using-GPT-3 Natural Language to Code	29	Experimental	14	Jupyter Notebook
27	HIT-SCIR/Abacus Abacus code large language model (Abacus Code LLM)	29	Experimental	58	—
28	HySonLab/Design2Code Large Language Model in combination with Large Vision Model for the task of...	29	Experimental	10	Python
29	matthewdeanmartin/paipi Pypi search, except the backend is an LLM's pixelated memory of Pypi.	29	Experimental	1	Python
30	aswathselvam/Potholes Realtime pothole detection on Android phone's IMU data. SVM model in C++, ...	29	Experimental	3	C
31	aixcoder-plugin/nl2code-dataset Aix-bench, the Java benchmark for code synthesis problem.	27	Experimental	51	Java
32	jszheng21/RACE RACE is a multi-dimensional benchmark for code generation that focuses on...	27	Experimental	12	Python
33	domaineval/DomainEval DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation...	27	Experimental	14	Python
34	KohlerHECTOR/interpreter-py Implementation of Interpretable and Editable Programmatic Tree Policies for...	27	Experimental	15	Python
35	albertusk95/intention-to-code-lstm Source Code Generation Based On User Intention Using LSTM Networks	26	Experimental	19	Python
36	seal-research/OmniCode OmniCode: A Diverse Software Engineering Benchmark for Evaluating Large...	26	Experimental	13	Python
37	CodeEff/ECCO [EMNLP 2024] Code for the paper "ECCO: Can We Improve Model-Generated Code...	25	Experimental	7	Python
38	medxiaorudan/CodeGeneration Prompt engineering with Langchain and fine-tuning the CodeLlama model. The...	25	Experimental	8	C++
39	formula-code/terminal-bench Evaluation harness for FormulaCode	25	Experimental	4	Python
40	LiuZeJie97/Code-Generation-From-Flowcharts-with-Texts-A-Benchmark-Dataset-and-An-Approach Code for the paper "Code Generation From Flowcharts with Texts: A Benchmark...	24	Experimental	13	Jupyter Notebook
41	yunbow/ai-dev-os-benchmark Benchmark: how AI coding guidelines affect code quality — 3 conditions × 9...	23	Experimental	1	TypeScript
42	adpena/vertigo-lora Domain-specialized LoRA fine-tuning pipeline for Roblox/Luau code generation...	23	Experimental	1	Python
43	kroq86/honeybadger formal VM benchmark and inspectable reasoning runtime for testing whether...	22	Experimental	—	Python
44	sephirxth/LLM_code_test LLM code generation benchmark — Claude vs Gemini vs DeepSeek vs Grok on a...	22	Experimental	—	Python
45	LIANGQINGYUAN/Lyra Lyra: A Benchmark for Turducken-Style Code Generation	22	Experimental	15	Python
46	Meisdy/Speech-to-Code-Generation-for-Collaborative-Robots A modular pipeline that lets users program collaborative robots through...	22	Experimental	—	Python
47	yueyueL/ReliableLM4Code Collections of research, benchmarks and tools towards more robust and...	21	Experimental	30	—
48	ftrou/Decodifier The Compiler for AI-Generated Software. LLMs don’t write code. ...	20	Experimental	1	Python
49	kabirjaipal/Evil-Codes Evil Codes is a repository where you will find many useful code snippets and...	20	Experimental	5	C++
50	jacopotagliabue/LLMs-to-Alloy Example of LLM generated Alloy code for deductive reasoning from English...	20	Experimental	4	Alloy
51	sssszh/CodePLAN The code repository for the paper “Enhancing Code Generation Performance of...	20	Experimental	8	Python
52	falconvn2006/GPasT GPT for Pascal code generation :)	18	Experimental	2	Jupyter Notebook
53	AngelicaArabe/OTA-IOT 🔧 Develop IoT applications with ESP32-S3 using OTA updates, SPIFFS web...	16	Experimental	—	C++
54	ada994/prism-bench 🌐 Benchmark models using the PRISM framework and access the FLUX-Reason-6M...	14	Experimental	—	Python
55	ALM3ARQ/character-prefix-conditioning 🔍 Streamline token sampling with character prefix conditioning using a...	14	Experimental	—	Python
56	cloudrishi/springboot-ai-generator AI-powered Spring Boot code generator using CodeLlama LLM running locally via Ollama	14	Experimental	—	Python
57	gokhanercan/gen-atomic An LLM-based code generation framework aims to support a wide range of...	14	Experimental	7	Python
58	HWH-2000/DynaCode [ACL'2025 Findings] DynaCode: A Dynamic Complexity-Aware Code Benchmark for...	14	Experimental	10	Python
59	AshrafMorningstar/omni-code-polyglot A massive, SEO‑optimized collection of 300+ ready‑to‑run code snippets in...	14	Experimental	1	—
60	Bifrost-Technologies/Prometheus A developer platform for generating complete Solana programs in one-shot...	13	Experimental	—	C#
61	przeprogramowani/10x-bench-eval Scoring criteria for 10x-bench (10xbench.ai)	13	Experimental	—	—
62	evalops/llmcc LLM-native compiler toolchain - implementing 'LLM ≈ probabilistic compiler'...	12	Experimental	1	TypeScript
63	Jayveersinh-Raj/code_generation_gpt2 Fine tuning a gpt2 model for code generation/completion. This is the work...	12	Experimental	7	Python
64	navneetprabhakar/telegram-bot-llm Telegram bot with LLM code gen capabilities	11	Experimental	—	Java
65	runaicode/ai-coding-benchmarks Standardized test prompts and benchmarks for evaluating AI coding...	11	Experimental	—	—
66	moritzWa/BugDetectionBench A benchmark dataset of real-world code review comments, designed to evaluate...	11	Experimental	1	TypeScript
67	rajat-kumar-thakur/LLMs-for-Resource-Constrained-Devices This work was done as part of SRIP 2025 Internship, IIT Gandhinagar	11	Experimental	—	Jupyter Notebook
68	rudijetson/grammar-ops LLM-native codebase grammar system - Transform natural language patterns...	10	Experimental	1	Shell
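When working from the table above as copied text rather than the JSON API, each row can be split into typed fields. The sketch below assumes the columns are tab-separated in the header's order (#, Tool, Score, Tier, Stars, Language), that the Tool cell is the repository path followed by a space and a description, and that a dash marks a missing value. `ToolRow` and `parse_row` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MISSING = "\u2014"  # the dash placeholder the table uses for absent values


@dataclass
class ToolRow:
    rank: int
    repo: str
    description: str
    score: int
    tier: str
    stars: Optional[int]
    language: Optional[str]


def parse_row(line: str) -> ToolRow:
    """Parse one tab-separated table row into a ToolRow."""
    rank, tool, score, tier, stars, language = line.rstrip("\n").split("\t")
    # The Tool cell holds "owner/name" and then the (possibly truncated) description.
    repo, _, description = tool.partition(" ")
    return ToolRow(
        rank=int(rank),
        repo=repo,
        description=description,
        score=int(score),
        tier=tier,
        stars=None if stars == MISSING else int(stars.replace(",", "")),
        language=None if language == MISSING else language,
    )


row = parse_row("4\thendrycks/apps APPS: Automated Programming Progress Standard (NeurIPS 2021)\t46\tEmerging\t520\tPython")
print(row.repo, row.score, row.stars)
```

Descriptions ending in "..." are truncated in the listing itself, so the parsed `description` field carries that truncation through unchanged.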