Code Model Training AI Coding Tools

Tools and frameworks for pre-training, fine-tuning, and optimizing language models specifically for code generation and programming tasks. Does NOT include inference-only tools, deployment platforms, or general LLM training frameworks.

68 code model training tools are tracked. Two score above 50 (Established tier). The highest-rated is k4black/codebleu at 56/100 with 130 stars.
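The tier labels on this page appear to follow score bands. The sketch below maps a score to a tier under assumed cutoffs: above 50 for Established (as stated above), 30 to 50 for Emerging, and below 30 for Experimental. The last two boundaries are inferred from the scores listed on this page, not documented, and the function name `tier_for_score` is hypothetical.

```python
def tier_for_score(score: int) -> str:
    """Map a 0-100 quality score to a tier label.

    The cutoffs are assumptions inferred from this listing:
    the page only states that scores above 50 are Established.
    """
    if score > 50:
        return "Established"
    if score >= 30:  # assumed lower bound for Emerging
        return "Emerging"
    return "Experimental"


if __name__ == "__main__":
    for s in (56, 48, 30, 29, 10):
        print(s, tier_for_score(s))
```

Scores between 49 and 52 do not occur in the listing, so the exact Established boundary cannot be checked against the data here.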

Get all 68 projects as JSON (the example request sets limit=20):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ai-coding&subcategory=code-model-training&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
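The same request can be issued from a script. The following is a minimal sketch using only the Python standard library. It reuses the endpoint and query parameters from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical. The response schema is not described on this page, so the parsed JSON is returned as-is, and whether the API accepts a `limit` as high as 68 is an assumption. API-key usage is omitted because the source does not show how a key is passed.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit: int = 20) -> str:
    """Build the request URL with the same parameters as the curl example."""
    params = {
        "domain": "ai-coding",
        "subcategory": "code-model-training",
        "limit": limit,
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_projects(limit: int = 20, timeout: float = 30.0):
    """Fetch and decode the JSON payload; its structure is not assumed here."""
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the URL only; call fetch_projects() to perform the request.
    print(build_url(20))
```

Each call to `fetch_projects` counts against the 100 requests/day anonymous quota, so caching the result locally is advisable when iterating on a script.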

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | k4black/codebleu | Pip compatible CodeBLEU metric implementation available for linux/macos/win | 56 | Established |
| 2 | LiveCodeBench/LiveCodeBench | Official repository for the paper "LiveCodeBench: Holistic and Contamination... | 53 | Established |
| 3 | EdinburghNLP/code-docstring-corpus | Preprocessed Python functions and docstrings for automated code... | 48 | Emerging |
| 4 | hendrycks/apps | APPS: Automated Programming Progress Standard (NeurIPS 2021) | 46 | Emerging |
| 5 | solis-team/Hydra | [FSE 2026] Do Not Treat Code as Natural Language: Implications for... | 44 | Emerging |
| 6 | alxschwrz/codex_py2cpp | Converts python code into c++ by using OpenAI CODEX. | 43 | Emerging |
| 7 | AS-SiliconMind/SiliconMind-V1 | Inference Engine for SiliconMind-V1 Verilog Coding Models | 40 | Emerging |
| 8 | tongye98/Awesome-Code-Benchmark | A comprehensive code domain benchmark review of LLM researches. | 40 | Emerging |
| 9 | reddy-lab-code-research/PPOCoder | Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation... | 40 | Emerging |
| 10 | bharathsudharsan/OTA-TinyML | Code for IEEE Internet Computing Journal paper 'OTA-TinyML: Over the Air... | 39 | Emerging |
| 11 | logpai/LogBench | A benchmark for logging statement generation. | 38 | Emerging |
| 12 | s2e-lab/Code-Smell-Code-Generation | Source code for "An Empirical Study of Code Smells in Transformer-based Code... | 37 | Emerging |
| 13 | zorazrw/odex | [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation | 36 | Emerging |
| 14 | vl2g/floco | Flow Chart Image-to-Code Generation | 35 | Emerging |
| 15 | code-gen/cscg | Code Generation as a Dual Task of Code Summarization. | 35 | Emerging |
| 16 | CloudIDEaaS-zz/hydra | Hydra is a app generation product. Hydra aims to reduce the "concept to... | 35 | Emerging |
| 17 | 99EnriqueD/verilog_autocompletion | Code implementation for "A Deep Learning Framework for Verilog... | 35 | Emerging |
| 18 | s2e-lab/SecurityEval | Repository for "SecurityEval Dataset: Mining Vulnerability Examples to... | 34 | Emerging |
| 19 | devashish-gupta/Geode | A zero-shot geospatial question answering agent with precise spatiotemporal... | 34 | Emerging |
| 20 | matlab-deep-learning/Deep_Learning_Poker_Player_using_MATLAB_and_Raspberry_Pi | This example shows how to use automatic code generation to deploy a deep... | 33 | Emerging |
| 21 | Gen-Verse/CURE | [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via... | 33 | Emerging |
| 22 | madaan/pie-perf | Training language models to make programs faster | 32 | Emerging |
| 23 | formula-code/fc-eval | Evaluation harness for FormulaCode | 31 | Emerging |
| 24 | WebPAI/Interaction2Code | [ASE 2025] Benchmarking MLLM-based Interactive Webpage Code Generation from... | 31 | Emerging |
| 25 | Pavansomisetty21/Automated-Code-Generation-and-Execution-Agent-using-LangChain-and-Cohere-LLM | In this we implement an agent which generates and executes code using cohere... | 30 | Emerging |
| 26 | Rudra5417/Code-Generator-using-GPT-3 | Natural Language to Code | 29 | Experimental |
| 27 | HIT-SCIR/Abacus | Abacus Code LLM (珠算代码大模型) | 29 | Experimental |
| 28 | HySonLab/Design2Code | Large Language Model in combination with Large Vision Model for the task of... | 29 | Experimental |
| 29 | matthewdeanmartin/paipi | Pypi search, except the backend is an LLM's pixelated memory of Pypi. | 29 | Experimental |
| 30 | aswathselvam/Potholes | Realtime pothole detection on Android phone's IMU data. SVM model in C++, ... | 29 | Experimental |
| 31 | aixcoder-plugin/nl2code-dataset | Aix-bench, the Java benchmark for code synthesis problem. | 27 | Experimental |
| 32 | jszheng21/RACE | RACE is a multi-dimensional benchmark for code generation that focuses on... | 27 | Experimental |
| 33 | domaineval/DomainEval | DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation... | 27 | Experimental |
| 34 | KohlerHECTOR/interpreter-py | Implementation of Interpretable and Editable Programmatic Tree Policies for... | 27 | Experimental |
| 35 | albertusk95/intention-to-code-lstm | Source Code Generation Based On User Intention Using LSTM Networks | 26 | Experimental |
| 36 | seal-research/OmniCode | OmniCode: A Diverse Software Engineering Benchmark for Evaluating Large... | 26 | Experimental |
| 37 | CodeEff/ECCO | [EMNLP 2024] Code for the paper "ECCO: Can We Improve Model-Generated Code... | 25 | Experimental |
| 38 | medxiaorudan/CodeGeneration | Prompt engineering with Langchain and fine-tuning the CodeLlama model. The... | 25 | Experimental |
| 39 | formula-code/terminal-bench | Evaluation harness for FormulaCode | 25 | Experimental |
| 40 | LiuZeJie97/Code-Generation-From-Flowcharts-with-Texts-A-Benchmark-Dataset-and-An-Approach | Code for the paper "Code Generation From Flowcharts with Texts: A Benchmark... | 24 | Experimental |
| 41 | yunbow/ai-dev-os-benchmark | Benchmark: how AI coding guidelines affect code quality — 3 conditions × 9... | 23 | Experimental |
| 42 | adpena/vertigo-lora | Domain-specialized LoRA fine-tuning pipeline for Roblox/Luau code generation... | 23 | Experimental |
| 43 | kroq86/honeybadger | formal VM benchmark and inspectable reasoning runtime for testing whether... | 22 | Experimental |
| 44 | sephirxth/LLM_code_test | LLM code generation benchmark — Claude vs Gemini vs DeepSeek vs Grok on a... | 22 | Experimental |
| 45 | LIANGQINGYUAN/Lyra | Lyra: A Benchmark for Turducken-Style Code Generation | 22 | Experimental |
| 46 | Meisdy/Speech-to-Code-Generation-for-Collaborative-Robots | A modular pipeline that lets users program collaborative robots through... | 22 | Experimental |
| 47 | yueyueL/ReliableLM4Code | Collections of research, benchmarks and tools towards more robust and... | 21 | Experimental |
| 48 | ftrou/Decodifier | The Compiler for AI-Generated Software. LLMs don’t write code. ... | 20 | Experimental |
| 49 | kabirjaipal/Evil-Codes | Evil Codes is a repository where you will find many useful code snippets and... | 20 | Experimental |
| 50 | jacopotagliabue/LLMs-to-Alloy | Example of LLM generated Alloy code for deductive reasoning from English... | 20 | Experimental |
| 51 | sssszh/CodePLAN | The code repository for the paper “Enhancing Code Generation Performance of... | 20 | Experimental |
| 52 | falconvn2006/GPasT | GPT for Pascal code generation :) | 18 | Experimental |
| 53 | AngelicaArabe/OTA-IOT | 🔧 Develop IoT applications with ESP32-S3 using OTA updates, SPIFFS web... | 16 | Experimental |
| 54 | ada994/prism-bench | 🌐 Benchmark models using the PRISM framework and access the FLUX-Reason-6M... | 14 | Experimental |
| 55 | ALM3ARQ/character-prefix-conditioning | 🔍 Streamline token sampling with character prefix conditioning using a... | 14 | Experimental |
| 56 | cloudrishi/springboot-ai-generator | AI-powered Spring Boot code generator using CodeLlama LLM running locally via Ollama | 14 | Experimental |
| 57 | gokhanercan/gen-atomic | An LLM-based code generation framework aims to support a wide range of... | 14 | Experimental |
| 58 | HWH-2000/DynaCode | [ACL'2025 Findings] DynaCode: A Dynamic Complexity-Aware Code Benchmark for... | 14 | Experimental |
| 59 | AshrafMorningstar/omni-code-polyglot | A massive, SEO‑optimized collection of 300+ ready‑to‑run code snippets in... | 14 | Experimental |
| 60 | Bifrost-Technologies/Prometheus | A developer platform for generating complete Solana programs in one-shot... | 13 | Experimental |
| 61 | przeprogramowani/10x-bench-eval | Scoring criteria for 10x-bench (10xbench.ai) | 13 | Experimental |
| 62 | evalops/llmcc | LLM-native compiler toolchain - implementing 'LLM ≈ probabilistic compiler'... | 12 | Experimental |
| 63 | Jayveersinh-Raj/code_generation_gpt2 | Fine tuning a gpt2 model for code generation/completion. This is the work... | 12 | Experimental |
| 64 | navneetprabhakar/telegram-bot-llm | Telegram bot with LLM code gen capabilities | 11 | Experimental |
| 65 | runaicode/ai-coding-benchmarks | Standardized test prompts and benchmarks for evaluating AI coding... | 11 | Experimental |
| 66 | moritzWa/BugDetectionBench | A benchmark dataset of real-world code review comments, designed to evaluate... | 11 | Experimental |
| 67 | rajat-kumar-thakur/LLMs-for-Resource-Constrained-Devices | This work was done as part of SRIP 2025 Internship, IIT Gandhinagar | 11 | Experimental |
| 68 | rudijetson/grammar-ops | LLM-native codebase grammar system - Transform natural language patterns... | 10 | Experimental |