HPAI-BSC/TuRTLe
TuRTLe: A Unified Evaluation of LLMs for RTL Generation 🐢 (MLCAD 2025)
TuRTLe helps hardware engineers and chip designers evaluate how well Large Language Models (LLMs) generate Register Transfer Level (RTL) code. It takes natural-language specifications or incomplete RTL code as input and produces generated RTL along with detailed performance metrics, so designers and verification engineers can benchmark LLMs and select the best one for their hardware design automation tasks.
Use this if you need to systematically compare and understand the capabilities of various LLMs for creating or completing Verilog and other RTL designs, ensuring correctness and efficiency.
Not ideal if you are looking for a standalone RTL design tool or an LLM for general programming tasks, as its focus is specifically on benchmarking LLMs for hardware description languages.
Stars: 40
Forks: 8
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 23, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/HPAI-BSC/TuRTLe"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
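The endpoint above can also be queried from a script. A minimal sketch in Python, assuming the endpoint returns a JSON payload (the response shape is not documented here, so `fetch_quality` is an assumption; only the URL construction mirrors the curl example):

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(ecosystem: str, repo: str) -> str:
    """Build the API URL for a repo, e.g. transformers + HPAI-BSC/TuRTLe."""
    return f"{API_BASE}/{ecosystem}/{repo}"


def fetch_quality(ecosystem: str, repo: str) -> dict:
    """Fetch and decode the response (assumed JSON; shape unverified)."""
    with urllib.request.urlopen(quality_url(ecosystem, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Reconstructs the exact URL from the curl example; no network call.
    print(quality_url("transformers", "HPAI-BSC/TuRTLe"))
```

Unauthenticated use is rate-limited as noted above; a key raises the daily quota.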
Related models
eth-sri/matharena
Evaluation of LLMs on latest math competitions
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality,...
nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
haesleinhuepf/human-eval-bia
Benchmarking Large Language Models for Bio-Image Analysis Code Generation
ShuntaroOkuma/adapt-gauge-core
Measure LLM adaptation efficiency — how fast models learn from few examples