HPAI-BSC/TuRTLe
TuRTLe: A Unified Evaluation of LLMs for RTL Generation 🐢 (MLCAD 2025)
TuRTLe helps hardware engineers and chip designers evaluate how well Large Language Models (LLMs) generate Register Transfer Level (RTL) code. It takes natural-language specifications or incomplete RTL code as input and produces generated RTL along with detailed performance metrics, so designers and verification engineers can benchmark LLMs and select the best one for their hardware design automation tasks.
Use this if you need to systematically compare and understand the capabilities of various LLMs for creating or completing Verilog and other RTL designs, ensuring correctness and efficiency.
Not ideal if you are looking for a standalone RTL design tool or an LLM for general programming tasks, as its focus is specifically on benchmarking LLMs for hardware description languages.
Stars: 40
Forks: 8
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 23, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/HPAI-BSC/TuRTLe"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
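The endpoint above can also be queried from a script. A minimal sketch in Python, assuming the endpoint returns a JSON payload (the response shape is not documented here, so `fetch_quality` is an assumption; only the URL construction mirrors the curl example):

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(ecosystem: str, repo: str) -> str:
    """Build the API URL for a repo, e.g. transformers + HPAI-BSC/TuRTLe."""
    return f"{API_BASE}/{ecosystem}/{repo}"


def fetch_quality(ecosystem: str, repo: str) -> dict:
    """Fetch and decode the response (assumed JSON; shape unverified)."""
    with urllib.request.urlopen(quality_url(ecosystem, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Reconstructs the exact URL from the curl example; no network call.
    print(quality_url("transformers", "HPAI-BSC/TuRTLe"))
```

Unauthenticated use is rate-limited as noted above; a key raises the daily quota.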
Related models
eth-sri/matharena
Evaluation of LLMs on latest math competitions
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality,...
nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
haesleinhuepf/human-eval-bia
Benchmarking Large Language Models for Bio-Image Analysis Code Generation
ShuntaroOkuma/adapt-gauge-core
Measure LLM adaptation efficiency — how fast models learn from few examples