alphadl/OOP-eval
The first Object-Oriented Programming (OOP) Evaluation Benchmark for LLMs
This project helps evaluate how well large language models (LLMs) can generate object-oriented programming (OOP) code. It takes an LLM's code output and assesses its OOP quality and correctness across different difficulty levels. This is for researchers and developers who are building or comparing LLMs and need to rigorously quantify the models' OOP code generation capabilities.
No commits in the last 6 months.
Use this if you are developing or evaluating large language models and need a standardized way to measure their proficiency in generating object-oriented programming code.
Not ideal if you are an application developer looking for a tool to help you write or debug your own object-oriented code.
Stars: 27
Forks: 3
Language: Python
License: —
Category:
Last pushed: Jan 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/alphadl/OOP-eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
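For scripted access, here is a minimal Python sketch of the same request. It assumes the endpoint returns JSON (the curl example above suggests so, but this is unverified); if you have an API key, pass it however the service's documentation specifies, since the auth scheme is not shown here.

import requests

# Same endpoint as the curl example above.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/alphadl/OOP-eval"

resp = requests.get(url, timeout=10)  # keyless tier: 100 requests/day
resp.raise_for_status()               # fail loudly on HTTP errors
print(resp.json())                    # assumes a JSON response body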
Higher-rated alternatives
eth-sri/matharena
Evaluation of LLMs on the latest math competitions
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality,...
HPAI-BSC/TuRTLe
TuRTLe: A Unified Evaluation of LLMs for RTL Generation 🐢 (MLCAD 2025)
nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
haesleinhuepf/human-eval-bia
Benchmarking Large Language Models for Bio-Image Analysis Code Generation