alphadl/OOP-eval
The first Object-Oriented Programming (OOP) Evaluation Benchmark for LLMs
This project helps evaluate how well large language models (LLMs) can generate object-oriented programming (OOP) code. It takes an LLM's code output and assesses its OOP quality and correctness across different difficulty levels. This is for researchers and developers who are building or comparing LLMs and need to rigorously quantify the models' OOP code generation capabilities.
No commits in the last 6 months.
Use this if you are developing or evaluating large language models and need a standardized way to measure their proficiency in generating object-oriented programming code.
Not ideal if you are an application developer looking for a tool to help you write or debug your own object-oriented code.
Stars: 27
Forks: 3
Language: Python
License: —
Category:
Last pushed: Jan 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/alphadl/OOP-eval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
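For scripted access, here is a minimal Python sketch of the same request. It assumes the endpoint returns JSON (the curl example above suggests so, but this is unverified); if you have an API key, pass it however the service's documentation specifies, since the auth scheme is not shown here.

import requests

# Same endpoint as the curl example above.
url = "https://pt-edge.onrender.com/api/v1/quality/transformers/alphadl/OOP-eval"

resp = requests.get(url, timeout=10)  # keyless tier: 100 requests/day
resp.raise_for_status()               # fail loudly on HTTP errors
print(resp.json())                    # assumes a JSON response body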
Higher-rated alternatives
eth-sri/matharena
Evaluation of LLMs on the latest math competitions
tatsu-lab/alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality,...
HPAI-BSC/TuRTLe
TuRTLe: A Unified Evaluation of LLMs for RTL Generation 🐢 (MLCAD 2025)
nlp-uoregon/mlmm-evaluation
Multilingual Large Language Models Evaluation Benchmark
haesleinhuepf/human-eval-bia
Benchmarking Large Language Models for Bio-Image Analysis Code Generation