iNeil77/vllm-code-harness

Run inference-only code benchmarks quickly using vLLM

Score: 21 / 100 (Experimental)

This tool helps AI researchers and MLOps engineers quickly benchmark the performance of large language models (LLMs) specifically trained for code generation or completion tasks. You input an autoregressive code generation model from Hugging Face, and it outputs detailed evaluation metrics, generation samples, and references. It's designed for those who need to assess how well their code models perform against various coding benchmarks.
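
As a rough illustration of the inference-only pattern the harness builds on (plain vLLM usage, not the harness's own API), the sketch below loads a Hugging Face code model and samples a completion for a benchmark-style prompt; the model name and prompt are placeholder assumptions.

# Minimal vLLM sketch: load an autoregressive Hugging Face code model and
# generate a completion for a benchmark-style prompt. The model name and
# prompt are placeholders, not part of this harness.
from vllm import LLM, SamplingParams

llm = LLM(model="bigcode/starcoderbase-1b")  # any autoregressive HF code model
prompts = ['def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n']
params = SamplingParams(temperature=0.2, top_p=0.95, max_tokens=128, n=1)
for result in llm.generate(prompts, params):
    print(result.outputs[0].text)  # completion to score against the benchmark's references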

No commits in the last 6 months.

Use this if you are a researcher or MLOps engineer evaluating the inference speed and accuracy of autoregressive code LLMs on standard benchmarks.

Not ideal if you need to evaluate encoder-decoder models or if your primary concern is fine-tuning or training LLMs rather than just benchmarking inference.

Tags: LLM-benchmarking, code-generation-evaluation, MLOps, AI-research, model-performance
Badges: Stale (6m), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 5 / 25
Maturity: 16 / 25
Community: 0 / 25

Stars: 9
Forks:
Language: Python
License: Apache-2.0
Last pushed: Mar 20, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/iNeil77/vllm-code-harness"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
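
If you prefer Python over curl, a minimal equivalent using the requests library is below; the response schema is not documented on this page, so the sketch simply prints the raw JSON.

# Fetch the same quality data shown above and print the raw JSON response.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/iNeil77/vllm-code-harness"
resp = requests.get(url, timeout=30)
resp.raise_for_status()
print(resp.json())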