iNeil77/vllm-code-harness
Run code inference-only benchmarks quickly using vLLM
This tool helps AI researchers and MLOps engineers quickly benchmark large language models (LLMs) trained for code generation or completion. You point it at an autoregressive code-generation model on Hugging Face, and it produces detailed evaluation metrics, generation samples, and references. It is designed for anyone who needs to assess how well a code model performs against standard coding benchmarks.
No commits in the last 6 months.
Use this if you are a researcher or MLOps engineer evaluating the inference speed and accuracy of autoregressive code LLMs on standard benchmarks.
Not ideal if you need to evaluate encoder-decoder models or if your primary concern is fine-tuning or training LLMs rather than just benchmarking inference.
Stars
9
Forks
—
Language
Python
License
Apache-2.0
Category
Last pushed
Mar 20, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/iNeil77/vllm-code-harness"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
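For scripted use, the same endpoint can be called from Python with the standard library. This is a minimal sketch: only the URL shown in the curl example above is taken from this page; the `X-API-Key` header name and the JSON response shape are assumptions, not documented here.

```python
import json
from urllib.request import Request, urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repo ("owner/name")."""
    return f"{BASE}/{ecosystem}/{repo}"

def fetch_quality(ecosystem: str, repo: str, api_key: str = "") -> dict:
    """Fetch the quality record and parse it as JSON.

    Passing an API key raises the daily limit; the header name
    'X-API-Key' is an assumption about this API, not documented here.
    """
    req = Request(quality_url(ecosystem, repo))
    if api_key:
        req.add_header("X-API-Key", api_key)
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)

# Reproduces the URL from the curl example above.
url = quality_url("transformers", "iNeil77/vllm-code-harness")
```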
Higher-rated alternatives
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...