iNeil77/vllm-code-harness
Run code inference-only benchmarks quickly using vLLM
This tool helps AI researchers and MLOps engineers quickly benchmark large language models (LLMs) trained for code generation or completion. You point it at an autoregressive code-generation model on Hugging Face, and it produces detailed evaluation metrics, generation samples, and references. It is designed for anyone who needs to assess how well a code model performs against standard coding benchmarks.
No commits in the last 6 months.
Use this if you are a researcher or MLOps engineer evaluating the inference speed and accuracy of autoregressive code LLMs on standard benchmarks.
Not ideal if you need to evaluate encoder-decoder models or if your primary concern is fine-tuning or training LLMs rather than just benchmarking inference.
Stars
9
Forks
—
Language
Python
License
Apache-2.0
Category
Last pushed
Mar 20, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/iNeil77/vllm-code-harness"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
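For scripted use, the same endpoint can be called from Python with the standard library. This is a minimal sketch: only the URL shown in the curl example above is taken from this page; the `X-API-Key` header name and the JSON response shape are assumptions, not documented here.

```python
import json
from urllib.request import Request, urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repo ("owner/name")."""
    return f"{BASE}/{ecosystem}/{repo}"

def fetch_quality(ecosystem: str, repo: str, api_key: str = "") -> dict:
    """Fetch the quality record and parse it as JSON.

    Passing an API key raises the daily limit; the header name
    'X-API-Key' is an assumption about this API, not documented here.
    """
    req = Request(quality_url(ecosystem, repo))
    if api_key:
        req.add_header("X-API-Key", api_key)
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)

# Reproduces the URL from the curl example above.
url = quality_url("transformers", "iNeil77/vllm-code-harness")
```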
Higher-rated alternatives
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...