AMDResearch/NPUEval
NPUEval is an LLM evaluation dataset targeting AIE kernel code generation on AMD RyzenAI hardware.
The dataset measures how well large language models (LLMs) generate specialized code for the Neural Processing Unit (NPU) on AMD's RyzenAI platform. It provides a set of prompts as input and tests the generated code's functional correctness on the AIE kernel architecture. AI software developers and researchers working with RyzenAI would use it to benchmark and improve LLM code generation for NPU applications.
Use this if you are developing or evaluating LLMs that generate low-level code for AMD's AI Engine (AIE) kernels on RyzenAI hardware and need a standardized way to measure their performance.
Not ideal if you are working with NPU hardware other than AMD's AIE2/AIE2P or if your focus is on high-level application development rather than kernel-level code generation.
Stars: 30
Forks: 4
Language: C++
License: —
Category:
Last pushed: Nov 08, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/AMDResearch/NPUEval"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
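The endpoint above can also be called programmatically. A minimal Python sketch for building the request URL, assuming the same `owner/repo` path pattern holds for other repositories (the shape of the JSON response is not documented here, so parsing it is left out):

```python
# Build the quality-API URL for a given GitHub owner/repo pair.
# The path pattern is taken from the curl example above; applying it
# to repositories other than AMDResearch/NPUEval is an assumption.
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_endpoint(owner: str, repo: str) -> str:
    """Return the API URL for a repository, URL-escaping each path segment."""
    return f"{BASE_URL}/{quote(owner)}/{quote(repo)}"

url = quality_endpoint("AMDResearch", "NPUEval")
print(url)
```

The URL can then be fetched with any HTTP client (e.g. `urllib.request` or `requests`), keeping the 100 requests/day limit in mind.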
Higher-rated alternatives
EvolvingLMMs-Lab/lmms-eval: One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas: Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit: Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval: The robust European language model benchmark.
Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents