kekzl/imp

High-performance LLM inference engine in C++/CUDA for NVIDIA Blackwell GPUs (RTX 5090)

Score: 37 / 100 (Emerging)

This project helps AI developers and researchers quickly run large language models (LLMs) on powerful NVIDIA GPUs. It takes a trained LLM in GGUF format and efficiently generates text responses, whether in an interactive chat session or from a single prompt. The ideal users are those who build and deploy LLM-powered applications or run performance benchmarks.

Use this if you need to achieve the highest possible speed for LLM inference on NVIDIA Blackwell or Hopper GPUs, especially with quantized models.

Not ideal if you're looking for a general-purpose LLM development library that runs on consumer-grade or older GPUs, or if you prefer a Python-only solution.

LLM deployment · AI inference · GPU optimization · ML model serving · Generative AI
No Package · No Dependents
Maintenance: 10 / 25
Adoption: 6 / 25
Maturity: 11 / 25
Community: 10 / 25


Stars: 15
Forks: 2
Language: CUDA
License: MIT
Last pushed: Mar 11, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/kekzl/imp"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
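If you prefer to consume the endpoint from code rather than curl, a minimal Python sketch might look like the following. The URL is taken from the page above; the helper name `fetch_quality` and the assumption that the endpoint returns a JSON body are mine, since the response schema is not documented here.

```python
# Hypothetical client sketch for the quality API shown above.
# Assumptions: the endpoint returns JSON; field names in the response
# are not documented on this page, so the result is returned as a raw dict.
import json
import urllib.request

API_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/kekzl/imp"

def fetch_quality(url: str = API_URL) -> dict:
    """GET the quality endpoint and parse the JSON response body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

# Calling fetch_quality() performs a live request, so it is left to the caller:
#   data = fetch_quality()
```

Note that the free tier is rate-limited (100 requests/day without a key), so cache the response rather than polling.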