kekzl/imp
High-performance LLM inference engine in C++/CUDA for NVIDIA Blackwell GPUs (RTX 5090)
This project helps AI developers and researchers quickly run large language models (LLMs) on powerful NVIDIA GPUs. It takes a trained LLM in GGUF format and efficiently generates text responses, whether in an interactive chat session or from a single prompt. The ideal users are those who build and deploy LLM-powered applications or run performance benchmarks.
Use this if you need to achieve the highest possible speed for LLM inference on NVIDIA Blackwell or Hopper GPUs, especially with quantized models.
Not ideal if you're looking for a general-purpose LLM development library that runs on consumer-grade or older GPUs, or if you prefer a Python-only solution.
Stars
15
Forks
2
Language
Cuda
License
MIT
Category
llm-tools
Last pushed
Mar 11, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/kekzl/imp"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
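Judging from the single example above, the endpoint path appears to follow a `/quality/<category>/<owner>/<repo>` pattern; that pattern is an inference, not documented behavior. A minimal shell sketch that builds the URL for any repository:

```shell
# Assumption: the quality API follows /quality/<category>/<owner>/<repo>,
# inferred from the one documented example. Verify before relying on it.
BASE="https://pt-edge.onrender.com/api/v1/quality"
CATEGORY="llm-tools"
repo="kekzl/imp"          # swap in any <owner>/<repo> pair

url="$BASE/$CATEGORY/$repo"
echo "$url"

# Fetch the data (no key needed for up to 100 requests/day):
# curl "$url"
```

The fetch itself is left commented out so the sketch runs without network access; the response schema is not documented here, so inspect the raw JSON before parsing it.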
Higher-rated alternatives
ggml-org/ggml
Tensor library for machine learning
onnx/ir-py
Efficient in-memory representation for ONNX, in Python
SandAI-org/MagiCompiler
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
R-D-BioTech-Alaska/Qelm
Qelm - Quantum Enhanced Language Model
bytedance/lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation