kekzl/imp
High-performance LLM inference engine in C++/CUDA for NVIDIA Blackwell GPUs (RTX 5090)
This project helps AI developers and researchers quickly run large language models (LLMs) on powerful NVIDIA GPUs. It takes a trained LLM in GGUF format and efficiently generates text responses, whether in an interactive chat session or from a single prompt. The ideal users are those who build and deploy LLM-powered applications or run performance benchmarks.
Use this if you need to achieve the highest possible speed for LLM inference on NVIDIA Blackwell or Hopper GPUs, especially with quantized models.
Not ideal if you're looking for a general-purpose LLM development library that runs on consumer-grade or older GPUs, or if you prefer a Python-only solution.
Stars
15
Forks
2
Language
Cuda
License
MIT
Category
llm-tools
Last pushed
Mar 11, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/kekzl/imp"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
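Judging from the single example above, the endpoint path appears to follow a `/quality/<category>/<owner>/<repo>` pattern; that pattern is an inference, not documented behavior. A minimal shell sketch that builds the URL for any repository:

```shell
# Assumption: the quality API follows /quality/<category>/<owner>/<repo>,
# inferred from the one documented example. Verify before relying on it.
BASE="https://pt-edge.onrender.com/api/v1/quality"
CATEGORY="llm-tools"
repo="kekzl/imp"          # swap in any <owner>/<repo> pair

url="$BASE/$CATEGORY/$repo"
echo "$url"

# Fetch the data (no key needed for up to 100 requests/day):
# curl "$url"
```

The fetch itself is left commented out so the sketch runs without network access; the response schema is not documented here, so inspect the raw JSON before parsing it.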
Higher-rated alternatives
ggml-org/ggml
Tensor library for machine learning
onnx/ir-py
Efficient in-memory representation for ONNX, in Python
SandAI-org/MagiCompiler
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
R-D-BioTech-Alaska/Qelm
Qelm - Quantum Enhanced Language Model
bytedance/lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation