wangsiping97/FastGEMV

High-speed GEMV kernels achieving up to a 2.7x speedup over the PyTorch baseline.

Score: 36 / 100 (Emerging)

This tool helps high-performance computing engineers and machine learning infrastructure developers optimize matrix-vector multiplication, a fundamental operation in many AI models. It takes large matrices (up to 16,384x16,384) and vectors in fp16, int8, or int4 formats and produces their product significantly faster on NVIDIA GPUs. It is aimed at professionals building and deploying AI systems who need to squeeze maximum performance out of their hardware.
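For context, GEMV (general matrix-vector multiply) computes y = A·x, a row-wise dot-product reduction. Below is a minimal pure-Python reference sketch of the operation these kernels accelerate; it is for illustration only and does not reflect the repo's actual CUDA implementation, which parallelizes the reduction across GPU threads:

```python
def gemv(A, x):
    """Reference GEMV: y[i] = sum_j A[i][j] * x[j].

    A is a list of rows; x is a vector of matching length.
    FastGEMV's CUDA kernels compute the same result, but split
    each row's reduction across GPU threads for speed.
    """
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# Small worked example: a 2x3 matrix times a length-3 vector.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.0, -1.0]
print(gemv(A, x))  # row 0: 1 - 3 = -2.0; row 1: 4 - 6 = -2.0
```

In inference workloads this operation dominates decoding, which is why a 2.7x kernel-level speedup is significant.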

128 stars. No commits in the last 6 months.

Use this if you are a developer optimizing deep learning model inference or other GPU-accelerated linear algebra tasks and need to speed up matrix-vector multiplications beyond what standard libraries offer.

Not ideal if you are a data scientist or researcher working at a higher level of abstraction and not directly optimizing CUDA kernel performance.

Tags: GPU-optimization, deep-learning-inference, high-performance-computing, CUDA-development
Flags: Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 10 / 25


Stars: 128
Forks: 8
Language: Cuda
License: MIT
Last pushed: Jul 13, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/wangsiping97/FastGEMV"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.