RAZZULLIX/fast_topk_batched
High-performance batched Top-K selection for CPU inference. Up to 80x faster than PyTorch, optimized for LLM sampling with AVX2 SIMD.
This project helps machine learning engineers accelerate the selection of the most probable next tokens when generating text with large language models (LLMs) on standard CPUs. You provide raw prediction scores (logits) for the candidate tokens of multiple input sequences, and it quickly returns the top-K most likely token IDs for each sequence. It's designed for developers building or deploying LLM inference systems who need to maximize performance without dedicated GPU hardware.
Use this if you are a machine learning engineer running LLM inference on CPU and need to significantly speed up the 'top-K' sampling step for text generation.
Not ideal if you are primarily running LLM inference on GPUs, or if your application does not involve LLM text generation.
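The operation described above can be sketched in plain NumPy. Note this is an illustration of batched top-K selection itself, not fast_topk_batched's real C++ API; the function name and signature below are invented for the example, and the library's SIMD implementation will differ.

```python
import numpy as np

def batched_topk(logits: np.ndarray, k: int):
    """Return (values, indices) of the k largest logits per row.

    logits: (batch, vocab) array of raw scores, one row per sequence.
    Illustrative only -- not the library's actual interface.
    """
    # argpartition places the k largest entries (unordered) in the last k slots
    part = np.argpartition(logits, -k, axis=-1)[:, -k:]
    vals = np.take_along_axis(logits, part, axis=-1)
    # sort those k entries descending, as sampling code usually expects
    order = np.argsort(-vals, axis=-1)
    idx = np.take_along_axis(part, order, axis=-1)
    return np.take_along_axis(logits, idx, axis=-1), idx

# Example: 2 sequences, vocabulary of 5, select the top-2 token IDs each
logits = np.array([[0.1, 2.0, 0.3, 1.5, -1.0],
                   [5.0, 0.0, 4.0, 4.5,  1.0]])
vals, ids = batched_topk(logits, k=2)
# ids -> [[1, 3], [0, 3]]
```

Using `argpartition` before sorting keeps the cost at O(vocab + k log k) per row rather than sorting the whole vocabulary; the library's speedup over such baselines comes from AVX2 SIMD and batching, per its own description.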
Stars: 16
Forks: 2
Language: C++
License: MIT
Category:
Last pushed: Jan 19, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/RAZZULLIX/fast_topk_batched"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
Higher-rated alternatives
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit...
mlcommons/inference
Reference implementations of MLPerf® inference benchmarks
mlcommons/training
Reference implementations of MLPerf® training benchmarks
datamade/usaddress
:us: a python library for parsing unstructured United States address strings into address components
GRAAL-Research/deepparse
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning