RAZZULLIX/fast_topk_batched

High-performance batched Top-K selection for CPU inference. Up to 80x faster than PyTorch, optimized for LLM sampling with AVX2 SIMD.

Quality score: 36 / 100 (Emerging)

This project helps machine learning engineers accelerate the selection of the most probable words or tokens when generating text with large language models (LLMs) on standard CPUs. You provide raw prediction scores for many possible next tokens across multiple input sequences, and it quickly returns the top 'K' most likely token IDs for each sequence. It's designed for developers building or deploying LLM inference systems who need to maximize performance without dedicated GPU hardware.

Use this if you are a machine learning engineer running LLM inference on CPU and need to significantly speed up the 'top-K' sampling step for text generation.

Not ideal if you are primarily running LLM inference on GPUs, or if your application does not involve LLM text generation.
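To make the operation concrete: batched top-K takes one row of token scores per sequence and returns, for each row, the indices of the K highest-scoring tokens. Below is a naive pure-Python reference sketch of that operation (not this library's AVX2 implementation, and not its API; `batched_topk` is a hypothetical helper name):

```python
import heapq

def batched_topk(scores, k):
    """Naive reference: for each row of `scores` (one row per sequence),
    return the indices of the k largest values, highest score first."""
    return [
        [i for _, i in heapq.nlargest(k, ((v, i) for i, v in enumerate(row)))]
        for row in scores
    ]

# Two sequences, a 5-token vocabulary, k = 2
logits = [[0.1, 3.2, 0.5, 2.8, 1.0],
          [5.0, 0.2, 4.9, 0.1, 0.3]]
print(batched_topk(logits, 2))  # [[1, 3], [0, 2]]
```

In PyTorch the equivalent call is `torch.topk(logits, k, dim=-1)`; this project targets the same operation on CPU, using SIMD instead of the generic implementation.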

Tags: LLM inference, NLP engineering, CPU optimization, text generation, machine learning deployment
No package published · No dependents
Maintenance: 10 / 25
Adoption: 6 / 25
Maturity: 11 / 25
Community: 9 / 25

Stars: 16
Forks: 2
Language: C++
License: MIT
Last pushed: Jan 19, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/RAZZULLIX/fast_topk_batched"

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.