Lucien2468/Ollama-TurboQuant-Integration

TurboQuant: Native 3-Bit Quantization for Ollama - Achieve 25-28% better compression than Q4_0 while maintaining high-speed CPU inference. Experimentally integrated into Ollama with custom GGML kernels for LLM efficiency.
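As a rough sanity check on the compression figure above, the sketch below works out what "25-28% better compression than Q4_0" would mean in bits per weight. It reads the claim as a 25-28% smaller encoding, which is an interpretation and not something the listing states. The only outside fact is the standard GGML Q4_0 layout (blocks of 32 weights stored as 16 bytes of 4-bit values plus a 2-byte fp16 scale). The function names are hypothetical, and the listing does not describe TurboQuant's actual block layout.

```python
# Q4_0 layout in GGML: each block covers 32 weights and stores
# 16 bytes of packed 4-bit values plus one 2-byte fp16 scale.
Q4_0_BLOCK_WEIGHTS = 32
Q4_0_BLOCK_BYTES = 16 + 2


def bits_per_weight(block_bytes: int, block_weights: int) -> float:
    """Average storage cost per weight, including per-block metadata."""
    return block_bytes * 8 / block_weights


def implied_bits_per_weight(baseline_bpw: float, size_reduction: float) -> float:
    """Bits per weight implied by a fractional size reduction vs. a baseline.

    Assumption: "X% better compression" is read as "X% smaller encoding".
    """
    return baseline_bpw * (1.0 - size_reduction)


q4_0_bpw = bits_per_weight(Q4_0_BLOCK_BYTES, Q4_0_BLOCK_WEIGHTS)
low_bpw = implied_bits_per_weight(q4_0_bpw, 0.28)   # 28% smaller than Q4_0
high_bpw = implied_bits_per_weight(q4_0_bpw, 0.25)  # 25% smaller than Q4_0

print(f"Q4_0: {q4_0_bpw:.2f} bits/weight")
print(f"Implied range: {low_bpw:.2f} to {high_bpw:.3f} bits/weight")
```

Under that reading, the claim corresponds to roughly 3.24 to 3.375 bits per weight, which is consistent with 3-bit values plus a small amount of per-block metadata.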


Experimental

No Package, No Dependents

Maintenance 13 / 25

Adoption 4 / 25

Maturity 9 / 25

Community 0 / 25


License: MIT

Last pushed: Apr 04, 2026

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Lucien2468/Ollama-TurboQuant-Integration"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
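The same request can be made from a script. The following is a minimal sketch using only the Python standard library. It assumes the endpoint returns JSON, which the listing does not confirm, and the helper names `quality_url` and `fetch_quality` are hypothetical. The URL pattern is taken directly from the curl command above.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example; everything after it is
# <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    path = "/".join(
        urllib.parse.quote(part, safe="") for part in (category, owner, repo)
    )
    return f"{BASE_URL}/{path}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record without an API key.

    Assumption: the response body is JSON. The listing does not document
    the response schema, so the decoded object is returned as-is.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("transformers", "Lucien2468", "Ollama-TurboQuant-Integration")
```

With the keyless limit of 100 requests/day, a script that polls many repositories would need to cache results or space out its calls.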

Higher-rated alternatives

Dao-AILab/flash-attention

Fast and memory-efficient exact attention

wuwangzhang1216/abliterix

Fully automatic censorship removal for language models. LoRA abliteration + Optuna TPE optimization.

lucidrains/deep-cross-attention

Implementation of the proposed DeepCrossAttention by Heddes et al at Google research, in Pytorch

modelscope/mcore-bridge

MCore-Bridge: Providing Megatron-Core model definitions for state-of-the-art large models and...

assembly-automation-hub/repo-governance

⚙️ Reusable GitHub repository governance kit: CI/CD workflows, CodeQL SAST, Dependabot...
