itsnamgyu/block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
This project implements the Block Transformer, an architecture for large language models (LLMs) that separates coarse, block-level context modeling from fine-grained, within-block token decoding to speed up text generation. It takes text data for training and produces a language model that generates text significantly faster than standard models of similar quality. It is aimed at AI researchers and machine learning engineers who develop and deploy LLMs.
163 stars. No commits in the last 6 months.
Use this if you are developing or deploying large language models and need to achieve 10-20x faster text generation throughput without sacrificing output quality.
Not ideal if you are looking for a pre-trained model for direct use without custom training or fine-tuning, or if you don't work with large-scale language model inference.
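The "global-to-local" idea in the title can be illustrated with a toy sketch. This is not the repository's actual code: mean-pooling as the block embedder and the specific shapes below are assumptions chosen purely to show why operating on blocks reduces the cost of global attention.

```python
# Loose, illustrative sketch of global-to-local modeling (NOT the paper's
# implementation): tokens are pooled into coarse block embeddings, a
# "global" model would attend over blocks, and a "local" model would
# decode tokens within each block. Mean-pooling is an assumption here.
import numpy as np

def to_blocks(token_embs: np.ndarray, block_len: int) -> np.ndarray:
    """Aggregate every `block_len` consecutive token embeddings into one block embedding."""
    n_tokens, dim = token_embs.shape
    assert n_tokens % block_len == 0, "sequence length must be divisible by block length"
    return token_embs.reshape(n_tokens // block_len, block_len, dim).mean(axis=1)

# 512 tokens with hidden size 64, grouped into blocks of 4 tokens.
tokens = np.random.randn(512, 64)
blocks = to_blocks(tokens, block_len=4)

# Global self-attention now runs over 128 block embeddings instead of
# 512 tokens, so its quadratic cost shrinks by a factor of 4^2 = 16,
# while local decoding only attends within a 4-token block.
print(blocks.shape)  # (128, 64)
```

The speedup claim above comes from exactly this trade: the expensive global pass touches far fewer positions, and the cheap local pass has a short, fixed attention span.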
Stars: 163
Forks: 9
Language: Python
License: MIT
Category:
Last pushed: Apr 13, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/itsnamgyu/block-transformer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
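The same endpoint can be queried from Python using only the standard library. A minimal sketch: it assumes the endpoint returns JSON, and the response field names are not verified here.

```python
# Python equivalent of the curl command above, standard library only.
# Assumes the endpoint returns a JSON object; field names not verified.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(repo: str) -> str:
    """Build the API URL for an "owner/name" repo slug."""
    return f"{BASE}/{repo}"

def fetch_quality(repo: str) -> dict:
    """Fetch and decode the quality record for a repo (100 requests/day without a key)."""
    with urllib.request.urlopen(quality_url(repo)) as resp:
        return json.load(resp)

# Example (performs a live network request):
# data = fetch_quality("itsnamgyu/block-transformer")
# print(data)

print(quality_url("itsnamgyu/block-transformer"))
```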
Higher-rated alternatives
LMCache/LMCache
Supercharge Your LLM with the Fastest KV Cache Layer
Zefan-Cai/KVCache-Factory
Unified KV Cache Compression Methods for Auto-Regressive Models
dataflowr/llm_efficiency
KV Cache & LoRA for minGPT
OnlyTerp/kvtc
First open-source KVTC implementation (NVIDIA, ICLR 2026) -- 8-32x KV cache compression via PCA...
OnlyTerp/turboquant
First open-source implementation of Google TurboQuant (ICLR 2026) -- near-optimal KV cache...