Beomi/BitNet-Transformers
0️⃣1️⃣🤗 BitNet-Transformers: Hugging Face Transformers implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch with the Llama(2) architecture
This project helps machine learning engineers and researchers explore and implement highly efficient large language models. It converts a standard Llama model to BitNet's 1-bit weight representation, enabling training and inference with a significantly reduced memory footprint. The result is a Llama-architecture language model that uses considerably less GPU memory.
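The core idea in BitNet is to replace each `nn.Linear` in the transformer with a "BitLinear" layer whose weights are binarized to {-1, +1} with a per-tensor absmean scale, trained via a straight-through estimator. The sketch below is an illustrative approximation of that technique, not this repository's exact code; it also omits BitNet's activation quantization and normalization details.

```python
import torch
import torch.nn as nn


class BitLinear(nn.Linear):
    """Minimal sketch of a 1-bit linear layer in the spirit of BitNet.

    Weights are centered, binarized with sign(), and rescaled by their
    mean absolute value (absmean). A straight-through estimator lets
    gradients flow to the underlying full-precision weights.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Per-tensor absmean scale preserves the weight magnitude.
        alpha = w.abs().mean()
        w_centered = w - w.mean()
        # Binarize to {-1, +1} (sign() maps exact zeros to 0).
        w_bin = torch.sign(w_centered)
        # Straight-through estimator: forward uses the binarized,
        # rescaled weights; backward sees the identity w.r.t. w_centered.
        w_q = w_centered + (w_bin * alpha - w_centered).detach()
        return nn.functional.linear(x, w_q, self.bias)


# Usage: drop-in replacement for nn.Linear inside a Llama block.
layer = BitLinear(8, 4)
out = layer(torch.randn(2, 8))
```

Because only the sign and one scalar scale per tensor need to be stored at inference time, the weight memory drops from 16 bits to roughly 1 bit per parameter.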
313 stars. No commits in the last 6 months.
Use this if you are developing or deploying large language models and need to drastically reduce the GPU memory consumption for training and inference.
Not ideal if you are a non-technical end-user simply looking to use an off-the-shelf language model.
Stars
313
Forks
34
Language
Python
License
—
Category
—
Last pushed
Mar 17, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Beomi/BitNet-Transformers"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
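The same endpoint can be called from Python with the standard library. The helper names below (`build_quality_url`, `fetch_quality`) are hypothetical, and the JSON response schema is not documented here, so the returned dict should be treated as opaque:

```python
import json
import urllib.request

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_quality_url(owner: str, repo: str) -> str:
    # Construct the per-repository quality URL.
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    # Fetch and decode the JSON response for one repository.
    with urllib.request.urlopen(build_quality_url(owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("Beomi", "BitNet-Transformers")` requests the same URL as the curl command above.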
Higher-rated alternatives
ModelCloud/GPTQModel
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
intel/auto-round
🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
pytorch/ao
PyTorch native quantization and sparsity for training and inference
bodaay/HuggingFaceModelDownloader
Simple go utility to download HuggingFace Models and Datasets
NVIDIA/kvpress
LLM KV cache compression made easy