Beomi/BitNet-Transformers
0️⃣1️⃣🤗 BitNet-Transformers: Hugging Face Transformers implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch with the Llama(2) architecture
This project helps machine learning engineers and researchers explore and implement highly efficient large language models. It converts a standard Llama model to BitNet's 1-bit weight representation, enabling training and inference with a significantly reduced memory footprint. The result is a Llama-architecture language model that uses considerably less GPU memory.
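The core idea in BitNet is to replace each `nn.Linear` in the transformer with a "BitLinear" layer whose weights are binarized to {-1, +1} with a per-tensor absmean scale, trained via a straight-through estimator. The sketch below is an illustrative approximation of that technique, not this repository's exact code; it also omits BitNet's activation quantization and normalization details.

```python
import torch
import torch.nn as nn


class BitLinear(nn.Linear):
    """Minimal sketch of a 1-bit linear layer in the spirit of BitNet.

    Weights are centered, binarized with sign(), and rescaled by their
    mean absolute value (absmean). A straight-through estimator lets
    gradients flow to the underlying full-precision weights.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Per-tensor absmean scale preserves the weight magnitude.
        alpha = w.abs().mean()
        w_centered = w - w.mean()
        # Binarize to {-1, +1} (sign() maps exact zeros to 0).
        w_bin = torch.sign(w_centered)
        # Straight-through estimator: forward uses the binarized,
        # rescaled weights; backward sees the identity w.r.t. w_centered.
        w_q = w_centered + (w_bin * alpha - w_centered).detach()
        return nn.functional.linear(x, w_q, self.bias)


# Usage: drop-in replacement for nn.Linear inside a Llama block.
layer = BitLinear(8, 4)
out = layer(torch.randn(2, 8))
```

Because only the sign and one scalar scale per tensor need to be stored at inference time, the weight memory drops from 16 bits to roughly 1 bit per parameter.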
313 stars. No commits in the last 6 months.
Use this if you are developing or deploying large language models and need to drastically reduce the GPU memory consumption for training and inference.
Not ideal if you are a non-technical end-user simply looking to use an off-the-shelf language model.
Stars
313
Forks
34
Language
Python
License
—
Category
—
Last pushed
Mar 17, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Beomi/BitNet-Transformers"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
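The same endpoint can be called from Python with the standard library. The helper names below (`build_quality_url`, `fetch_quality`) are hypothetical, and the JSON response schema is not documented here, so the returned dict should be treated as opaque:

```python
import json
import urllib.request

# Base endpoint taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_quality_url(owner: str, repo: str) -> str:
    # Construct the per-repository quality URL.
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    # Fetch and decode the JSON response for one repository.
    with urllib.request.urlopen(build_quality_url(owner, repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("Beomi", "BitNet-Transformers")` requests the same URL as the curl command above.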
Higher-rated alternatives
ModelCloud/GPTQModel
LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD...
intel/auto-round
🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality...
pytorch/ao
PyTorch native quantization and sparsity for training and inference
bodaay/HuggingFaceModelDownloader
Simple go utility to download HuggingFace Models and Datasets
NVIDIA/kvpress
LLM KV cache compression made easy