dvmazur/mixtral-offloading
Run Mixtral-8x7B models in Colab or consumer desktops
This tool helps developers and researchers run the large Mixtral-8x7B language model on hardware with limited GPU memory, such as consumer desktops or Google Colab environments. It offloads parts of the model between GPU and CPU memory, enabling text generation and other inference tasks that would otherwise require more powerful, expensive hardware. It is designed for machine learning practitioners experimenting with large language models.
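The core idea behind this kind of offloading can be illustrated with a small sketch. This is not the library's actual API; it is a generic illustration, assuming a least-recently-used (LRU) policy where a fixed number of expert weights stay in fast (GPU) memory and the rest live in slow (CPU) memory until needed:

```python
# Illustrative sketch only (not mixtral-offloading's real API): a small LRU
# cache stands in for GPU memory; evicted experts fall back to a CPU store.
from collections import OrderedDict

class ExpertOffloader:
    def __init__(self, gpu_slots: int):
        self.gpu_slots = gpu_slots      # how many experts fit in "GPU" memory
        self.cpu_store = {}             # all expert weights live here
        self.gpu_cache = OrderedDict()  # subset currently resident on "GPU"

    def register(self, expert_id, weights):
        self.cpu_store[expert_id] = weights

    def load(self, expert_id):
        """Return an expert's weights, copying them into the GPU cache if absent."""
        if expert_id in self.gpu_cache:
            self.gpu_cache.move_to_end(expert_id)   # mark as recently used
        else:
            if len(self.gpu_cache) >= self.gpu_slots:
                self.gpu_cache.popitem(last=False)  # evict least-recently-used
            self.gpu_cache[expert_id] = self.cpu_store[expert_id]
        return self.gpu_cache[expert_id]

off = ExpertOffloader(gpu_slots=2)
for i in range(4):
    off.register(i, f"weights-{i}")
off.load(0); off.load(1); off.load(2)  # loading expert 2 evicts expert 0
print(sorted(off.gpu_cache))           # → [1, 2]
```

The trade-off is latency: each cache miss pays a CPU-to-GPU transfer, which is why only a subset of experts is kept resident.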
2,327 stars. No commits in the last 6 months.
Use this if you need to run Mixtral-8x7B models for inference but only have access to consumer-grade GPUs or cloud environments like Google Colab.
Not ideal if you already have access to high-end GPUs with ample memory for large language models, or if you need a command-line interface for local execution without coding.
Stars
2,327
Forks
234
Language
Python
License
MIT
Category
Last pushed
Apr 08, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/dvmazur/mixtral-offloading"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
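The same endpoint can be queried from Python. A minimal sketch, assuming only that the URL follows the pattern shown in the curl example above and that the response body is JSON:

```python
# Minimal sketch: build the quality-API URL shown above and fetch it.
# The URL pattern is taken from the curl example; the response schema
# is an assumption (JSON) and is not documented here.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    """Build the API URL, e.g. quality_url('transformers', 'dvmazur/mixtral-offloading')."""
    return f"{BASE}/{ecosystem}/{repo}"

def fetch_quality(ecosystem: str, repo: str) -> dict:
    """Fetch and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(quality_url(ecosystem, repo)) as resp:
        return json.load(resp)

print(quality_url("transformers", "dvmazur/mixtral-offloading"))
```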
Higher-rated alternatives
mistralai/mistral-inference
Official inference library for Mistral models
open-compass/MixtralKit
A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
vicuna-tools/vicuna-installation-guide
The "vicuna-installation-guide" provides step-by-step instructions for installing and...
pleisto/yuren-13b
Yuren 13B is an information synthesis large language model that has been continuously trained...
hkproj/mistral-llm-notes
Notes on the Mistral AI model