kolinko/effort
An implementation of bucketMul LLM inference
This project helps machine learning practitioners tune the inference performance of large language models (LLMs) on Apple Silicon. It takes an LLM and lets you adjust the 'effort' level in real time, trading inference speed against output quality. This makes it well suited to researchers and developers experimenting with LLM deployment on macOS.
227 stars. No commits in the last 6 months.
Use this if you need to rapidly experiment with different LLM inference speeds and quality levels on Apple Silicon hardware.
Not ideal if you are looking for a solution for LLM inference on non-Apple hardware or if you require maximum possible quality without any compromise on speed.
Stars: 227
Forks: 16
Language: Swift
License: MIT
Category:
Last pushed: Jul 01, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kolinko/effort"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
Blaizzy/mlx-vlm
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac...
b4rtaz/distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM...
armbues/SiLLM
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple...
microsoft/batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
armbues/SiLLM-examples
Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on...