nanowell/Q-Sparse-LLM

My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

Score: 26 / 100 (Experimental)

This project helps machine learning engineers and researchers optimize large language models (LLMs) for deployment. It takes existing transformer-based LLMs and applies Q-Sparse activation sparsification, which keeps only the top-K largest-magnitude activations at each layer and zeroes the rest, so the model runs more efficiently. The output is a functionally comparable LLM that requires less compute and memory, making it suitable for resource-constrained environments.
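To make the core idea concrete: in Q-Sparse (the paper this repo implements), the input to each linear projection is sparsified by keeping only its top-K entries by magnitude, so most of the matrix multiply can be skipped. Below is a minimal PyTorch sketch of that idea; the names (topk_sparsify, SparseLinear, sparsity) are illustrative and not this repo's actual API.

import torch

def topk_sparsify(x: torch.Tensor, k: int) -> torch.Tensor:
    # Keep the k largest-magnitude entries along the last dim, zero the rest.
    _, idx = torch.topk(x.abs(), k, dim=-1)
    mask = torch.zeros_like(x).scatter_(-1, idx, 1.0)
    return x * mask

class SparseLinear(torch.nn.Linear):
    # Hypothetical helper: a linear layer that sparsifies its input
    # activations before the matmul (the repo may structure this differently).
    def __init__(self, in_features: int, out_features: int,
                 sparsity: float = 0.6, **kwargs):
        super().__init__(in_features, out_features, **kwargs)
        # Number of activation entries kept per token.
        self.k = max(1, int(in_features * (1.0 - sparsity)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return super().forward(topk_sparsify(x, self.k))

# Usage: y = SparseLinear(1024, 4096)(torch.randn(2, 1024))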

No commits in the last 6 months.

Use this if you need to reduce the computational cost and memory footprint of large language models while maintaining their performance, especially for deployment on resource-limited hardware.

Not ideal if you want a pre-trained, ready-to-use LLM to deploy as-is, with no need for model modification or advanced optimization.

large-language-models model-optimization edge-ai efficient-inference machine-learning-deployment
Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 3 / 25

Stars: 34
Forks: 1
Language: Python
License: MIT
Last pushed: Aug 14, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/nanowell/Q-Sparse-LLM"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
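The same data can be fetched from Python. A minimal sketch, assuming the endpoint above returns JSON:

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/nanowell/Q-Sparse-LLM"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # raise on 4xx/5xx (e.g. once the daily rate limit is hit)
print(resp.json())       # quality scores and repo stats as a JSON object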