FMInference/H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

Score: 38 / 100 (Emerging)

Deploying Large Language Models (LLMs) for tasks like writing stories or powering chatbots is often expensive because the key-value (KV) cache used to speed up generation grows with sequence length and can dominate memory use. This project reduces that memory footprint, especially during long content generation, making models more affordable and efficient to run. Developers and ML engineers deploying LLMs can use it to optimize their inference systems.
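The core idea behind H2O, per the paper, is to evict most KV-cache entries while keeping the "heavy hitters" (tokens that have accumulated the most attention) plus a window of recent tokens. A minimal plain-Python sketch of that selection step, with hypothetical function and parameter names (the real implementation operates per attention head inside the decoding loop):

```python
def h2o_keep_indices(attn_rows, num_heavy, num_recent):
    """Choose which KV-cache positions to keep under an H2O-style policy:
    the `num_heavy` positions with the highest accumulated attention
    ("heavy hitters") plus the `num_recent` most recent positions.
    `attn_rows` is a list of attention-weight rows (one per decoding step),
    each padded to the full sequence length. Illustrative sketch only."""
    seq_len = len(attn_rows[-1])
    recent = set(range(max(0, seq_len - num_recent), seq_len))
    # Accumulate the attention mass each position has received so far.
    accumulated = [0.0] * seq_len
    for row in attn_rows:
        for pos, weight in enumerate(row):
            accumulated[pos] += weight
    # Rank the non-recent positions by accumulated attention.
    candidates = sorted(
        (p for p in range(seq_len) if p not in recent),
        key=lambda p: accumulated[p],
        reverse=True,
    )
    heavy = set(candidates[:num_heavy])
    return sorted(heavy | recent)


# Position 0 received the most attention overall, so it survives eviction
# alongside the two most recent positions.
attn = [
    [0.7, 0.3, 0.0, 0.0],
    [0.5, 0.1, 0.4, 0.0],
    [0.6, 0.1, 0.1, 0.2],
]
print(h2o_keep_indices(attn, num_heavy=1, num_recent=2))  # [0, 2, 3]
```

Everything outside the kept indices is dropped from the cache, which is how the memory footprint stays bounded as generation gets longer.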

506 stars. No commits in the last 6 months.

Use this if you are a developer or ML engineer working with Large Language Models and need to reduce memory consumption and improve throughput for generative inference, especially for long content.

Not ideal if you are a business user looking for a no-code solution or if your primary concern is fine-tuning an LLM rather than optimizing its deployment efficiency.

Tags: LLM deployment, ML system optimization, Generative AI inference, Deep learning engineering, Model serving
Flags: No license, stale for 6 months, no package published, no known dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 20 / 25


Stars: 506
Forks: 74
Language: Python
License: None
Last pushed: Aug 01, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/FMInference/H2O"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.