di37/LLM-Load-Unload-Ollama

A simple demonstration of how to keep an LLM loaded in memory for a prolonged time, or unload it immediately after inference, when using it via Ollama.

Quality score: 19 / 100 · Experimental

When working with large language models (LLMs) through Ollama, this project helps you manage how they use your computer's memory. It demonstrates how to keep an LLM actively loaded for continuous use or unload it immediately after getting a response. This is useful for anyone running LLMs locally who needs to optimize memory usage.
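The mechanism Ollama exposes for this is the `keep_alive` field on its REST API. Below is a minimal sketch of both behaviors, not necessarily the notebook's exact code: the default local endpoint is assumed, the `llama3` model name and the `generate` helper are placeholders for whatever model and wrapper you actually use.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def generate(prompt: str, keep_alive) -> str:
    """One inference call; keep_alive sets how long the model stays
    in memory afterwards: a duration like "5m", -1 for indefinitely,
    or 0 to unload immediately after the response."""
    resp = requests.post(OLLAMA_URL, json={
        "model": "llama3",     # assumed model name; use any model you have pulled
        "prompt": prompt,
        "stream": False,       # return one JSON object instead of a stream
        "keep_alive": keep_alive,
    }, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]

# Keep the model resident for fast follow-up queries ...
print(generate("Why is the sky blue?", keep_alive=-1))
# ... or free VRAM/RAM as soon as the answer comes back.
print(generate("And why are sunsets red?", keep_alive=0))
```

With `keep_alive=-1` the second query skips the multi-second model load; with `0` the memory is returned to the system at the cost of reloading the model on the next call.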

No commits in the last 6 months.

Use this if you are running LLMs via Ollama and need to control whether the model stays in memory for quick subsequent queries or unloads to free up resources.

Not ideal if you are not using Ollama, or if you are not concerned with optimizing memory usage for local LLM inference.
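For the first case, you can also change residency without generating anything: a prompt-less `/api/generate` request just loads or evicts the model. A sketch under the same assumptions as above:

```python
import requests

BASE = "http://localhost:11434"  # default local Ollama host (assumed)

# Preload: a prompt-less /api/generate call loads the model into memory
# and returns immediately, so later queries start warm.
requests.post(f"{BASE}/api/generate",
              json={"model": "llama3", "keep_alive": -1})  # stay loaded indefinitely

# ... serve as many prompts as you like while the model is resident ...

# Unload: the same prompt-less call with keep_alive 0 evicts the model.
requests.post(f"{BASE}/api/generate",
              json={"model": "llama3", "keep_alive": 0})
```

Running `ollama ps` in a shell confirms which models are currently loaded and when they are due to be evicted.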

Topics: local-LLM-deployment, memory-management, resource-optimization, AI-application-hosting
No License · Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 5 / 25
Maturity: 8 / 25
Community: 6 / 25

Stars: 13
Forks: 1
Language: Jupyter Notebook
License: none
Last pushed: May 04, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/di37/LLM-Load-Unload-Ollama"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
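If you prefer scripting over curl, the same request works from Python. The response schema is not documented on this page, so this sketch just dumps the raw JSON:

```python
import requests

# Same endpoint as the curl command above; no key needed
# for up to 100 requests/day.
url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/di37/LLM-Load-Unload-Ollama")
data = requests.get(url, timeout=10).json()
print(data)  # schema not shown here, so print the raw JSON
```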