di37/LLM-Load-Unload-Ollama

A simple demonstration of how to keep an LLM loaded in memory for a prolonged time, or unload it immediately after inference, when using it via Ollama.

Quality score: 19 / 100 · Experimental

When working with large language models (LLMs) through Ollama, this project helps you manage how they use your computer's memory. It demonstrates how to keep an LLM actively loaded for continuous use or unload it immediately after getting a response. This is useful for anyone running LLMs locally who needs to optimize memory usage.
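The mechanism Ollama exposes for this is the `keep_alive` field on its REST API. Below is a minimal sketch of both behaviors, not necessarily the notebook's exact code: the default local endpoint is assumed, the `llama3` model name and the `generate` helper are placeholders for whatever model and wrapper you actually use.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def generate(prompt: str, keep_alive) -> str:
    """One inference call; keep_alive sets how long the model stays
    in memory afterwards: a duration like "5m", -1 for indefinitely,
    or 0 to unload immediately after the response."""
    resp = requests.post(OLLAMA_URL, json={
        "model": "llama3",     # assumed model name; use any model you have pulled
        "prompt": prompt,
        "stream": False,       # return one JSON object instead of a stream
        "keep_alive": keep_alive,
    }, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]

# Keep the model resident for fast follow-up queries ...
print(generate("Why is the sky blue?", keep_alive=-1))
# ... or free VRAM/RAM as soon as the answer comes back.
print(generate("And why are sunsets red?", keep_alive=0))
```

With `keep_alive=-1` the second query skips the multi-second model load; with `0` the memory is returned to the system at the cost of reloading the model on the next call.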

No commits in the last 6 months.

Use this if you are running LLMs via Ollama and need to control whether the model stays in memory for quick subsequent queries or unloads to free up resources.

Not ideal if you are not using Ollama, or if you are not concerned with optimizing memory usage for local LLM inference.
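For the first case, you can also change residency without generating anything: a prompt-less `/api/generate` request just loads or evicts the model. A sketch under the same assumptions as above:

```python
import requests

BASE = "http://localhost:11434"  # default local Ollama host (assumed)

# Preload: a prompt-less /api/generate call loads the model into memory
# and returns immediately, so later queries start warm.
requests.post(f"{BASE}/api/generate",
              json={"model": "llama3", "keep_alive": -1})  # stay loaded indefinitely

# ... serve as many prompts as you like while the model is resident ...

# Unload: the same prompt-less call with keep_alive 0 evicts the model.
requests.post(f"{BASE}/api/generate",
              json={"model": "llama3", "keep_alive": 0})
```

Running `ollama ps` in a shell confirms which models are currently loaded and when they are due to be evicted.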

Topics: local-LLM-deployment, memory-management, resource-optimization, AI-application-hosting
No License · Stale (6m) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 5 / 25
Maturity: 8 / 25
Community: 6 / 25

Stars: 13
Forks: 1
Language: Jupyter Notebook
License: none
Last pushed: May 04, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/di37/LLM-Load-Unload-Ollama"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
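If you prefer scripting over curl, the same request works from Python. The response schema is not documented on this page, so this sketch just dumps the raw JSON:

```python
import requests

# Same endpoint as the curl command above; no key needed
# for up to 100 requests/day.
url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/di37/LLM-Load-Unload-Ollama")
data = requests.get(url, timeout=10).json()
print(data)  # schema not shown here, so print the raw JSON
```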