willbnu/Qwen-3.5-16G-Vram-Local
Configs, launchers, benchmarks, and tooling for running Qwen3.5 GGUF models locally with llama.cpp on a 16GB NVIDIA GPU
This project helps individual users, such as researchers and advanced hobbyists, run large language models (specifically Qwen 3.5) on a local machine with a 16GB NVIDIA graphics card. It provides configurations and tooling tuned for tasks like coding, reasoning, and multimodal interaction: you supply Qwen 3.5 GGUF model files, and the project's configs and launchers help you get optimized local inference from them.
Use this if you want to run powerful Qwen 3.5 language models on your personal machine for local data analysis, creative writing, or coding assistance, without relying on cloud services.
Not ideal if you don't have a dedicated NVIDIA GPU with at least 16GB VRAM, or if you need to deploy models for large-scale production environments.
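As a rough illustration of the kind of launch command such configs wrap, here is a minimal sketch of serving a Qwen 3.5 GGUF file with llama.cpp's `llama-server` on a 16GB card. The model filename, context size, and port are hypothetical placeholders, not values taken from this repo; `-m`, `-ngl`, `-c`, and `--port` are standard llama.cpp flags.

```shell
# Hypothetical sketch, not the repo's actual launcher.
# Path and quantization level are illustrative.
MODEL="$HOME/models/qwen3.5-q4_k_m.gguf"

llama-server \
  -m "$MODEL" \
  -ngl 99 \
  -c 8192 \
  --port 8080
# -ngl 99 offloads all layers to the GPU; if a larger quant
# overflows 16GB VRAM, lower -ngl or shrink the context (-c).
```

A quantized ~14B model at Q4_K_M plus an 8K context typically fits within 16GB of VRAM; larger models or contexts require partial offload.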
Stars: 21
Forks: 4
Language: Python
License: MIT
Category:
Last pushed: Mar 14, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/willbnu/Qwen-3.5-16G-Vram-Local"
Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000/day.
Higher-rated alternatives
QwenLM/Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
LLM-Red-Team/qwen-free-api
🚀...
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by...
QwenLM/qwen.cpp
C++ implementation of Qwen-LM
yassa9/qwen600
Static suckless single batch CUDA-only qwen3-0.6B mini inference engine