wangcx18/llm-vscode-inference-server
An endpoint server for efficiently serving quantized open-source LLMs for code.
This project provides an alternative server for the 'llm-vscode' extension, enabling developers to run open-source code completion models locally. It takes a quantized language model file as input and serves it as an endpoint, allowing the VSCode extension to provide code suggestions and completions directly on your machine. This is ideal for software developers who want to self-host code-generating AI for their programming tasks.
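Once the server is running, VS Code needs to be told where to find it. A minimal sketch of the relevant settings.json entry, assuming the llm-vscode extension's `llm.url` setting and a server listening locally on port 8000; both the setting name and the endpoint path are assumptions to verify against the extension's documentation and this repository's README:

```jsonc
// .vscode/settings.json (or user settings)
// Hypothetical values: confirm the setting name, port, and path
// against the llm-vscode extension docs and this repo's README.
{
  "llm.url": "http://localhost:8000/generate"
}
```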
No commits in the last 6 months.
Use this if you are a software developer who wants to run an open-source, quantized code Large Language Model (LLM) locally within VSCode for code completion and generation, reducing reliance on cloud services.
Not ideal if you are not a developer or if you prefer to use cloud-hosted code completion services without managing local model serving.
Stars: 58
Forks: 10
Language: Python
License: Apache-2.0
Category:
Last pushed: Oct 15, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wangcx18/llm-vscode-inference-server"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
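The same endpoint can also be called programmatically. A minimal Python sketch using only the standard library; the URL comes from the curl command above, while the assumption that the endpoint returns JSON (and any particular response fields) is not documented here:

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for a given GitHub owner/repo pair."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record for a repo.

    Network call; assumes the endpoint returns a JSON object
    (its exact schema is not documented on this page).
    """
    with urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)


print(quality_url("wangcx18", "llm-vscode-inference-server"))
```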
Higher-rated alternatives
thu-pacman/chitu
High-performance inference framework for large language models, focusing on efficiency,...
sophgo/LLM-TPU
Run generative AI models in sophgo BM1684X/BM1688
NotPunchnox/rkllama
Ollama alternative for Rockchip NPU: An efficient solution for running AI and Deep learning...
Deep-Spark/DeepSparkHub
DeepSparkHub selects hundreds of application algorithms and models, covering various fields of...
howard-hou/VisualRWKV
VisualRWKV is the visual-enhanced version of the RWKV language model, enabling RWKV to handle...