ngxson/wllama
WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
This project lets web developers run large language models (LLMs) directly in the browser. It loads model files in GGUF format and produces text completions or embeddings entirely client-side, so web applications can offer AI features without a backend server or a specialized GPU on the user's machine.
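A minimal sketch of what client-side usage can look like, based on the project's published API. The CDN/asset paths, model URL, and option values below are illustrative assumptions, not verified configuration:

```typescript
// Sketch only: assumes the `@wllama/wllama` npm package and its Wllama class;
// the wasm asset paths and model URL are placeholders.
async function runCompletion(prompt: string): Promise<string> {
  // @ts-ignore -- package is resolved at runtime by the browser bundle
  const { Wllama } = await import('@wllama/wllama');

  // Map logical wasm names to wherever your bundler/CDN serves them (assumed layout).
  const wllama = new Wllama({
    'single-thread/wllama.wasm': '/assets/single-thread/wllama.wasm',
    'multi-thread/wllama.wasm': '/assets/multi-thread/wllama.wasm',
  });

  // Any GGUF file under the per-file browser limit (placeholder URL).
  await wllama.loadModelFromUrl('https://example.com/model.q4_k_m.gguf');

  // Generate a completion entirely client-side.
  return wllama.createCompletion(prompt, { nPredict: 64 });
}
```

Nothing here touches a server: the model is fetched once, cached by the browser, and inference runs in WebAssembly on the user's CPU.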
Use this if you are a web developer building interactive, browser-based applications that need LLM-powered natural language features without relying on a server.
Not ideal if you need a single model file larger than 2GB, require WebGPU support, or prefer server-side inference for large-scale, high-performance workloads.
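The 2GB ceiling applies per file; larger models are commonly distributed as split GGUF shards, which llama.cpp's `gguf-split` tool names `<base>-00001-of-0000N.gguf`. A hypothetical helper (not part of wllama) for building such a shard URL list, assuming that naming convention:

```typescript
// Hypothetical helper: build the ordered URL list for a model split with
// llama.cpp's gguf-split tool, which zero-pads shard indices to five digits.
// Loaders that accept an array of URLs can consume this list directly.
function splitGgufUrls(baseUrl: string, shardCount: number): string[] {
  const pad = (n: number) => String(n).padStart(5, '0');
  return Array.from({ length: shardCount }, (_, i) =>
    `${baseUrl}-${pad(i + 1)}-of-${pad(shardCount)}.gguf`
  );
}
```

For example, `splitGgufUrls('https://example.com/model', 3)` yields three URLs ending in `-00001-of-00003.gguf` through `-00003-of-00003.gguf`.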
Stars
1,013
Forks
73
Language
TypeScript
License
MIT
Category
Last pushed
Dec 17, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ngxson/wllama"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
Higher-rated alternatives
ludwig-ai/ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
withcatai/node-llama-cpp
Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema...
mudler/LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and...
zhudotexe/kani
kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)
SciSharp/LLamaSharp
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.