tairov/llama2.mojo
Inference Llama 2 in one file of pure 🔥
This project lets developers run small Llama 2 language models quickly on their local machines. It takes a pre-trained Llama 2 model file and a text prompt as input, and outputs the generated text continuation. It is aimed at developers who want to integrate fast, local language-model inference into their applications.
Use this if you are a developer who needs to run a small Llama 2 model for text generation or completion directly on your CPU with exceptionally high performance, especially on Apple M1 or Intel CPUs.
Not ideal if you are looking to train large language models, fine-tune models on GPUs, or if you are not a developer and need a ready-to-use application.
Stars
2,119
Forks
135
Language
Mojo
License
MIT
Category
Last pushed
Feb 09, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tairov/llama2.mojo"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
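The curl one-liner above can also be scripted. A minimal Python sketch, assuming only that the endpoint returns JSON (the response field names are not documented here, so the example stops at decoding):

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository API URL for a GitHub owner/repo pair."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload.

    Works without an API key at up to 100 requests/day; how a key is
    passed (header vs. query parameter) is not specified here, so this
    sketch makes the unauthenticated call only.
    """
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Same repository as the curl example.
    print(quality_url("tairov", "llama2.mojo"))
```

The decoded dict can then be inspected for whatever fields the API actually returns.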
Related models
ludwig-ai/ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
withcatai/node-llama-cpp
Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema...
mudler/LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and...
zhudotexe/kani
kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)
SciSharp/LLamaSharp
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.