tairov/llama2.mojo

Inference Llama 2 in one file of pure 🔥

Quality score: 54 / 100 (Established)

This project lets developers run small Llama 2 language models very quickly on their local machines. It takes a pre-trained Llama 2 model file and a text prompt as input, and outputs the generated text continuation. It is aimed at developers who want to integrate fast, local language-model inference into their applications.


Use this if you are a developer who needs to run a small Llama 2 model for text generation or completion directly on your CPU with exceptionally high performance, especially on Apple M1 or Intel CPUs.

Not ideal if you are looking to train large language models, fine-tune models on GPUs, or if you are not a developer and need a ready-to-use application.

local-LLM-inference text-generation developer-tools high-performance-computing edge-AI
No package. No dependents.
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 18 / 25


Stars: 2,119
Forks: 135
Language: Mojo
License: MIT
Last pushed: Feb 09, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tairov/llama2.mojo"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
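As a minimal sketch, the same endpoint can be queried from Python instead of curl. The URL pattern is copied from the example above; the response schema is not documented here, so the code below just returns the decoded JSON as-is rather than assuming any particular field names.

```python
# Sketch: query the quality API for a repository (no API key needed,
# subject to the 100 requests/day limit mentioned above).
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL from the pattern shown above."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload; schema is assumed, not documented."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Print the URL for the repository on this page; uncomment the next
    # line to actually hit the API.
    print(quality_url("tairov", "llama2.mojo"))
    # print(fetch_quality("tairov", "llama2.mojo"))
```

The fetch is kept separate from the URL builder so the latter can be reused (or tested) without making a network request.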