frost-beta/llama2-high-level-cpp
Llama 2 inference with high-level C++.
This project helps C++ developers understand how large language models (specifically Llama 2) work at a fundamental level. It takes a pre-trained Llama 2 model (represented as simple text files) and processes it to generate text. The primary users are C++ programmers who want to learn the internal mechanics of LLM inference in a clear, abstract way.
No commits in the last 6 months.
Use this if you are a C++ developer seeking an educational, high-level code example to learn the core principles of Llama 2 model inference.
Not ideal if you need to run large Llama 2 models, require a production-ready inference solution, or are not a C++ programmer.
Stars: 11
Forks: —
Language: C
License: MIT
Category: —
Last pushed: Feb 22, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/frost-beta/llama2-high-level-cpp"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
beehive-lab/GPULlama3.java
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
gitkaz/mlx_gguf_server
This is a FastAPI based LLM server. Load multiple LLM models (MLX or llama.cpp) simultaneously...
srgtuszy/llama-cpp-swift
Swift bindings for llama-cpp library
JackZeng0208/llama.cpp-android-tutorial
llama.cpp tutorial on Android phone
awinml/llama-cpp-python-bindings
Run fast LLM Inference using Llama.cpp in Python