abhisheknair10/llama3.cu

Lightweight Llama 3 8B Inference Engine in CUDA C

36 / 100 (Emerging)

This project helps developers integrate the Llama 3 8B large language model into their applications. It takes pre-trained Llama 3 8B weights (from HuggingFace) as input and provides an efficient CUDA-native engine for generating text. It is designed for machine learning engineers and AI developers who need to deploy Llama 3 models on Nvidia GPUs.

No commits in the last 6 months.

Use this if you are a developer looking for a lightweight, high-performance inference engine to run Llama 3 8B models on a CUDA-enabled GPU.

Not ideal if you are an end-user without programming experience, or if you lack access to a high-end Nvidia GPU with at least 24 GB of VRAM.

LLM deployment, GPU inference, natural language generation, AI development, machine learning engineering
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 12 / 25
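
The overall score appears to be the sum of the four 25-point category scores above; a minimal sketch of that arithmetic (the additive scoring model is an assumption inferred from the numbers, not confirmed by the site):

```python
# Assumption: the overall score is the plain sum of the four
# 25-point category scores shown on the card.
scores = {
    "Maintenance": 0,
    "Adoption": 8,
    "Maturity": 16,
    "Community": 12,
}

overall = sum(scores.values())
print(f"{overall} / 100")  # 36 / 100, matching the card's headline score
```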


Stars: 54
Forks: 7
Language: Cuda
License: MIT
Last pushed: Mar 21, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/abhisheknair10/llama3.cu"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
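
A minimal sketch of consuming the same endpoint from Python instead of curl, assuming the API returns JSON (the response shape is not documented here, so the fetch is left commented out):

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-report URL for a GitHub repository."""
    return f"{API_BASE}/{owner}/{repo}"

url = quality_url("abhisheknair10", "llama3.cu")

# Fetch and decode the response (no key needed for up to
# 100 requests/day, per the note above). Uncomment to run live:
# with urllib.request.urlopen(url) as resp:
#     report = json.load(resp)
#     print(report)
```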