abhisheknair10/llama3.cu
Lightweight Llama 3 8B Inference Engine in CUDA C
This project helps developers integrate the Llama 3 8B large language model into their applications. It takes pre-trained Llama 3 8B model weights (from Hugging Face) as input and provides an efficient CUDA-native engine for generating text. It's designed for machine learning engineers and AI developers who need to deploy Llama 3 models on NVIDIA GPUs.
No commits in the last 6 months.
Use this if you are a developer looking for a lightweight, high-performance inference engine to run Llama 3 8B models on a CUDA-enabled GPU.
Not ideal if you are an end user without programming experience or do not have access to a high-end NVIDIA GPU with at least 24 GB of VRAM.
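A rough back-of-the-envelope estimate can help judge whether a given GPU is likely to fit the model. The sketch below assumes 16-bit (FP16/BF16) storage for weights and KV cache, which this listing does not state, and uses the publicly documented Llama 3 8B shape (32 layers, 8 key/value heads, head dimension 128). The function names are hypothetical, and the engine's actual memory layout may differ.

```python
# Rough VRAM estimate for Llama 3 8B.
# ASSUMPTION: 2 bytes per value (FP16/BF16); the listing does not state the precision.
N_PARAMS = 8_000_000_000  # nominal "8B"; the exact count is slightly higher
N_LAYERS = 32             # transformer blocks in Llama 3 8B
N_KV_HEADS = 8            # grouped-query attention: 8 key/value heads
HEAD_DIM = 128            # dimension of each attention head
BYTES_PER_VALUE = 2       # assumed 16-bit storage


def weights_bytes(n_params=N_PARAMS, bytes_per_value=BYTES_PER_VALUE):
    """Memory needed to hold the model weights."""
    return n_params * bytes_per_value


def kv_cache_bytes(n_tokens, bytes_per_value=BYTES_PER_VALUE):
    """Memory needed to cache keys and values for n_tokens of context."""
    # Factor 2: one key tensor and one value tensor per layer.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value
    return per_token * n_tokens


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


total = weights_bytes() + kv_cache_bytes(8192)
print(f"weights:  {to_gib(weights_bytes()):.1f} GiB")
print(f"KV cache: {to_gib(kv_cache_bytes(8192)):.1f} GiB")
print(f"total:    {to_gib(total):.1f} GiB")
```

Under these assumptions the weights alone take roughly 15 GiB and an 8,192-token KV cache adds about 1 GiB, which leaves some headroom on a 24 GB card for activations and runtime overhead.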
Stars: 54
Forks: 7
Language: Cuda
License: MIT
Category:
Last pushed: Mar 21, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/abhisheknair10/llama3.cu"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
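The same request can be made from a script. The following is a minimal sketch using only the Python standard library. It assumes the endpoint returns a JSON body, which this page does not confirm, and the helper names `build_url` and `fetch_quality` are hypothetical. No response fields are assumed.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_url(owner, repo):
    """Build the endpoint URL for a repository, percent-encoding each path part."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the record for owner/repo.

    ASSUMPTION: the response body is JSON; its schema is not documented here,
    so the parsed object is returned as-is.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("abhisheknair10", "llama3.cu")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, it is probably worth caching the result rather than fetching on every run.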
Higher-rated alternatives
ludwig-ai/ludwig
Low-code framework for building custom LLMs, neural networks, and other AI models
withcatai/node-llama-cpp
Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema...
mudler/LocalAI
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and...
zhudotexe/kani
kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)
SciSharp/LLamaSharp
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.