abhisheknair10/llama3.cu

Lightweight Llama 3 8B Inference Engine in CUDA C

Emerging

This project helps developers integrate the Llama 3 8B large language model into their applications. It takes pre-trained Llama 3 8B weights from Hugging Face as input and provides an efficient CUDA-native engine for generating text. It is designed for machine learning engineers and AI developers who need to deploy Llama 3 on NVIDIA GPUs.

No commits in the last 6 months.

Use this if you are a developer looking for a lightweight, high-performance inference engine to run Llama 3 8B models on a CUDA-enabled GPU.

Not ideal if you are an end user without programming experience, or if you do not have access to a high-end NVIDIA GPU with at least 24 GB of VRAM.

LLM deployment, GPU inference, natural language generation, AI development, machine learning engineering

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 12 / 25

Language: CUDA

License: MIT

Higher-rated alternatives

ludwig-ai/ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models

withcatai/node-llama-cpp

Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema...

mudler/LocalAI

The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and...

zhudotexe/kani

kani (カニ) is a highly hackable microframework for tool-calling language models. (NLP-OSS @ EMNLP 2023)

SciSharp/LLamaSharp

A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
