tpoisonooo/llama.onnx
LLaMa/RWKV onnx models, quantization and testcase
This project helps machine learning engineers and researchers deploy large language models (LLMs) such as LLaMa and RWKV by converting existing models into the ONNX format, with optional quantization. The result is a portable, efficient model ready for inference on a wide range of hardware, including devices with limited memory.
366 stars. No commits in the last 6 months.
Use this if you need to run LLaMa or RWKV language models efficiently on diverse hardware, including embedded devices or systems with limited computational resources, without needing PyTorch or Hugging Face Transformers during inference.
Not ideal if you are looking for a high-level API for model training or fine-tuning, as this project focuses on optimizing existing models for deployment rather than development.
Stars
366
Forks
29
Language
Python
License
GPL-3.0
Category
Last pushed
Jul 06, 2023
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tpoisonooo/llama.onnx"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
hkproj/pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
4AI/LS-LLaMA
A Simple but Powerful SOTA NER Model | Official Code For Label Supervised LLaMA Finetuning
luchangli03/export_llama_to_onnx
export llama to onnx
ayaka14732/llama-2-jax
JAX implementation of the Llama 2 model
harleyszhang/lite_llama
A lightweight LLaMA-like LLM inference framework based on Triton kernels.