tpoisonooo/llama.onnx
LLaMa/RWKV onnx models, quantization and testcase
This project helps machine learning engineers and researchers deploy large language models (LLMs) such as LLaMa and RWKV by converting existing models into the ONNX format, with optional quantization. The result is a portable, efficient model ready for inference on a wide range of hardware, including devices with limited memory.
366 stars. No commits in the last 6 months.
Use this if you need to run LLaMa or RWKV language models efficiently on diverse hardware, including embedded devices or systems with limited computational resources, without needing PyTorch or Hugging Face Transformers during inference.
Not ideal if you are looking for a high-level API for model training or fine-tuning, as this project focuses on optimizing existing models for deployment rather than development.
Stars
366
Forks
29
Language
Python
License
GPL-3.0
Category
Last pushed
Jul 06, 2023
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tpoisonooo/llama.onnx"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
hkproj/pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
4AI/LS-LLaMA
A Simple but Powerful SOTA NER Model | Official Code For Label Supervised LLaMA Finetuning
luchangli03/export_llama_to_onnx
export llama to onnx
ayaka14732/llama-2-jax
JAX implementation of the Llama 2 model
harleyszhang/lite_llama
A lightweight LLaMA-like LLM inference framework based on Triton kernels.