tpoisonooo/llama.onnx

LLaMa/RWKV onnx models, quantization and testcase

Score: 40 / 100 (Emerging)

This project helps machine learning engineers and researchers by providing optimized versions of large language models (LLMs) like LLaMa and RWKV. It takes existing models and converts them into the ONNX format, making them easier to deploy on various hardware. The output is a highly portable and efficient model ready for inference, even on devices with limited memory.

366 stars. No commits in the last 6 months.

Use this if you need to run LLaMa or RWKV language models efficiently on diverse hardware, including embedded devices or systems with limited computational resources, without needing PyTorch or Hugging Face Transformers during inference.

Not ideal if you are looking for a high-level API for model training or fine-tuning, as this project focuses on optimizing existing models for deployment rather than development.

Machine Learning Deployment · Edge AI · Large Language Models · Model Optimization · AI Inference
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 14 / 25


Stars: 366
Forks: 29
Language: Python
License: GPL-3.0
Last pushed: Jul 06, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tpoisonooo/llama.onnx"

Open to everyone: 100 requests/day with no key needed; a free API key raises the limit to 1,000/day.
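The endpoint shown in the curl example follows a predictable `ecosystem/owner/repo` path, so it is easy to script. A minimal Python sketch using only the standard library; the URL pattern comes from the example above, but the shape of the JSON response is an assumption (it is not documented here), so only the URL construction should be treated as certain:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-API URL, e.g. for transformers/tpoisonooo/llama.onnx."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem: str, owner: str, repo: str) -> dict:
    """Fetch and decode the quality report.

    Network call; the response is assumed to be JSON with fields
    matching the card above (score, stars, forks, ...), which is
    an assumption, not documented behavior.
    """
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Reconstructs the exact URL from the curl example.
    print(quality_url("transformers", "tpoisonooo", "llama.onnx"))
```

Keeping URL construction separate from the network call makes the path logic testable offline and easy to reuse for other repositories.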