luchangli03/export_llama_to_onnx
export llama to onnx
This tool helps machine learning engineers and MLOps professionals convert large language models (LLMs) such as LLaMA, Qwen, and ChatGLM to ONNX format for deployment. It takes your existing Hugging Face model files as input and produces optimized ONNX files. It is aimed at anyone who wants to deploy LLMs more efficiently in production environments.
135 stars. No commits in the last 6 months.
Use this if you need to convert trained LLMs (such as LLaMA, Qwen, ChatGLM, Gemma, or Bloom) to ONNX format for faster inference and easier deployment across platforms.
Not ideal if you want a tool for training LLMs, or if you are unfamiliar with model deployment and ONNX.
Stars: 135
Forks: 18
Language: Python
License: MIT
Category: (none listed)
Last pushed: Dec 28, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/luchangli03/export_llama_to_onnx"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
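The curl request above can also be made from Python. The following is a minimal sketch using only the standard library. The names `build_url` and `fetch_quality` are hypothetical helpers, and treating the last two path segments as owner and repository is an assumption inferred from the single URL shown. The response format is not documented in this listing, so the sketch tries JSON and falls back to plain text.

```python
import json
import urllib.request
from urllib.parse import quote

# Prefix copied from the curl example above. Treating the two trailing path
# segments as owner/repo is an assumption, not documented behavior.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_url(owner: str, repo: str) -> str:
    """Return the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Request the endpoint and decode the body as JSON when possible.

    Nothing is assumed about the response fields; a body that is not valid
    JSON is returned as plain text instead.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("luchangli03", "export_llama_to_onnx")` issues the same request as the curl command and counts against the same daily request limit.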
Higher-rated alternatives
hkproj/pytorch-llama
LLaMA 2 implemented from scratch in PyTorch
4AI/LS-LLaMA
A Simple but Powerful SOTA NER Model | Official Code For Label Supervised LLaMA Finetuning
ayaka14732/llama-2-jax
JAX implementation of the Llama 2 model
harleyszhang/lite_llama
A light llama-like llm inference framework based on the triton kernel.
liangyuwang/zo2
ZO2 (Zeroth-Order Offloading): Full Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory [COLM2025]