luchangli03/export_llama_to_onnx

export llama to onnx

/ 100

Emerging

This tool helps machine learning engineers and MLOps practitioners convert large language models (LLMs) such as LLaMA, Qwen, and ChatGLM to ONNX format for efficient deployment. You provide your existing Hugging Face model files, and it produces optimized ONNX files. It is aimed at anyone who wants to deploy LLMs more efficiently in production environments.

135 stars. No commits in the last 6 months.

Use this if you need to convert your trained LLMs (such as LLaMA, Qwen, ChatGLM, Gemma, or Bloom) to ONNX, a standard format, for faster inference and easier deployment across platforms.

Not ideal if you are looking for a tool to train LLMs or if you do not have a technical understanding of model deployment and ONNX.

LLM deployment · model optimization · machine learning operations · AI inference · deep learning engineering

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 16 / 25

Stars

135

Forks

Language

Python

License

MIT

Higher-rated alternatives

hkproj/pytorch-llama

LLaMA 2 implemented from scratch in PyTorch

4AI/LS-LLaMA

A Simple but Powerful SOTA NER Model | Official Code For Label Supervised LLaMA Finetuning

ayaka14732/llama-2-jax

JAX implementation of the Llama 2 model

harleyszhang/lite_llama

A light llama-like llm inference framework based on the triton kernel.

liangyuwang/zo2

ZO2 (Zeroth-Order Offloading): Full Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory [COLM2025]
