npuichigo/openai_trtllm

OpenAI compatible API for TensorRT LLM triton backend

/ 100

Emerging

This project helps developers integrate custom large language models (LLMs) deployed with NVIDIA's TensorRT-LLM and Triton Inference Server into applications that expect an OpenAI-compatible API. It takes your deployed TensorRT-LLM model and makes it accessible as if it were an OpenAI model, outputting generated text responses. AI/ML engineers and application developers who build with LLMs will find this useful.

219 stars. No commits in the last 6 months.

Use this if you need to expose your high-performance TensorRT-LLM deployments through an OpenAI-like API, especially when integrating with tools like LangChain that are designed for OpenAI's interface.

Not ideal if you are looking for a pre-trained LLM or do not have a TensorRT-LLM model already deployed with Triton Inference Server.

LLM deployment AI model serving API integration AI application development Machine Learning Engineering

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 17 / 25

How are scores calculated?

Stars

219

Forks

Language

Rust

License

MIT

Higher-rated alternatives

trymirai/uzu

A high-performance inference engine for AI models

justrach/bhumi

⚡ Bhumi – The fastest AI inference client for Python, built with Rust for unmatched speed,...

lipish/llm-connector

LLM Connector - A unified interface for connecting to various Large Language Model providers

keyvank/femtoGPT

Pure Rust implementation of a minimal Generative Pretrained Transformer

ShelbyJenkins/llm_client

The Easiest Rust Interface for Local LLMs and an Interface for Deterministic Signals from...

Explore LLM Tools

All categories Trending LLM Tool directory Insights