tenstorrent/tt-metal
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
This project provides Python and C++ libraries for deploying and running large language models (LLMs) and other neural networks on Tenstorrent AI hardware. It runs trained models efficiently and reports metrics such as tokens per second and time to first token. It is aimed at AI infrastructure engineers and machine learning practitioners who need to deploy and optimize neural-network inference on Tenstorrent hardware.
1,379 stars. Actively maintained with 1,025 commits in the last 30 days.
Use this if you are a machine learning engineer or researcher focused on deploying and optimizing the performance of large AI models, especially LLMs and Whisper, on Tenstorrent's AI accelerators.
Not ideal if you are looking for a general-purpose machine learning framework or if your primary interest is in training models on other hardware platforms.
Stars: 1,379
Forks: 375
Language: C++
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 1,025
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tenstorrent/tt-metal"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
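For programmatic access, the curl command above can be wrapped in a few lines of Python. This is a minimal sketch using only the standard library; the endpoint URL comes from the page, but the response schema is an assumption, so the example only builds the URL and leaves the live fetch commented out.

```python
import urllib.request
import json

# Base path as shown in the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"

url = quality_url("tenstorrent", "tt-metal")
print(url)

# Uncomment to fetch live data (100 requests/day without a key,
# 1,000/day with a free key). Response fields are not documented here.
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(data)
```

The same URL pattern should work for any `owner/repo` pair listed in the catalog.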
Related models
- vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs.
- sgl-project/sglang: SGLang is a high-performance serving framework for large language models and multimodal models.
- alibaba/MNN: MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
- xorbitsai/inference: Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
- tensorzero/tensorzero: TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...