tenstorrent/tt-metal
:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
This project provides Python and C++ libraries for deploying and running large language models (LLMs) and other neural networks on Tenstorrent AI hardware. It runs trained models efficiently and reports metrics such as tokens per second and time to first token. It is aimed at AI infrastructure engineers and machine learning practitioners who need to deploy and optimize neural-network inference on Tenstorrent hardware.
1,379 stars. Actively maintained with 1,025 commits in the last 30 days.
Use this if you are a machine learning engineer or researcher focused on deploying and optimizing the performance of large AI models, especially LLMs and Whisper, on Tenstorrent's AI accelerators.
Not ideal if you are looking for a general-purpose machine learning framework or if your primary interest is in training models on other hardware platforms.
Stars: 1,379
Forks: 375
Language: C++
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 1,025
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/tenstorrent/tt-metal"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
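For programmatic access, the curl command above can be wrapped in a few lines of Python. This is a minimal sketch using only the standard library; the endpoint URL comes from the page, but the response schema is an assumption, so the example only builds the URL and leaves the live fetch commented out.

```python
import urllib.request
import json

# Base path as shown in the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"

url = quality_url("tenstorrent", "tt-metal")
print(url)

# Uncomment to fetch live data (100 requests/day without a key,
# 1,000/day with a free key). Response fields are not documented here.
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(data)
```

The same URL pattern should work for any `owner/repo` pair listed in the catalog.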
Related models
- vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs.
- sgl-project/sglang: SGLang is a high-performance serving framework for large language models and multimodal models.
- alibaba/MNN: MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
- xorbitsai/inference: Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
- tensorzero/tensorzero: TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...