AI-Hypercomputer/JetStream

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).

Quality score: 51 / 100 (Established)

This project helps machine learning engineers and researchers efficiently run large language models (LLMs) on specialized hardware like Google's TPUs. It serves your trained LLM (built with frameworks like JAX or PyTorch), producing predictions or generated text faster and with less memory, even under heavy demand. It's designed for teams that need to deploy and serve LLMs to end users at scale.


Use this if you are a machine learning engineer or researcher looking to optimize the performance, speed, and memory usage of your large language models when running them on XLA devices such as TPUs, especially for high-throughput serving.

Not ideal if you are a business user or data analyst without a technical background in machine learning deployment and infrastructure, or if you are primarily working with standard CPU or GPU environments without XLA-specific optimization needs.

Tags: LLM deployment, MLOps, AI infrastructure, model serving, TPU optimization
No package. No dependents.
Maintenance 6 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 19 / 25


Stars: 415
Forks: 58
Language: Python
License: Apache-2.0
Last pushed: Jan 05, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/AI-Hypercomputer/JetStream"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
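The curl command above can be wrapped in a small Python helper. A minimal sketch, assuming the endpoint returns JSON; the response schema is not documented here, and `registry`, `owner`, and `repo` are illustrative parameter names:

```python
# Hedged sketch: query the public quality API shown above.
# Only the endpoint URL comes from the page; the JSON response
# schema is an assumption.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(registry: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository."""
    return f"{API_BASE}/{registry}/{owner}/{repo}"


def fetch_quality(registry: str, owner: str, repo: str) -> dict:
    """GET the quality record and parse it as JSON (schema assumed)."""
    with urllib.request.urlopen(quality_url(registry, owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Reproduces the URL from the curl example above.
    print(quality_url("transformers", "AI-Hypercomputer", "JetStream"))
```

Anonymous use is limited to 100 requests/day, so a caller should handle HTTP 429 responses by backing off or supplying an API key.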