BlackSamorez/tensor_parallel
Automatically split your PyTorch models on multiple GPUs for training & inference
This project helps machine learning practitioners and researchers run very large models, such as modern language models, across multiple GPUs. It takes an existing PyTorch model that may be too big for a single GPU and automatically shards its components, so that each GPU works on part of the model simultaneously. The result is faster training and inference for these massive models.
656 stars. Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Use this if your PyTorch model is too large to fit on a single GPU or if you need to significantly speed up its training or inference using multiple GPUs.
Not ideal if you are working with smaller models that already fit comfortably on a single GPU or if you are conducting million-dollar-scale distributed training runs that require highly specialized and optimized infrastructure like DeepSpeed or Alpa.
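The core idea the library automates is tensor parallelism: each layer's weight matrix is split across devices, each device computes a partial result, and the partials are combined. Below is a minimal pure-Python sketch of the column-split variant of that idea (no GPUs or third-party packages; the helper names are illustrative, not the library's actual API).

```python
# Illustration of the column-parallel idea behind tensor parallelism:
# each "device" holds a slice of the weight matrix columns, computes its
# partial output, and the partial outputs are concatenated.

def matmul(x, w):
    # x: list of rows, w: list of rows -> standard matrix product
    cols = len(w[0])
    return [[sum(x[i][k] * w[k][j] for k in range(len(w)))
             for j in range(cols)] for i in range(len(x))]

def split_columns(w, shards):
    # Split the weight matrix column-wise into `shards` pieces,
    # mimicking what each GPU would hold under tensor parallelism.
    step = len(w[0]) // shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(shards)]

x = [[1.0, 2.0]]                    # one input row
w = [[1.0, 2.0, 3.0, 4.0],          # 2x4 weight matrix
     [5.0, 6.0, 7.0, 8.0]]

full = matmul(x, w)                 # single-device result
parts = [matmul(x, shard) for shard in split_columns(w, shards=2)]
# Concatenate each device's partial output row-by-row.
merged = [sum(rows, []) for rows in zip(*parts)]
print(full == merged)  # True: sharded and unsharded paths agree
```

In the real library the splitting, communication, and gradient handling happen automatically when you wrap a model, so no manual sharding code like the above is needed.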
Stars: 656
Forks: 44
Language: Python
License: MIT
Category: ml-frameworks
Last pushed: Jan 02, 2024
Commits (30d): 0
Dependencies: 2
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/BlackSamorez/tensor_parallel"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
Related frameworks
pymc-devs/pytensor
PyTensor allows you to define, optimize, and efficiently evaluate mathematical expressions...
arogozhnikov/einops
Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
lava-nc/lava-dl
Deep Learning library for Lava
tensorly/tensorly
TensorLy: Tensor Learning in Python.
tensorpack/tensorpack
A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility