fattorib/transformer_shmap

Tensor Parallelism with JAX + Shard Map

Quality score: 28 / 100 (Experimental)

This project helps machine learning engineers train very large transformer models efficiently. Given a transformer model definition and training data, it produces a trained model by distributing the computation across multiple accelerators such as TPUs or GPUs, which is useful when scaling up large language model training.

No commits in the last 6 months.

Use this if you are a machine learning engineer working with JAX and need to train extremely large transformer models efficiently across multiple accelerators.

Not ideal if you are not familiar with JAX or are training smaller models that don't require tensor parallelism.
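The approach named in the title is tensor parallelism expressed with JAX's shard_map. As a rough orientation aid only (not code taken from this repository), the sketch below shards a single weight matrix column-wise across the devices of a mesh so that each device computes its slice of a linear layer's output; the mesh axis name "mp" and all shapes are assumptions made for the example.

# Minimal sketch (assumed names and shapes): a column-parallel matmul via shard_map.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

# One-dimensional device mesh with a single model-parallel axis called "mp".
mesh = Mesh(np.array(jax.devices()), ("mp",))

def column_parallel_matmul(x_block, w_block):
    # Each device holds the full activations and one column shard of W,
    # so the local matmul yields that device's slice of the output.
    return x_block @ w_block

parallel_fn = shard_map(
    column_parallel_matmul,
    mesh=mesh,
    in_specs=(P(), P(None, "mp")),  # x replicated, W split along its columns
    out_specs=P(None, "mp"),        # per-device outputs concatenated along columns
)

x = jnp.ones((4, 8))    # activations, replicated on every device
W = jnp.ones((8, 16))   # weight matrix, sharded column-wise (16 must divide across "mp")
y = parallel_fn(x, W)   # shape (4, 16), numerically equal to x @ W

In a full transformer the same pattern would be applied inside the attention and MLP blocks, paired with collectives for the row-parallel halves; treat the snippet purely as an illustration of the shard_map idea.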

large-language-models distributed-training model-scaling deep-learning transformer-architectures
Stale (6m) · No package · No dependents
Maintenance: 0 / 25
Adoption: 5 / 25
Maturity: 16 / 25
Community: 7 / 25


Stars: 11
Forks: 1
Language: Python
License: MIT
Last pushed: Sep 29, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/fattorib/transformer_shmap"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.