- GSPMD separates programming an ML model from parallelization and is capable of scaling most deep learning network architectures
Google AI has launched GSPMD (General and Scalable Parallelization for ML Computation Graphs) to address scaling challenges. GSPMD can scale most deep learning network architectures and has already been applied to many deep learning models, including GShard-M4, BigSSL, LaMDA, ViT, and MetNet-2. It has also been integrated into multiple ML frameworks, including TensorFlow and JAX, which use XLA as a shared compiler.
GSPMD separates the task of programming an ML model from the challenge of parallelizing it. Model developers write programs as if they ran on a single device with very large memory and compute capacity. To indicate how tensors should be partitioned, the user only needs to add a few lines of annotation code to a subset of critical tensors in the model; the compiler handles the rest. With GSPMD, developers can apply different parallelism algorithms to different use cases without reimplementing the model.
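To make the annotation idea concrete, here is a minimal sketch in JAX, one of the frameworks mentioned above. The function `layer`, the mesh axis name `"data"`, and the tensor shapes are illustrative assumptions rather than anything from the GSPMD release, and which partitioner XLA uses underneath depends on the JAX version. The model code is written as if for one device, and a single annotation line says how one tensor should be split.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over whatever devices are available
# (this also works with a single CPU device).
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))  # "data" is a hypothetical axis name


def layer(x, w):
    # The only parallelism-specific line: ask the compiler to split the
    # batch dimension of x across the "data" mesh axis. The second
    # dimension (None) is left unpartitioned.
    x = jax.lax.with_sharding_constraint(x, NamedSharding(mesh, P("data", None)))
    # Ordinary single-device model code; the compiler decides how to
    # partition the matmul and the activation.
    return jnp.tanh(x @ w)


# Keep the batch size divisible by the number of devices.
batch = 8 * devices.size
x = jnp.ones((batch, 16))
w = jnp.ones((16, 4))

y = jax.jit(layer)(x, w)
print(y.shape)
```

Switching to a different partitioning scheme in this sketch would mean changing the `PartitionSpec` (or the mesh), not the body of `layer`.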