CAPTURE: Memory-Centric Partitioning for Distributed DNN Training with Hybrid Parallelism

Open Access
Publication date 2023
Book title 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics
Book subtitle HiPC 2023 : 18-21 December 2023, Goa, India : proceedings
ISBN
  • 9798350383232
ISBN (electronic)
  • 9798350383225
Event 30th Annual IEEE International Conference on High Performance Computing, Data, and Analytics, HiPC 2023
Pages (from-to) 76-86
Number of pages 11
Publisher Piscataway, NJ: IEEE
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

Deep Learning (DL) model sizes are increasing at a rapid pace, as larger models typically offer better statistical performance. Modern Large Language Models (LLMs) and image processing models contain billions of trainable parameters. Training such massive neural networks incurs significant memory requirements and financial cost. Hybrid-parallel training approaches have emerged that combine pipelining with data and tensor parallelism to facilitate the training of large DL models on distributed hardware setups. However, existing approaches to designing a hybrid-parallel partitioning and parallelization plan for DL models focus on achieving high throughput rather than on minimizing memory usage and financial cost. We introduce CAPTURE, a partitioning and parallelization approach for hybrid parallelism that minimizes peak memory usage. CAPTURE combines a profiling-based approach with statistical modeling to recommend a partitioning and parallelization plan that minimizes the peak memory usage across all the Graphics Processing Units (GPUs) in the hardware setup. Our results show a reduction in memory usage of up to 43.9% compared to partitioners in state-of-the-art hybrid-parallel training systems. The reduced memory footprint enables the training of larger DL models on the same hardware resources and training with larger batch sizes. CAPTURE can also train a given model on a smaller hardware setup than other approaches, reducing the financial cost of training massive DL models.
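To make the core idea concrete, the following is a minimal illustrative sketch (not the CAPTURE algorithm itself, whose details are in the paper): given profiled per-layer memory footprints, split a sequence of layers into contiguous pipeline stages so that the largest per-GPU memory sum, i.e. the peak, is minimized. The function name, the toy memory values, and the binary-search-plus-greedy strategy are all assumptions made for illustration.

```python
# Illustrative sketch only: contiguous layer partitioning that minimizes
# peak per-GPU memory. This is NOT the published CAPTURE algorithm; it
# demonstrates the general objective of memory-centric partitioning.

def min_peak_memory_split(layer_mem, num_gpus):
    """Split `layer_mem` (profiled memory per layer, in any consistent
    unit) into at most `num_gpus` contiguous stages, returning the
    smallest achievable peak (maximum per-stage memory sum).

    Strategy: binary-search the candidate peak between the largest
    single layer and the total, checking feasibility greedily."""
    lo, hi = max(layer_mem), sum(layer_mem)

    def feasible(cap):
        # Greedily pack consecutive layers into stages of capacity `cap`.
        stages, cur = 1, 0
        for m in layer_mem:
            if cur + m > cap:
                stages, cur = stages + 1, m
            else:
                cur += m
        return stages <= num_gpus

    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid  # a lower peak might still be achievable
        else:
            lo = mid + 1
    return lo


# Hypothetical per-layer footprints for a 5-layer model on 3 GPUs:
print(min_peak_memory_split([4, 2, 6, 3, 5], 3))  # prints 8
```

In this toy instance the best contiguous split is [4, 2] | [6] | [3, 5], giving per-GPU sums of 6, 6, and 8, so the peak is 8; any other 3-way contiguous split has a peak of at least 8. CAPTURE's actual search additionally accounts for data and tensor parallelism and uses statistical modeling of profiled measurements, which this sketch omits.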

Document type Conference contribution
Language English
DOI https://doi.org/10.1109/HiPC58850.2023.00023
Other links https://www.proceedings.com/74077.html https://www.scopus.com/pages/publications/85190604352