CAPSlog: Scalable Memory-Centric Partitioning for Pipeline Parallelism

H. Dreuning; A.B. Liokouras; X. Ouyang; H.E. Bal; R.V. van Nieuwpoort

doi:https://doi.org/10.1109/PDP62718.2024.00012

Authors	H. Dreuning; A.B. Liokouras; X. Ouyang; H.E. Bal; R.V. van Nieuwpoort
Publication date	2024
Host editors	A.E. Chis; H. González-Vélez
Book title	2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing
Book subtitle	PDP 2024 : Dublin, Ireland, 20-22 March 2024 : proceedings
ISBN	9798350363081
ISBN (electronic)	9798350363074
Event	32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2024
Pages (from-to)	17-25
Number of pages	9
Publisher	Piscataway, NJ: IEEE Computer Society
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Pipeline-parallel training has emerged as a popular method to train large Deep Neural Networks (DNNs), as it allows the use of the combined compute power and memory capacity of multiple Graphics Processing Units (GPUs). However, with the sustained increase in Deep Learning (DL) model sizes, pipeline parallelism provides only a partial solution to the memory bottleneck in large-scale DNN training. Careful partitioning of the DL model over the available GPUs based on memory usage is required to further alleviate the memory bottleneck and train larger DNNs. mCAP is such a memory-oriented partitioning approach for pipeline-parallel systems, but it does not scale to models with many layers and very large hardware setups, as it requires extensive profiling and fails to efficiently navigate the partitioning space to find the most memory-friendly partitioning. In this work, we propose CAPSlog, a scalable memory-centric partitioning approach that can recommend model partitionings for larger and more heterogeneous DL models and for larger hardware setups than existing approaches. CAPSlog introduces a new profiling method and a new, much more scalable algorithm for recommending memory-efficient partitionings. CAPSlog reduces the profiling time by 67% compared to existing approaches, searches the partitioning space for the optimal solution orders of magnitude faster, and can train significantly larger models.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1109/PDP62718.2024.00012
Other links	https://www.proceedings.com/74377.html https://www.scopus.com/pages/publications/85191747579
Downloads	CAPSlog_Scalable_Memory-Centric_Partitioning_for_Pipeline_Parallelism (Final published version)