TY - CONF
T1 - Unity: Accelerating DNN Training Through Joint Optimization of Algebraic Transformations and Parallelization
T2 - 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022
AU - Unger, Colin
AU - Jia, Zhihao
AU - Wu, Wei
AU - Lin, Sina
AU - Baines, Mandeep
AU - Narvaez, Carlos Efrain Quintero
AU - Ramakrishnaiah, Vinay
AU - Prajapati, Nirmal
AU - McCormick, Pat
AU - Mohd-Yusof, Jamaludin
AU - Luo, Xi
AU - Mudigere, Dheevatsa
AU - Park, Jongsoo
AU - Smelyanskiy, Misha
AU - Aiken, Alex
PY - 2022
Y1 - 2022
N2 - This paper presents Unity, the first system that jointly optimizes algebraic transformations and parallelization in distributed DNN training. Unity represents both parallelization and algebraic transformations as substitutions on a unified parallel computation graph (PCG), which simultaneously expresses the computation, parallelization, and communication of a distributed DNN training procedure. Optimizations, in the form of graph substitutions, are automatically generated given a list of operator specifications, and are formally verified correct using an automated theorem prover. Unity then uses a novel hierarchical search algorithm to jointly optimize algebraic transformations and parallelization while maintaining scalability. The combination of these techniques provides a generic and extensible approach to optimizing distributed DNN training, capable of integrating new DNN operators, parallelization strategies, and model architectures with minimal manual effort. We evaluate Unity on seven real-world DNNs running on up to 192 GPUs on 32 nodes and show that Unity outperforms existing DNN training frameworks by up to 3.6× while keeping optimization times under 20 minutes. Unity is available to use as part of the open-source DNN training framework FlexFlow at https://github.com/flexflow/flexflow.
UR - http://www.scopus.com/inward/record.url?scp=85140355399&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022
SP - 267
EP - 284
BT - Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2022
PB - USENIX Association
Y2 - 11 July 2022 through 13 July 2022
ER -