Parthenon—a performance portable block-structured adaptive mesh refinement framework

Philipp Grete; Joshua C. Dolence; Jonah M. Miller; Joshua Brown; Ben Ryan; Andrew Gaspar; Forrest Glines; Sriram Swaminarayan; Jonas Lippuner; Clell J. Solomon; Galen Shipman; Christoph Junghans; Daniel Holladay; James M. Stone; Luke F. Roberts

doi:10.1177/10943420221143775

Los Alamos National Laboratory

Abstract

On the path to exascale, the landscape of computer device architectures and corresponding programming models has become much more diverse. While various low-level performance portable programming models are available, support at the application level lags behind. To address this issue, we present the performance portable block-structured adaptive mesh refinement (AMR) framework Parthenon, derived from the well-tested and widely used Athena++ astrophysical magnetohydrodynamics code, but generalized to serve as the foundation for a variety of downstream multi-physics codes. Parthenon adopts the Kokkos programming model and provides various levels of abstractions from multidimensional variables, to packages defining and separating components, to launching of parallel compute kernels. Parthenon allocates all data in device memory to reduce data movement, supports the logical packing of variables and mesh blocks to reduce kernel launch overhead, and employs one-sided, asynchronous MPI calls to reduce communication overhead in multi-node simulations. Using a hydrodynamics miniapp, we demonstrate weak and strong scaling on various architectures including AMD and NVIDIA GPUs, Intel and AMD x86 CPUs, IBM Power9 CPUs, as well as Fujitsu A64FX CPUs. At the largest scale on Frontier (the first TOP500 exascale machine), the miniapp reaches a total of 1.7 × 10¹³ zone-cycles/s on 9216 nodes (73,728 logical GPUs) at (Formula presented.) weak scaling parallel efficiency (starting from a single node). In combination with being an open, collaborative project, this makes Parthenon an ideal framework to target exascale simulations in which the downstream developers can focus on their specific application rather than on the complexity of handling massively parallel, device-accelerated AMR.
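A quick consistency check of the Frontier figures quoted in the abstract: dividing the aggregate rate by the node and logical-GPU counts gives average per-node and per-GPU throughput. This is only a sketch of the arithmetic. The three inputs are the rounded values from the abstract (1.7 has two significant figures), so the outputs are approximate averages, not measured per-device rates. The weak-scaling efficiency value is not recoverable from this record and is not used.

```python
# Average throughput implied by the abstract's Frontier totals.
# Only the three inputs below come from the text; outputs are rounded averages.
total_zone_cycles_per_s = 1.7e13   # aggregate miniapp throughput (zone-cycles/s)
nodes = 9216                       # Frontier nodes in the largest run
logical_gpus = 73_728              # logical GPUs in the largest run

gpus_per_node = logical_gpus // nodes               # integer ratio of the two counts
per_node = total_zone_cycles_per_s / nodes          # average rate per node
per_gpu = total_zone_cycles_per_s / logical_gpus    # average rate per logical GPU

print(f"logical GPUs per node: {gpus_per_node}")
print(f"per node: {per_node:.2e} zone-cycles/s")
print(f"per logical GPU: {per_gpu:.2e} zone-cycles/s")
```

This gives 8 logical GPUs per node, roughly 1.84 × 10⁹ zone-cycles/s per node and roughly 2.31 × 10⁸ zone-cycles/s per logical GPU.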

Original language	English
Pages (from-to)	465-486
Number of pages	22
Journal	International Journal of High Performance Computing Applications
Volume	37
Issue number	5
DOIs	https://doi.org/10.1177/10943420221143775
State	Published - Sep 2023

Cite this

Grete, P., Dolence, J. C., Miller, J. M., Brown, J., Ryan, B., Gaspar, A., Glines, F., Swaminarayan, S., Lippuner, J., Solomon, C. J., Shipman, G., Junghans, C., Holladay, D., Stone, J. M., & Roberts, L. F. (2023). Parthenon—a performance portable block-structured adaptive mesh refinement framework. International Journal of High Performance Computing Applications, 37(5), 465-486. https://doi.org/10.1177/10943420221143775

@article{7a22849f64184284ac4e0451c8b9e0fe,

title = "Parthenon—a performance portable block-structured adaptive mesh refinement framework",

abstract = "On the path to exascale, the landscape of computer device architectures and corresponding programming models has become much more diverse. While various low-level performance portable programming models are available, support at the application level lags behind. To address this issue, we present the performance portable block-structured adaptive mesh refinement (AMR) framework Parthenon, derived from the well-tested and widely used Athena++ astrophysical magnetohydrodynamics code, but generalized to serve as the foundation for a variety of downstream multi-physics codes. Parthenon adopts the Kokkos programming model and provides various levels of abstractions from multidimensional variables, to packages defining and separating components, to launching of parallel compute kernels. Parthenon allocates all data in device memory to reduce data movement, supports the logical packing of variables and mesh blocks to reduce kernel launch overhead, and employs one-sided, asynchronous MPI calls to reduce communication overhead in multi-node simulations. Using a hydrodynamics miniapp, we demonstrate weak and strong scaling on various architectures including AMD and NVIDIA GPUs, Intel and AMD x86 CPUs, IBM Power9 CPUs, as well as Fujitsu A64FX CPUs. At the largest scale on Frontier (the first TOP500 exascale machine), the miniapp reaches a total of $1.7 \times 10^{13}$ zone-cycles/s on 9216 nodes (73,728 logical GPUs) at (Formula presented.) weak scaling parallel efficiency (starting from a single node). In combination with being an open, collaborative project, this makes Parthenon an ideal framework to target exascale simulations in which the downstream developers can focus on their specific application rather than on the complexity of handling massively parallel, device-accelerated AMR.",

author = "Philipp Grete and Dolence, {Joshua C.} and Miller, {Jonah M.} and Joshua Brown and Ben Ryan and Andrew Gaspar and Forrest Glines and Sriram Swaminarayan and Jonas Lippuner and Solomon, {Clell J.} and Galen Shipman and Christoph Junghans and Daniel Holladay and Stone, {James M.} and Roberts, {Luke F.}",

year = "2023",

month = sep,

doi = "10.1177/10943420221143775",

language = "English",

volume = "37",

pages = "465--486",

journal = "International Journal of High Performance Computing Applications",

issn = "1094-3420",

publisher = "SAGE Publications Inc.",

number = "5",

}

Grete, P, Dolence, JC, Miller, JM, Brown, J, Ryan, B, Gaspar, A, Glines, F, Swaminarayan, S, Lippuner, J, Solomon, CJ, Shipman, G, Junghans, C, Holladay, D, Stone, JM & Roberts, LF 2023, 'Parthenon—a performance portable block-structured adaptive mesh refinement framework', International Journal of High Performance Computing Applications, vol. 37, no. 5, pp. 465-486. https://doi.org/10.1177/10943420221143775

TY  - JOUR

T1  - Parthenon—a performance portable block-structured adaptive mesh refinement framework

AU  - Grete, Philipp

AU  - Dolence, Joshua C.

AU  - Miller, Jonah M.

AU  - Brown, Joshua

AU  - Ryan, Ben

AU  - Gaspar, Andrew

AU  - Glines, Forrest

AU  - Swaminarayan, Sriram

AU  - Lippuner, Jonas

AU  - Solomon, Clell J.

AU  - Shipman, Galen

AU  - Junghans, Christoph

AU  - Holladay, Daniel

AU  - Stone, James M.

AU  - Roberts, Luke F.

PY  - 2023/9

Y1  - 2023/9

N2  - On the path to exascale, the landscape of computer device architectures and corresponding programming models has become much more diverse. While various low-level performance portable programming models are available, support at the application level lags behind. To address this issue, we present the performance portable block-structured adaptive mesh refinement (AMR) framework Parthenon, derived from the well-tested and widely used Athena++ astrophysical magnetohydrodynamics code, but generalized to serve as the foundation for a variety of downstream multi-physics codes. Parthenon adopts the Kokkos programming model and provides various levels of abstractions from multidimensional variables, to packages defining and separating components, to launching of parallel compute kernels. Parthenon allocates all data in device memory to reduce data movement, supports the logical packing of variables and mesh blocks to reduce kernel launch overhead, and employs one-sided, asynchronous MPI calls to reduce communication overhead in multi-node simulations. Using a hydrodynamics miniapp, we demonstrate weak and strong scaling on various architectures including AMD and NVIDIA GPUs, Intel and AMD x86 CPUs, IBM Power9 CPUs, as well as Fujitsu A64FX CPUs. At the largest scale on Frontier (the first TOP500 exascale machine), the miniapp reaches a total of 1.7 × 10¹³ zone-cycles/s on 9216 nodes (73,728 logical GPUs) at (Formula presented.) weak scaling parallel efficiency (starting from a single node). In combination with being an open, collaborative project, this makes Parthenon an ideal framework to target exascale simulations in which the downstream developers can focus on their specific application rather than on the complexity of handling massively parallel, device-accelerated AMR.

AB  - On the path to exascale, the landscape of computer device architectures and corresponding programming models has become much more diverse. While various low-level performance portable programming models are available, support at the application level lags behind. To address this issue, we present the performance portable block-structured adaptive mesh refinement (AMR) framework Parthenon, derived from the well-tested and widely used Athena++ astrophysical magnetohydrodynamics code, but generalized to serve as the foundation for a variety of downstream multi-physics codes. Parthenon adopts the Kokkos programming model and provides various levels of abstractions from multidimensional variables, to packages defining and separating components, to launching of parallel compute kernels. Parthenon allocates all data in device memory to reduce data movement, supports the logical packing of variables and mesh blocks to reduce kernel launch overhead, and employs one-sided, asynchronous MPI calls to reduce communication overhead in multi-node simulations. Using a hydrodynamics miniapp, we demonstrate weak and strong scaling on various architectures including AMD and NVIDIA GPUs, Intel and AMD x86 CPUs, IBM Power9 CPUs, as well as Fujitsu A64FX CPUs. At the largest scale on Frontier (the first TOP500 exascale machine), the miniapp reaches a total of 1.7 × 10¹³ zone-cycles/s on 9216 nodes (73,728 logical GPUs) at (Formula presented.) weak scaling parallel efficiency (starting from a single node). In combination with being an open, collaborative project, this makes Parthenon an ideal framework to target exascale simulations in which the downstream developers can focus on their specific application rather than on the complexity of handling massively parallel, device-accelerated AMR.

UR  - http://www.scopus.com/inward/record.url?scp=85144188348&partnerID=8YFLogxK

U2  - 10.1177/10943420221143775

DO  - 10.1177/10943420221143775

M3  - Article

SN  - 1094-3420

VL  - 37

SP  - 465

EP  - 486

JO  - International Journal of High Performance Computing Applications

JF  - International Journal of High Performance Computing Applications

IS  - 5

ER  -
