Code modernization strategies for short-range non-bonded molecular dynamics simulations

James Vance; Zhen Hao Xu; Nikita Tretyakov; Torsten Stuehn; Markus Rampp; Sebastian Eibl; Christoph Junghans; André Brinkmann

doi:10.1016/j.cpc.2023.108760

Los Alamos National Laboratory

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Modern HPC systems are increasingly relying on greater core counts and wider vector registers. Thus, applications need to be adapted to fully utilize these hardware capabilities. One class of applications that can benefit from this increase in parallelism are molecular dynamics simulations. In this paper, we describe our efforts at modernizing the ESPResSo++ simulation package for molecular dynamics by restructuring its particle data layout for efficient memory accesses and applying vectorization techniques to benefit the calculation of short-range non-bonded forces, which results in an overall three times speedup and serves as a baseline for further optimizations. We also implement fine-grained parallelism for multi-core CPUs through HPX, a C++ runtime system which uses lightweight threads and an asynchronous many-task approach to maximize concurrency. Our goal is to evaluate the performance of an HPX-based approach compared to the bulk-synchronous MPI-based implementation. This requires the introduction of an additional layer to the domain decomposition scheme that defines the task granularity. On spatially inhomogeneous systems, which impose a corresponding load-imbalance in traditional MPI-based approaches, we demonstrate that by choosing an optimal task size, the efficient work-stealing mechanisms of HPX can overcome the overhead of communication resulting in an overall 1.4 times speedup compared to the baseline MPI version.

Original language	English
Article number	108760
Journal	Computer Physics Communications
Volume	290
DOIs	https://doi.org/10.1016/j.cpc.2023.108760
State	Published - Sep 2023

Cite this

@article{0eff336978dc43d88e453beff09a2f04,

title = "Code modernization strategies for short-range non-bonded molecular dynamics simulations",

abstract = "Modern HPC systems are increasingly relying on greater core counts and wider vector registers. Thus, applications need to be adapted to fully utilize these hardware capabilities. One class of applications that can benefit from this increase in parallelism are molecular dynamics simulations. In this paper, we describe our efforts at modernizing the ESPResSo++ simulation package for molecular dynamics by restructuring its particle data layout for efficient memory accesses and applying vectorization techniques to benefit the calculation of short-range non-bonded forces, which results in an overall three times speedup and serves as a baseline for further optimizations. We also implement fine-grained parallelism for multi-core CPUs through HPX, a C++ runtime system which uses lightweight threads and an asynchronous many-task approach to maximize concurrency. Our goal is to evaluate the performance of an HPX-based approach compared to the bulk-synchronous MPI-based implementation. This requires the introduction of an additional layer to the domain decomposition scheme that defines the task granularity. On spatially inhomogeneous systems, which impose a corresponding load-imbalance in traditional MPI-based approaches, we demonstrate that by choosing an optimal task size, the efficient work-stealing mechanisms of HPX can overcome the overhead of communication resulting in an overall 1.4 times speedup compared to the baseline MPI version.",

author = "James Vance and Xu, {Zhen Hao} and Nikita Tretyakov and Torsten Stuehn and Markus Rampp and Sebastian Eibl and Christoph Junghans and Andr{\'e} Brinkmann",

year = "2023",

month = sep,

doi = "10.1016/j.cpc.2023.108760",

language = "English",

volume = "290",

journal = "Computer Physics Communications",

issn = "0010-4655",

publisher = "Elsevier",

}

TY - JOUR

T1 - Code modernization strategies for short-range non-bonded molecular dynamics simulations

AU - Vance, James

AU - Xu, Zhen Hao

AU - Tretyakov, Nikita

AU - Stuehn, Torsten

AU - Rampp, Markus

AU - Eibl, Sebastian

AU - Junghans, Christoph

AU - Brinkmann, André

PY - 2023/9

Y1 - 2023/9

N2 - Modern HPC systems are increasingly relying on greater core counts and wider vector registers. Thus, applications need to be adapted to fully utilize these hardware capabilities. One class of applications that can benefit from this increase in parallelism are molecular dynamics simulations. In this paper, we describe our efforts at modernizing the ESPResSo++ simulation package for molecular dynamics by restructuring its particle data layout for efficient memory accesses and applying vectorization techniques to benefit the calculation of short-range non-bonded forces, which results in an overall three times speedup and serves as a baseline for further optimizations. We also implement fine-grained parallelism for multi-core CPUs through HPX, a C++ runtime system which uses lightweight threads and an asynchronous many-task approach to maximize concurrency. Our goal is to evaluate the performance of an HPX-based approach compared to the bulk-synchronous MPI-based implementation. This requires the introduction of an additional layer to the domain decomposition scheme that defines the task granularity. On spatially inhomogeneous systems, which impose a corresponding load-imbalance in traditional MPI-based approaches, we demonstrate that by choosing an optimal task size, the efficient work-stealing mechanisms of HPX can overcome the overhead of communication resulting in an overall 1.4 times speedup compared to the baseline MPI version.

AB - Modern HPC systems are increasingly relying on greater core counts and wider vector registers. Thus, applications need to be adapted to fully utilize these hardware capabilities. One class of applications that can benefit from this increase in parallelism are molecular dynamics simulations. In this paper, we describe our efforts at modernizing the ESPResSo++ simulation package for molecular dynamics by restructuring its particle data layout for efficient memory accesses and applying vectorization techniques to benefit the calculation of short-range non-bonded forces, which results in an overall three times speedup and serves as a baseline for further optimizations. We also implement fine-grained parallelism for multi-core CPUs through HPX, a C++ runtime system which uses lightweight threads and an asynchronous many-task approach to maximize concurrency. Our goal is to evaluate the performance of an HPX-based approach compared to the bulk-synchronous MPI-based implementation. This requires the introduction of an additional layer to the domain decomposition scheme that defines the task granularity. On spatially inhomogeneous systems, which impose a corresponding load-imbalance in traditional MPI-based approaches, we demonstrate that by choosing an optimal task size, the efficient work-stealing mechanisms of HPX can overcome the overhead of communication resulting in an overall 1.4 times speedup compared to the baseline MPI version.

UR - http://www.scopus.com/inward/record.url?scp=85159766814&partnerID=8YFLogxK

U2 - 10.1016/j.cpc.2023.108760

DO - 10.1016/j.cpc.2023.108760

M3 - Article

SN - 0010-4655

VL - 290

JO - Computer Physics Communications

JF - Computer Physics Communications

M1 - 108760

ER -
