Parallel Algorithms for Successive Convolution

Christlieb, Andrew J.; Guthrey, Pierson T.; Sands, William A.; Thavappiragasm, Mathialakan

doi:10.1007/s10915-020-01359-x

Published: 08 December 2020

Volume 86, article number 1, (2021)

Journal of Scientific Computing

Andrew J. Christlieb, Pierson T. Guthrey, William A. Sands (ORCID: orcid.org/0000-0003-3643-4751) & Mathialakan Thavappiragasm


Abstract

The development of modern computing architectures with ever-increasing amounts of parallelism has allowed for the solution of previously intractable problems across a variety of scientific disciplines. Despite these advances, multiscale computing problems continue to pose an incredible challenge to modern architectures because they require resolving scales that often vary by orders of magnitude in both space and time. Such complications have led us to consider alternative discretizations for partial differential equations (PDEs) which use expansions involving integral operators to approximate spatial derivatives (Christlieb et al. in J Comput Phys 379:214–236, 2019; Christlieb et al. J Sci Comput 82:52(3):1–29, 2020; Christlieb et al. J Comput Phys 415:1–25, 2020). These constructions use explicit information within the integral terms, but treat boundary data implicitly, which contributes to the overall speed of the method. This approach is provably unconditionally stable for linear problems and stability has been demonstrated experimentally for nonlinear problems. Additionally, it is matrix-free in the sense that it is not necessary to invert linear systems and iteration is not required for nonlinear terms. Moreover, the scheme employs a fast summation algorithm that yields a method with a computational complexity of ${\mathcal {O}}(N)$, where N is the number of mesh points along a coordinate direction. While much work has been done to explore the theory behind these methods, their practicality in large scale computing environments is a largely unexplored topic. In this work, we explore the performance of these methods by developing a domain decomposition algorithm suitable for distributed memory systems along with shared memory algorithms. As a first pass, we derive an artificial Courant–Friedrichs–Lewy condition that enforces a nearest-neighbor (N-N) communication pattern and briefly discuss possible generalizations. 
We also analyze several approaches for implementing the parallel algorithms by optimizing predominant loop structures and maximizing data reuse. Using a hybrid design that employs MPI and Kokkos (Edwards and Trott in J Parallel Distrib Comput 74:3202–3216, 2014) for the distributed and shared memory components of the algorithms, respectively, we show that our methods are efficient and can sustain an update rate $> 1\times 10^8$ DOF/node/s. We provide results that demonstrate the scalability and versatility of our algorithms using several different PDE test problems, including a nonlinear example, which employs an adaptive time-stepping rule.

Notes

Note that the update frequency does not account for error in the numerical solution. Certainly, in order to compare the efficiency of various methods, especially those that belong to different classes, one must take into account the quality of the solution. This would be reflected in, for example, an error versus time-to-solution plot.
This is true when the memory space is that of the CPU (host memory). In device memory, these entries will be “coalesced”, which is the optimal layout for threading on GPUs. Kokkos automatically handles this mapping of indices between memory spaces.
By a pencil we mean a (generally long) rectangle in 2-D or a rectangular prism in 3-D. Using pencils, as opposed to square blocks, would require additional precomputation and, possibly, restrictions on the problem size.

References

1. Christlieb, A., Guo, W., Jiang, Y.: A kernel-based high order “explicit” unconditionally stable scheme for time dependent Hamilton–Jacobi equations. J. Comput. Phys. 379, 214–236 (2019)
2. Christlieb, A., Guo, W., Jiang, Y., Yang, H.: Kernel based high order “explicit” unconditionally-stable scheme for nonlinear degenerate advection-diffusion equations. J. Sci. Comput. 82:52(3), 1–29 (2020)
3. Christlieb, A., Sands, W., Yang, H.: A kernel-based explicit unconditionally stable scheme for Hamilton–Jacobi equations on nonuniform meshes. J. Comput. Phys. 415, 1–25 (2020). Art. No. 109543
4. Edwards, H.C., Trott, C.R., Sunderland, D.: Kokkos: enabling manycore performance portability through polymorphic memory access patterns. J. Parallel Distrib. Comput. 74, 3202–3216 (2014)
5. Causley, M., Christlieb, A., Ong, B., Van Groningen, L.: Method of lines transpose: an implicit solution to the wave equation. Math. Comput. 83(290), 2763–2786 (2014)
6. Causley, M.F., Cho, H., Christlieb, A.J., Seal, D.C.: Method of lines transpose: high order L-stable O(N) schemes for parabolic equations using successive convolution. SIAM J. Numer. Anal. 54(3), 1635–1652 (2016)
7. Causley, M., Cho, H., Christlieb, A.: Method of lines transpose: energy gradient flows using direct operator inversion for phase-field models. SIAM J. Sci. Comput. 39(5), B968–B992 (2017)
8. Cheng, Y., Christlieb, A.J., Guo, W., Ong, B.: An asymptotic preserving Maxwell solver resulting in the Darwin limit of electrodynamics. J. Sci. Comput. 71(3), 959–993 (2017)
9. Christlieb, A., Guo, W., Jiang, Y.: A WENO-based method of lines transpose approach for Vlasov simulations. J. Comput. Phys. 327, 337–367 (2016)
10. Gottlieb, S., Shu, C.-W., Tadmor, E.: Strong stability-preserving high-order time discretization methods. SIAM Rev. 43(1), 89–112 (2001)
11. Salazar, A., Raydan, M., Campo, A.: Theoretical analysis of the exponential transversal method of lines for the diffusion equation. Numer. Methods Partial Differ. Equ. 16(1), 30–41 (2000)
12. Schemann, M., Bornemann, F.A.: An adaptive Rothe method for the wave equation. Comput. Vis. Sci. 1(3), 137–144 (1998)
13. Biros, G., Ying, L., Zorin, D.: An embedded boundary integral solver for the unsteady incompressible Navier–Stokes equations (preprint) (2002)
14. Biros, G., Ying, L., Zorin, D.: A fast solver for the Stokes equations with distributed forces in complex geometries. J. Comput. Phys. 193, 317–348 (2004)
15. Chiu, S.-H., Moore, M.N.J., Quaife, B.: Viscous transport in eroding porous media. J. Fluid Mech. 893, A3 (2020). https://doi.org/10.1017/jfm.2020.228
16. Kropinski, M.C.A., Quaife, B.D.: Fast integral equation methods for Rothe’s method applied to the isotropic heat equation. Comput. Math. Appl. 61, 2436–2446 (2011)
17. Quaife, B.D., Moore, M.N.J.: A boundary-integral framework to simulate viscous erosion of a porous medium. J. Comput. Phys. 375, 1–21 (2018)
18. Wang, H., Lei, T., Li, J., Huang, J., Yao, Z.: A parallel fast multipole accelerated integral equation scheme for 3D Stokes equations. Int. J. Numer. Meth. Eng. 70, 812–839 (2007)
19. Ying, L., Biros, G., Zorin, D.: A high-order 3D boundary integral equation solver for elliptic PDEs in smooth domains. J. Comput. Phys. 219, 247–275 (2006)
20. Saad, Y., Schultz, M.H.: GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 7, 856–869 (1986)
21. Bruno, O.P., Lyon, M.: High-order unconditionally stable FC-AD solvers for general smooth domains I. Basic elements. J. Comput. Phys. 229, 2009–2033 (2010)
22. Bruno, O.P., Lyon, M.: High-order unconditionally stable FC-AD solvers for general smooth domains II. Elliptic, parabolic and hyperbolic PDEs. Theoretical considerations. J. Comput. Phys. 229, 3358–3381 (2010)
23. Douglas Jr., J.: On the numerical integration of $\partial _{xx}U + \partial _{yy}U = \partial _{t}U$ by implicit methods. J. Soc. Ind. Appl. Math. 3, 42–65 (1955)
24. Douglas Jr., J.: Alternating direction methods for three space variables. Numer. Math. 3, 41–63 (1962)
25. Peaceman, D.W., Rachford Jr., H.H.: The numerical solution of parabolic and elliptic differential equations. J. Soc. Ind. Appl. Math. 3, 28–41 (1955)
26. Albin, N., Bruno, O.P.: A spectral FC solver for the compressible Navier–Stokes equations in general domains I: Explicit time-stepping. J. Comput. Phys. 230, 6248–6270 (2011)
27. Anderson, T.G., Bruno, O.P., Lyon, M.: High-order, dispersionless “fast-hybrid” wave equation solver. Part I: ${\mathcal {O}}(1)$ sampling cost via incident-field windowing and recentering. SIAM J. Sci. Comput. 42, 1348–1379 (2020)
28. Causley, M.F., Christlieb, A.J.: Higher order A-stable schemes for the wave equation using a successive convolution approach. SIAM J. Numer. Anal. 52(1), 220–235 (2014)
29. Causley, M.F., Christlieb, A.J., Guclu, Y., Wolf, E.: Method of lines transpose: A fast implicit wave propagator. arXiv preprint arXiv:1306.6902 (2013)
30. Shu, C.-W.: High order weighted essentially nonoscillatory schemes for convection dominated problems. SIAM Rev. 51(1), 82–126 (2009)
31. Hornung, R.D., Keasler, J.A.: The RAJA portability layer: Overview and status. Lawrence Livermore National Laboratory (LLNL), Livermore. Tech. Rep. (2014). https://doi.org/10.2172/1169830
32. Grete, P., Glines, F.W., O’Shea, B.W.: K-Athena: A performance portable structured grid finite volume magnetohydrodynamics code (2019). arXiv:1905.04341 [cs.DC]
33. White, C.J., Stone, J.M., Gammie, C.F.: An extension of the Athena++ code framework for GRMHD based on advanced Riemann solvers and staggered-mesh constrained transport. Astrophys. J. Suppl. 225, 2 (2016)

Acknowledgements

The research of the authors was supported partly through computational resources made available by Michigan State University’s Institute for Cyber-Enabled Research. Funding for this work was provided in part by AFOSR grants FA9550-19-1-0281 and FA9550-17-1-0394 and NSF grant DMS 1912183. The authors would like to thank Brian O’Shea and Philipp Grete at Michigan State University for helpful discussions relating to Kokkos and some suggestions for fine-tuning of the parallel algorithms presented here. Additionally, W. Sands would like to thank the organizers of the 2019 Performance Portability Workshop with Kokkos, along with Christian Trott, David Hollman, and Damien Lebrun-Grandie for their assistance with our application. We also wish to express our gratitude to W. Nicolas G. Hitchon for taking the time to review our work and offer numerous suggestions and improvements regarding the presentation of the manuscript.

Author information

Pierson T. Guthrey
Present address: Lawrence Livermore National Laboratory, Livermore, CA, 94550, USA
Mathialakan Thavappiragasm
Present address: Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA

Authors and Affiliations

Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
Andrew J. Christlieb, Pierson T. Guthrey, William A. Sands & Mathialakan Thavappiragasm

Corresponding author

Correspondence to William A. Sands.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Last updated on October 29, 2020. The research of the authors was supported in part by AFOSR Grants FA9550-19-1-0281 and FA9550-17-1-0394 and NSF Grant DMS 1912183.

Appendices

Example for Linear Advection

Suppose we wish to solve the 1D linear advection equation:

$$\begin{aligned} \partial _{t} u + c \partial _{x} u = 0, \quad (x,t) \in (a,b) \times {\mathbb {R}}^{+}, \end{aligned}$$

(Appendix A.1)

where $c > 0$ is the wave speed; we leave the boundary conditions unspecified for now. The procedure for $c < 0$ is analogous. Discretizing (Appendix A.1) in time with backward Euler yields a semi-discrete equation of the form

$$\begin{aligned} \frac{u^{n+1}(x) - u^{n}(x)}{\Delta t} + c \partial _{x} u^{n+1}(x) = 0. \end{aligned}$$

If we rearrange this, we obtain a linear equation of the form

$$\begin{aligned} {\mathcal {L}}[u^{n+1}; \alpha ](x) = u^{n}(x), \end{aligned}$$

(Appendix A.2)

where we have used

$$\begin{aligned} \alpha := \frac{1}{c\Delta t},\quad {\mathcal {L}}:= {\mathcal {I}} + \frac{1}{\alpha } \partial _{x}. \end{aligned}$$

By reversing the order in which the discretization is performed, we have created a sequence of boundary value problems (BVPs) at discrete time levels. If we had instead discretized equation (Appendix A.1) using the method of lines (MOL) formalism, then ${\mathcal {L}}$ would be an algebraic operator. To solve Eq. (Appendix A.2) for $u^{n+1}$, we analytically invert the operator ${\mathcal {L}}$. Notice that this equation is a linear first-order ODE in $x$, so it can be solved using standard ODE techniques. Applying the integrating factor method to the problem, we obtain

$$\begin{aligned} \partial _{x} \left[ e^{\alpha x} u^{n+1}(x) \right] = \alpha e^{\alpha x} u^{n}(x). \end{aligned}$$
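In more detail, written out with the definition of ${\mathcal {L}}$, Eq. (Appendix A.2) reads $u^{n+1}(x) + \frac{1}{\alpha } \partial _{x} u^{n+1}(x) = u^{n}(x)$. Multiplying through by the integrating factor $\alpha e^{\alpha x}$ and recognizing the left-hand side as a product-rule derivative gives the identity above:

$$\begin{aligned} \alpha e^{\alpha x} u^{n+1}(x) + e^{\alpha x} \partial _{x} u^{n+1}(x) = \partial _{x} \left[ e^{\alpha x} u^{n+1}(x) \right] = \alpha e^{\alpha x} u^{n}(x). \end{aligned}$$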

To integrate this equation, we use the fact that characteristics move to the right, so integration is performed from a to x. After rearranging the result, we arrive at the update equation

$$\begin{aligned} u^{n+1}(x)&= e^{-\alpha (x-a)} u^{n+1}(a) + \alpha \int _{a}^{x} e^{-\alpha (x - s)} u^{n}(s) \, ds, \\&\equiv e^{-\alpha (x-a)} A^{n+1} + \alpha \int _{a}^{x} e^{-\alpha (x - s)} u^{n}(s) \, ds, \\&\equiv {\mathcal {L}}^{-1}[u^{n}; \alpha ](x). \end{aligned}$$

This update displays the origins of the implicit behavior of the method. While convolutions are performed on data from the previous time step, the boundary terms are taken at time level $n+1$.
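As a minimal sketch of how the integral term $\alpha \int _{a}^{x} e^{-\alpha (x - s)} u^{n}(s) \, ds$ can be evaluated in ${\mathcal {O}}(N)$ operations on a uniform mesh, the Python function below uses the exact recurrence $I(x_{i}) = e^{-\alpha \Delta x} I(x_{i-1}) + \alpha \int _{x_{i-1}}^{x_{i}} e^{-\alpha (x_{i} - s)} u^{n}(s) \, ds$. The local integral is approximated by exactly integrating a piecewise-linear interpolant of $u^{n}$. This is an illustrative assumption and not the WENO quadrature used in the paper, and the function name `convolve_left_to_right` is hypothetical.

```python
import math


def convolve_left_to_right(u, alpha, dx):
    """Approximate I[i] = alpha * int_a^{x_i} exp(-alpha (x_i - s)) u(s) ds.

    Assumptions (illustrative, not from the paper): the mesh is uniform with
    spacing dx, u holds nodal values, and u is replaced on each cell by its
    linear interpolant, which is integrated exactly against the kernel.
    """
    nu = alpha * dx
    decay = math.exp(-nu)
    # Exact weights of the linear interpolant on one cell [x_{i-1}, x_i].
    w_left = (1.0 - (1.0 + nu) * decay) / nu
    w_right = (1.0 - decay) - w_left

    integral = [0.0] * len(u)  # I(a) = 0 because the integration range is empty
    for i in range(1, len(u)):
        local = w_left * u[i - 1] + w_right * u[i]
        # Recurrence: decay the accumulated integral, then add the new cell.
        integral[i] = decay * integral[i - 1] + local
    return integral


if __name__ == "__main__":
    n = 101
    dx = 1.0 / (n - 1)
    values = convolve_left_to_right([1.0] * n, alpha=5.0, dx=dx)
    print(values[-1], 1.0 - math.exp(-5.0))  # the two numbers should agree closely
```

Each mesh point costs a fixed number of operations, so a sweep over $N$ points costs ${\mathcal {O}}(N)$ rather than the ${\mathcal {O}}(N^{2})$ of direct summation.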

Now that we have obtained the update equation, we need to apply the boundary conditions. Clearly, if the problem specifies a Dirichlet boundary condition at $x = a$, then $A^{n+1} = u^{n+1}(a)$ is given directly by the boundary data. A variety of other boundary conditions can be imposed using the update equation

$$\begin{aligned} u^{n+1}(x) = e^{-\alpha (x-a)} A^{n+1} + I[u^{n};\alpha ](x), \end{aligned}$$

where

$$\begin{aligned} I[u^{n};\alpha ](x) = \alpha \int _{a}^{x} e^{-\alpha (x - s)} u^{n}(s) \, ds. \end{aligned}$$

For example, with periodic boundary conditions, we would need to satisfy

$$\begin{aligned} u^{n+1}(a)&= u^{n+1}(b), \end{aligned}$$

(Appendix A.3)

$$\begin{aligned} \partial _{x}u^{n+1}(a)&= \partial _{x} u^{n+1}(b). \end{aligned}$$

(Appendix A.4)

Applying condition (Appendix A.3), we find that

$$\begin{aligned} A^{n+1} = e^{-\alpha (b-a)} A^{n+1} + \alpha \int _{a}^{b} e^{-\alpha (b - s)} u^{n}(s) \, ds. \end{aligned}$$

Solving this equation for $A^{n+1}$ shows that

$$\begin{aligned} A^{n+1} = \frac{I[u^{n};\alpha ](b)}{1 - \mu }, \end{aligned}$$

with $\mu = e^{-\alpha (b - a)}$. Alternatively, we could have started with (Appendix A.4), which would give an identical solution. While this particular procedure is only applicable to linear problems, this exercise motivates some of the choices made to define operators in the method.
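The periodic case can be sketched end to end as follows. The function below performs one backward Euler step for $c > 0$: it accumulates the integral term with a left-to-right sweep, forms $A^{n+1} = I[u^{n};\alpha ](b)/(1-\mu )$, and assembles $u^{n+1}$. The piecewise-linear local quadrature, the uniform mesh that includes both endpoints, and the name `advect_periodic_step` are assumptions made for illustration and are not taken from the paper.

```python
import math


def advect_periodic_step(u, c, dt, a, b):
    """One backward Euler step of u_t + c u_x = 0 (c > 0), periodic on [a, b].

    Assumptions (illustrative): u holds nodal values on a uniform mesh that
    includes both endpoints a and b, and the local integrals use an exactly
    integrated piecewise-linear interpolant instead of WENO quadrature.
    """
    n = len(u)
    dx = (b - a) / (n - 1)
    alpha = 1.0 / (c * dt)
    nu = alpha * dx
    decay = math.exp(-nu)
    w_left = (1.0 - (1.0 + nu) * decay) / nu
    w_right = (1.0 - decay) - w_left

    # Sweep from a to b, accumulating the integral term I[u^n; alpha](x_i).
    integral = [0.0] * n
    for i in range(1, n):
        integral[i] = decay * integral[i - 1] + w_left * u[i - 1] + w_right * u[i]

    # Periodicity u^{n+1}(a) = u^{n+1}(b) fixes the boundary coefficient.
    mu = math.exp(-alpha * (b - a))
    boundary = integral[-1] / (1.0 - mu)

    return [math.exp(-alpha * i * dx) * boundary + integral[i] for i in range(n)]


if __name__ == "__main__":
    n = 65
    xs = [i / (n - 1) for i in range(n)]
    u0 = [math.sin(2.0 * math.pi * x) for x in xs]
    # dt is several times dx / c here; the step is still well defined.
    u1 = advect_periodic_step(u0, c=1.0, dt=0.1, a=0.0, b=1.0)
    print(u1[0], u1[-1])  # the endpoint values should match
```

Note that the boundary coefficient depends only on data from time level $n$, so no linear system is solved even though the boundary term is taken at level $n+1$.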

Kokkos Kernels

This section provides listings that outline the general format of the Kokkos kernels used in this work. Specifically, we provide structures for the tiled/blocked algorithms (Listing 3) in addition to the kernel that executes the fast summation method along a line (Listing 3).

Sixth-Order WENO Quadrature

We provide the various expressions for the coefficients and smoothness indicators used in the reconstruction process for $J_{R}^{(r)}$. Defining $\nu \equiv \alpha \Delta x$, the coefficients for the fixed stencils are given in [2] as follows:

$$\begin{aligned} c_{-3}^{(0)}&= \frac{6 - 6\nu + 2\nu ^{2} - (6-\nu ^{2})e^{-\nu } }{6\nu ^{3}}, \\ c_{-2}^{(0)}&= -\frac{6 - 8\nu + 3\nu ^{2} - (6 - 2\nu -2\nu ^{2})e^{-\nu } }{2\nu ^{3}}, \\ c_{-1}^{(0)}&= \frac{6 - 10\nu + 6\nu ^{2} - (6 - 4\nu -\nu ^{2} + 2\nu ^{3})e^{-\nu } }{2\nu ^{3}}, \\ c_{0}^{(0)}&= -\frac{6 - 12\nu + 11\nu ^{2} - 6\nu ^{3} - (6 - 6\nu + 2\nu ^{2})e^{-\nu } }{6\nu ^{3}}, \\&~\\ c_{-2}^{(1)}&= \frac{6 - \nu ^{2} - (6 + 6\nu + 2\nu ^{2})e^{-\nu } }{6\nu ^{3}}, \\ c_{-1}^{(1)}&= -\frac{6 - 2\nu - 2\nu ^{2} - (6 + 4\nu - \nu ^{2} - 2\nu ^{3})e^{-\nu } }{2\nu ^{3}}, \\ c_{0}^{(1)}&= \frac{6 - 4\nu - \nu ^{2} + 2\nu ^{3} - (6 + 2\nu - 2\nu ^{2} )e^{-\nu } }{2\nu ^{3}}, \\ c_{1}^{(1)}&= -\frac{6 - 6\nu + 2\nu ^{2} - (6-\nu ^{2})e^{-\nu } }{6\nu ^{3}}, \\&~\\ c_{-1}^{(2)}&= \frac{6 + 6\nu + 2\nu ^{2} - (6 + 12\nu + 11\nu ^{2} + 6\nu ^{3} )e^{-\nu } }{6\nu ^{3}}, \\ c_{0}^{(2)}&= -\frac{6 + 4\nu - \nu ^{2} - 2\nu ^{3} - (6 + 10\nu + 6\nu ^{2})e^{-\nu } }{2\nu ^{3}}, \\ c_{1}^{(2)}&= \frac{6 + 2\nu - 2\nu ^{2} - (6 + 8\nu + 3\nu ^{2})e^{-\nu } }{2\nu ^{3}}, \\ c_{2}^{(2)}&= -\frac{6 - \nu ^{2} - (6 + 6\nu + 2\nu ^{2})e^{-\nu } }{6\nu ^{3}}. \end{aligned}$$

The corresponding linear weights are

$$\begin{aligned} d_{0}&= \frac{6 - \nu ^{2} - (6 + 6\nu + 2\nu ^{2})e^{-\nu } }{3\nu ( 2 - \nu - (2 + \nu )e^{-\nu } ) }, \\ d_{2}&= \frac{ 60 - 60\nu + 15\nu ^{2} + 5\nu ^{3} - 3\nu ^{4} - (60 - 15\nu ^{2} + 2\nu ^{4})e^{-\nu } }{ 10\nu ^{2} (6 - \nu ^{2} - (6 + 6\nu + 2\nu ^{2})e^{-\nu } ) }, \\ d_{1}&= 1 - d_{0} - d_{2}. \end{aligned}$$

To obtain the analogous expressions for $J_{L}^{(r)}$, we exploit the “mirror-symmetry” property of WENO reconstructions. That is, one keeps the left-hand side of each expression and reverses the order of the expressions on the right. The interested reader can find expressions for one particular smoothness indicator in [2].
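A quick consistency check on the fixed-stencil coefficients can be sketched in Python. It assumes that each $J_{R}^{(r)}$ approximates the local integral $\alpha \int _{x_{i-1}}^{x_{i}} e^{-\alpha (x_{i} - s)} v(s) \, ds$; this definition is not restated in this appendix, so it is an assumption here. For constant data $v \equiv 1$ that integral equals $1 - e^{-\nu }$, so the four coefficients of each stencil should sum to that value. The function name is hypothetical. Note the $2\nu ^{3}$ term inside the exponential factor of $c_{-1}^{(0)}$: with any other power, the first stencil fails this check.

```python
import math


def fixed_stencil_coefficients(nu):
    """Return the coefficient lists for stencils r = 0, 1, 2 (left to right)."""
    e = math.exp(-nu)
    n2, n3 = nu**2, nu**3
    stencil0 = [
        (6 - 6*nu + 2*n2 - (6 - n2) * e) / (6 * n3),
        -(6 - 8*nu + 3*n2 - (6 - 2*nu - 2*n2) * e) / (2 * n3),
        (6 - 10*nu + 6*n2 - (6 - 4*nu - n2 + 2*n3) * e) / (2 * n3),
        -(6 - 12*nu + 11*n2 - 6*n3 - (6 - 6*nu + 2*n2) * e) / (6 * n3),
    ]
    stencil1 = [
        (6 - n2 - (6 + 6*nu + 2*n2) * e) / (6 * n3),
        -(6 - 2*nu - 2*n2 - (6 + 4*nu - n2 - 2*n3) * e) / (2 * n3),
        (6 - 4*nu - n2 + 2*n3 - (6 + 2*nu - 2*n2) * e) / (2 * n3),
        -(6 - 6*nu + 2*n2 - (6 - n2) * e) / (6 * n3),
    ]
    stencil2 = [
        (6 + 6*nu + 2*n2 - (6 + 12*nu + 11*n2 + 6*n3) * e) / (6 * n3),
        -(6 + 4*nu - n2 - 2*n3 - (6 + 10*nu + 6*n2) * e) / (2 * n3),
        (6 + 2*nu - 2*n2 - (6 + 8*nu + 3*n2) * e) / (2 * n3),
        -(6 - n2 - (6 + 6*nu + 2*n2) * e) / (6 * n3),
    ]
    return stencil0, stencil1, stencil2


if __name__ == "__main__":
    nu = 1.0
    target = 1.0 - math.exp(-nu)
    for r, coeffs in enumerate(fixed_stencil_coefficients(nu)):
        # Each stencil's coefficients should sum to the exact constant-data value.
        print(r, sum(coeffs), target)
```

The closed forms suffer from cancellation as $\nu \rightarrow 0$, so a production code would likely need a series expansion for small $\nu $; that refinement is not shown here.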

About this article

Cite this article

Christlieb, A.J., Guthrey, P.T., Sands, W.A. et al. Parallel Algorithms for Successive Convolution. J Sci Comput 86, 1 (2021). https://doi.org/10.1007/s10915-020-01359-x

Received: 22 July 2020
Accepted: 01 November 2020
Published: 08 December 2020
DOI: https://doi.org/10.1007/s10915-020-01359-x
