Parallel Algorithms for Successive Convolution

Journal of Scientific Computing

Abstract

The development of modern computing architectures with ever-increasing amounts of parallelism has allowed for the solution of previously intractable problems across a variety of scientific disciplines. Despite these advances, multiscale computing problems continue to pose a significant challenge to modern architectures because they require resolving scales that often vary by orders of magnitude in both space and time. Such complications have led us to consider alternative discretizations for partial differential equations (PDEs) which use expansions involving integral operators to approximate spatial derivatives (Christlieb et al. in J Comput Phys 379:214–236, 2019; Christlieb et al. in J Sci Comput 82:52(3):1–29, 2020; Christlieb et al. in J Comput Phys 415:1–25, 2020). These constructions use explicit information within the integral terms, but treat boundary data implicitly, which contributes to the overall speed of the method. This approach is provably unconditionally stable for linear problems and stability has been demonstrated experimentally for nonlinear problems. Additionally, it is matrix-free in the sense that it is not necessary to invert linear systems and iteration is not required for nonlinear terms. Moreover, the scheme employs a fast summation algorithm that yields a method with a computational complexity of \({\mathcal {O}}(N)\), where N is the number of mesh points along a coordinate direction. While much work has been done to explore the theory behind these methods, their practicality in large scale computing environments is a largely unexplored topic. In this work, we explore the performance of these methods by developing a domain decomposition algorithm suitable for distributed memory systems along with shared memory algorithms. As a first pass, we derive an artificial Courant–Friedrichs–Lewy condition that enforces a nearest-neighbor (N-N) communication pattern and briefly discuss possible generalizations.
We also analyze several approaches for implementing the parallel algorithms by optimizing predominant loop structures and maximizing data reuse. Using a hybrid design that employs MPI and Kokkos (Edwards and Trott in J Parallel Distrib Comput 74:3202–3216, 2014) for the distributed and shared memory components of the algorithms, respectively, we show that our methods are efficient and can sustain an update rate \(> 1\times 10^8\) DOF/node/s. We provide results that demonstrate the scalability and versatility of our algorithms using several different PDE test problems, including a nonlinear example, which employs an adaptive time-stepping rule.

Notes

  1. Note that the update frequency does not account for error in the numerical solution. Certainly, in order to compare the efficiency of various methods, especially those that belong to different classes, one must take into account the quality of the solution. This would be reflected in, for example, an error versus time-to-solution plot.

  2. This is true when the memory space is that of the CPU (host memory). In device memory, these entries will be “coalesced”, which is the optimal layout for threading on GPUs. This mapping of indices between memory spaces is handled automatically by Kokkos.

  3. By a pencil we mean a long, thin rectangle (in 2-D) or rectangular prism (in 3-D). The use of pencils, as opposed to square blocks, would require additional precomputing efforts and, possibly, restrictions on the problem size.

References

  1. Christlieb, A., Guo, W., Jiang, Y.: A kernel-based high order “explicit” unconditionally stable scheme for time dependent Hamilton–Jacobi equations. J. Comput. Phys. 379, 214–236 (2019)

  2. Christlieb, A., Guo, W., Jiang, Y., Yang, H.: Kernel based high order “explicit” unconditionally-stable scheme for nonlinear degenerate advection-diffusion equations. J. Sci. Comput. 82:52(3), 1–29 (2020)

  3. Christlieb, A., Sands, W., Yang, H.: A kernel-based explicit unconditionally stable scheme for Hamilton–Jacobi equations on nonuniform meshes. J. Comput. Phys. 415, 1–25 (2020). Art. No. 109543

  4. Edwards, H.C., Trott, C.R., Sunderland, D.: Kokkos: enabling manycore performance portability through polymorphic memory access patterns. J. Parallel Distrib. Comput. 74, 3202–3216 (2014)

  5. Causley, M., Christlieb, A., Ong, B., Van Groningen, L.: Method of lines transpose: an implicit solution to the wave equation. Math. Comput. 83(290), 2763–2786 (2014)

  6. Causley, M.F., Cho, H., Christlieb, A.J., Seal, D.C.: Method of lines transpose: high order L-stable O(N) schemes for parabolic equations using successive convolution. SIAM J. Numer. Anal. 54(3), 1635–1652 (2016)

  7. Causley, M., Cho, H., Christlieb, A.: Method of lines transpose: energy gradient flows using direct operator inversion for phase-field models. SIAM J. Sci. Comput. 39(5), B968–B992 (2017)

  8. Cheng, Y., Christlieb, A.J., Guo, W., Ong, B.: An asymptotic preserving Maxwell solver resulting in the Darwin limit of electrodynamics. J. Sci. Comput. 71(3), 959–993 (2017)

  9. Christlieb, A., Guo, W., Jiang, Y.: A WENO-based method of lines transpose approach for Vlasov simulations. J. Comput. Phys. 327, 337–367 (2016)

  10. Gottlieb, S., Shu, C.-W., Tadmor, E.: Strong stability-preserving high-order time discretization methods. SIAM Rev. 43(1), 89–112 (2001)

  11. Salazar, A., Raydan, M., Campo, A.: Theoretical analysis of the exponential transversal method of lines for the diffusion equation. Numer. Methods Partial Differ. Equ. 16(1), 30–41 (2000)

  12. Schemann, M., Bornemann, F.A.: An adaptive Rothe method for the wave equation. Comput. Vis. Sci. 1(3), 137–144 (1998)

  13. Biros, G., Ying, L., Zorin, D.: An embedded boundary integral solver for the unsteady incompressible Navier–Stokes equations (preprint) (2002)

  14. Biros, G., Ying, L., Zorin, D.: A fast solver for the Stokes equations with distributed forces in complex geometries. J. Comput. Phys. 193, 317–348 (2004)

  15. Chiu, S.-H., Moore, M.N.J., Quaife, B.: Viscous transport in eroding porous media. J. Fluid Mech. 893, A3 (2020). https://doi.org/10.1017/jfm.2020.228

  16. Kropinski, M.C.A., Quaife, B.D.: Fast integral equation methods for Rothe’s method applied to the isotropic heat equation. Comput. Math. Appl. 61, 2436–2446 (2011)

  17. Quaife, B.D., Moore, M.N.J.: A boundary-integral framework to simulate viscous erosion of a porous medium. J. Comput. Phys. 375, 1–21 (2018)

  18. Wang, H., Lei, T., Li, J., Huang, J., Yao, Z.: A parallel fast multipole accelerated integral equation scheme for 3D Stokes equations. Int. J. Numer. Meth. Eng. 70, 812–839 (2007)

  19. Ying, L., Biros, G., Zorin, D.: A high-order 3D boundary integral equation solver for elliptic PDEs in smooth domains. J. Comput. Phys. 219, 247–275 (2006)

  20. Saad, Y., Schultz, M.H.: GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput. 7, 856–869 (1986)

  21. Bruno, O.P., Lyon, M.: High-order unconditionally stable FC-AD solvers for general smooth domains I. Basic elements. J. Comput. Phys. 229, 2009–2033 (2010)

  22. Bruno, O.P., Lyon, M.: High-order unconditionally stable FC-AD solvers for general smooth domains II. Elliptic, parabolic and hyperbolic PDEs. Theoretical considerations. J. Comput. Phys. 229, 3358–3381 (2010)

  23. Douglas Jr., J.: On the numerical integration of \(\partial _{xx}U + \partial _{yy}U = \partial _{t}U\) by implicit methods. J. Soc. Ind. Appl. Math. 3, 42–65 (1955)

  24. Douglas Jr., J.: Alternating direction methods for three space variables. Numer. Math. 3, 41–63 (1962)

  25. Peaceman, D.W., Rachford Jr., H.H.: The numerical solution of parabolic and elliptic differential equations. J. Soc. Ind. Appl. Math. 3, 28–41 (1955)

  26. Albin, N., Bruno, O.P.: A spectral FC solver for the compressible Navier–Stokes equations in general domains I: Explicit time-stepping. J. Comput. Phys. 230, 6248–6270 (2011)

  27. Anderson, T.G., Bruno, O.P., Lyon, M.: High-order, dispersionless “fast-hybrid” wave equation solver. Part I: \({\mathcal {O}}(1)\) sampling cost via incident-field windowing and recentering. SIAM J. Sci. Comput. 42, 1348–1379 (2020)

  28. Causley, M.F., Christlieb, A.J.: Higher order A-stable schemes for the wave equation using a successive convolution approach. SIAM J. Numer. Anal. 52(1), 220–235 (2014)

  29. Causley, M.F., Christlieb, A.J., Guclu, Y., Wolf, E.: Method of lines transpose: A fast implicit wave propagator. arXiv preprint arXiv:1306.6902 (2013)

  30. Shu, C.-W.: High order weighted essentially nonoscillatory schemes for convection dominated problems. SIAM Rev. 51(1), 82–126 (2009)

  31. Hornung, R.D., Keasler, J.A.: The RAJA portability layer: Overview and status. Lawrence Livermore National Laboratory (LLNL), Livermore. Tech. Rep. (2014). https://doi.org/10.2172/1169830

  32. Grete, P., Glines, F.W., O’Shea, B.W.: K-Athena: A performance portable structured grid finite volume magnetohydrodynamics code (2019). arXiv:1905.04341 [cs.DC]

  33. White, C.J., Stone, J.M., Gammie, C.F.: An extension of the Athena++ code framework for GRMHD based on advanced Riemann solvers and staggered-mesh constrained transport. Astrophys. J. Suppl. 225, 2 (2016)

Acknowledgements

The research of the authors was supported partly through computational resources made available by Michigan State University’s Institute for Cyber-Enabled Research. Funding for this work was provided in part by AFOSR grants FA9550-19-1-0281 and FA9550-17-1-0394 and NSF grant DMS 1912183. The authors would like to thank Brian O’Shea and Philipp Grete at Michigan State University for helpful discussions relating to Kokkos and some suggestions for fine-tuning of the parallel algorithms presented here. Additionally, W. Sands would like to thank the organizers of the 2019 Performance Portability Workshop with Kokkos, along with Christian Trott, David Hollman, and Damien Lebrun-Grandie for their assistance with our application. We also wish to express our gratitude to W. Nicolas G. Hitchon for taking the time to review our work and offer numerous suggestions and improvements regarding the presentation of the manuscript.

Author information

Corresponding author

Correspondence to William A. Sands.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Last updated on October 29, 2020. The research of the authors was supported in part by AFOSR Grants FA9550-19-1-0281 and FA9550-17-1-0394 and NSF Grant DMS 1912183.

Appendices

Example for Linear Advection

Suppose we wish to solve the 1D linear advection equation:

$$\begin{aligned} \partial _{t} u + c \partial _{x} u = 0, \quad (x,t) \in (a,b) \times {\mathbb {R}}^{+}, \end{aligned}$$
(Appendix A.1)

where \(c > 0\) is the wave speed, and we leave the boundary conditions unspecified for now. The procedure for \(c < 0\) is analogous. Discretizing (Appendix A.1) in time with backward Euler yields a semi-discrete equation of the form

$$\begin{aligned} \frac{u^{n+1}(x) - u^{n}(x)}{\Delta t} + c \partial _{x} u^{n+1}(x) = 0. \end{aligned}$$

If we rearrange this, we obtain a linear equation of the form

$$\begin{aligned} {\mathcal {L}}[u^{n+1}; \alpha ](x) = u^{n}(x), \end{aligned}$$
(Appendix A.2)

where we have used

$$\begin{aligned} \alpha := \frac{1}{c\Delta t},\quad {\mathcal {L}}:= {\mathcal {I}} + \frac{1}{\alpha } \partial _{x}. \end{aligned}$$

By reversing the order in which the discretization is performed, we have created a sequence of BVPs at discrete time levels. If we had discretized equation (Appendix A.1) using the MOL formalism, then \({\mathcal {L}}\) would be an algebraic operator. To solve Eq. (Appendix A.2) for \(u^{n+1}\), we analytically invert the operator \({\mathcal {L}}\). Notice that this equation is a linear ODE in \(x\), so it can be solved using standard ODE techniques. Applying the integrating factor method to the problem, we obtain

$$\begin{aligned} \partial _{x} \left[ e^{\alpha x} u^{n+1}(x) \right] = \alpha e^{\alpha x} u^{n}(x). \end{aligned}$$
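
For readers who want the intermediate step, the identity above can be checked by multiplying (Appendix A.2) through by \(\alpha e^{\alpha x}\) and recognizing a product rule. This is only a worked expansion of the definitions already given, not an additional result:

$$\begin{aligned} \alpha e^{\alpha x} {\mathcal {L}}[u^{n+1}; \alpha ](x) = \alpha e^{\alpha x} u^{n+1}(x) + e^{\alpha x} \partial _{x} u^{n+1}(x) = \partial _{x} \left[ e^{\alpha x} u^{n+1}(x) \right] . \end{aligned}$$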

To integrate this equation, we use the fact that characteristics move to the right, so integration is performed from a to x. After rearranging the result, we arrive at the update equation

$$\begin{aligned} u^{n+1}(x)&= e^{-\alpha (x-a)} u^{n+1}(a) + \alpha \int _{a}^{x} e^{-\alpha (x - s)} u^{n}(s) \, ds, \\&\equiv e^{-\alpha (x-a)} A^{n+1} + \alpha \int _{a}^{x} e^{-\alpha (x - s)} u^{n}(s) \, ds, \\&\equiv {\mathcal {L}}^{-1}[u^{n}; \alpha ](x). \end{aligned}$$

This update displays the origins of the implicit behavior of the method. While convolutions are performed on data from the previous time step, the boundary terms are taken at time level \(n+1\).
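
The structure of this update can be illustrated with a short Python sketch; it is not the authors' implementation. It evaluates the convolution integral on a uniform grid with the recursion \(I(x_{i}) = e^{-\alpha \Delta x} I(x_{i-1}) + J_{i}\), which follows from splitting the integral at \(x_{i-1}\) and gives an \({\mathcal {O}}(N)\) sweep. The local integrals \(J_{i}\) are computed here by integrating a piecewise-linear interpolant exactly, which is an assumption made for brevity in place of a high-order quadrature. The function names `convolve_right` and `advect_step_inflow` are hypothetical.

```python
import numpy as np

def convolve_right(v, alpha, dx):
    """Approximate alpha * int_a^{x_i} exp(-alpha (x_i - s)) v(s) ds
    on a uniform grid x_i = a + i*dx using a single left-to-right sweep."""
    nu = alpha * dx
    decay = np.exp(-nu)
    one_minus_decay = -np.expm1(-nu)        # 1 - exp(-nu), accurate for small nu
    # Weights from exact integration of the linear interpolant on [x_{i-1}, x_i]
    # (assumed low-order stand-in for a high-order quadrature).
    w_right = 1.0 - one_minus_decay / nu    # multiplies v[i]
    w_left = one_minus_decay / nu - decay   # multiplies v[i-1]
    I = np.zeros(len(v))
    for i in range(1, len(v)):
        local = w_left * v[i - 1] + w_right * v[i]
        I[i] = decay * I[i - 1] + local     # I_i = e^{-nu} I_{i-1} + J_i
    return I

def advect_step_inflow(u, alpha, dx, A_next):
    """One backward Euler step: boundary value A_next is taken at level n+1,
    while the convolution uses data u from level n."""
    x_rel = dx * np.arange(len(u))          # x - a
    return A_next * np.exp(-alpha * x_rel) + convolve_right(u, alpha, dx)
```

Because the interpolant is linear, this sketch reproduces the integral exactly for constant and linear data; for general data its accuracy is limited to second order in \(\Delta x\).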

Now that we have obtained the update equation, we need to apply the boundary conditions. Clearly, if the problem specifies a Dirichlet boundary condition at \(x = a\), then \(A^{n+1} = u^{n+1}(a)\). A variety of other boundary conditions can be imposed using the update equation

$$\begin{aligned} u^{n+1}(x)&= e^{-\alpha (x-a)} A^{n+1} + \alpha \int _{a}^{x} e^{-\alpha (x - s)} u^{n}(s) \, ds \\&\equiv e^{-\alpha (x-a)} A^{n+1} + I[u^{n};\alpha ](x), \end{aligned}$$

where

$$\begin{aligned} I[u^{n};\alpha ](x) = \alpha \int _{a}^{x} e^{-\alpha (x - s)} u^{n}(s) \, ds. \end{aligned}$$

For example, with periodic boundary conditions, we would need to satisfy

$$\begin{aligned} u^{n+1}(a)&= u^{n+1}(b), \end{aligned}$$
(Appendix A.3)
$$\begin{aligned} \partial _{x}u^{n+1}(a)&= \partial _{x} u^{n+1}(b). \end{aligned}$$
(Appendix A.4)

Applying condition (Appendix A.3), we find that

$$\begin{aligned} A^{n+1} = e^{-\alpha (b-a)} A^{n+1} + \alpha \int _{a}^{b} e^{-\alpha (b - s)} u^{n}(s) \, ds. \end{aligned}$$

Solving this equation for \(A^{n+1}\) shows that

$$\begin{aligned} A^{n+1} = \frac{I[u^{n};\alpha ](b)}{1 - \mu }, \end{aligned}$$

with \(\mu = e^{-\alpha (b - a)}\). Alternatively, we could have started with (Appendix A.4), which would give an identical solution. While this particular procedure is only applicable to linear problems, this exercise motivates some of the choices made to define operators in the method.
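
The periodic case can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the method used in the paper: the grid is uniform and includes both endpoints, the convolution is evaluated with a piecewise-linear quadrature (an assumed low-order substitute for a high-order rule), and the function name `periodic_backward_euler_step` is hypothetical.

```python
import numpy as np

def periodic_backward_euler_step(u, c, dt, dx):
    """One step of u_t + c u_x = 0 (c > 0) with periodic boundary conditions.
    `u` holds N+1 values on [a, b], endpoints included, so u[0] == u[-1]."""
    alpha = 1.0 / (c * dt)
    nu = alpha * dx
    decay = np.exp(-nu)
    one_minus_decay = -np.expm1(-nu)        # 1 - exp(-nu)
    # Exact integration of a linear interpolant on each cell (assumption).
    w_right = 1.0 - one_minus_decay / nu
    w_left = one_minus_decay / nu - decay
    I = np.zeros(len(u))
    for i in range(1, len(u)):
        I[i] = decay * I[i - 1] + w_left * u[i - 1] + w_right * u[i]
    mu = np.exp(-alpha * dx * (len(u) - 1)) # mu = exp(-alpha (b - a))
    A = I[-1] / (1.0 - mu)                  # A^{n+1} = I[u^n](b) / (1 - mu)
    return A * np.exp(-alpha * dx * np.arange(len(u))) + I
```

For a single Fourier mode \(u^{n} = \sin (kx)\), the exact backward Euler update is \((\sin (kx) - (k/\alpha )\cos (kx))/(1 + k^{2}/\alpha ^{2})\), which gives a convenient reference for checking such a sketch.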

Kokkos Kernels

This appendix provides listings that outline the general format of the Kokkos kernels used in this work. Specifically, we provide structures for the tiled/blocked algorithms (Listing 3) in addition to the kernel that executes the fast summation method along a line (Listing 3).

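[Listing: Kokkos kernel structure for the tiled/blocked algorithms; not reproduced here]
[Listing: Kokkos kernel for the fast summation method along a line; not reproduced here]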

Sixth-Order WENO Quadrature

We provide the various expressions for the coefficients and smoothness indicators used in the reconstruction process for \(J_{R}^{(r)}\). Defining \(\nu \equiv \alpha \Delta x\), the coefficients for the fixed stencils are given in [2] as follows:

$$\begin{aligned} c_{-3}^{(0)}&= \frac{6 - 6\nu + 2\nu ^{2} - (6-\nu ^{2})e^{-\nu } }{6\nu ^{3}}, \\ c_{-2}^{(0)}&= -\frac{6 - 8\nu + 3\nu ^{2} - (6 - 2\nu -2\nu ^{2})e^{-\nu } }{2\nu ^{3}}, \\ c_{-1}^{(0)}&= \frac{6 - 10\nu + 6\nu ^{2} - (6 - 4\nu -\nu ^{2} + 2\nu ^{3})e^{-\nu } }{2\nu ^{3}}, \\ c_{0}^{(0)}&= -\frac{6 - 12\nu + 11\nu ^{2} - 6\nu ^{3} - (6 - 6\nu + 2\nu ^{2})e^{-\nu } }{6\nu ^{3}}, \\&~\\ c_{-2}^{(1)}&= \frac{6 - \nu ^{2} - (6 + 6\nu + 2\nu ^{2})e^{-\nu } }{6\nu ^{3}}, \\ c_{-1}^{(1)}&= -\frac{6 - 2\nu - 2\nu ^{2} - (6 + 4\nu - \nu ^{2} - 2\nu ^{3})e^{-\nu } }{2\nu ^{3}}, \\ c_{0}^{(1)}&= \frac{6 - 4\nu - \nu ^{2} + 2\nu ^{3} - (6 + 2\nu - 2\nu ^{2} )e^{-\nu } }{2\nu ^{3}}, \\ c_{1}^{(1)}&= -\frac{6 - 6\nu + 2\nu ^{2} - (6-\nu ^{2})e^{-\nu } }{6\nu ^{3}}, \\&~\\ c_{-1}^{(2)}&= \frac{6 + 6\nu + 2\nu ^{2} - (6 + 12\nu + 11\nu ^{2} + 6\nu ^{3} )e^{-\nu } }{6\nu ^{3}}, \\ c_{0}^{(2)}&= -\frac{6 + 4\nu - \nu ^{2} - 2\nu ^{3} - (6 + 10\nu + 6\nu ^{2})e^{-\nu } }{2\nu ^{3}}, \\ c_{1}^{(2)}&= \frac{6 + 2\nu - 2\nu ^{2} - (6 + 8\nu + 3\nu ^{2})e^{-\nu } }{2\nu ^{3}}, \\ c_{2}^{(2)}&= -\frac{6 - \nu ^{2} - (6 + 6\nu + 2\nu ^{2})e^{-\nu } }{6\nu ^{3}}. \end{aligned}$$
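
A quick numerical sanity check of these coefficients can be sketched in Python. It assumes (this interpretation is ours) that each stencil approximates the local integral \(\alpha \int _{x_{i-1}}^{x_{i}} e^{-\alpha (x_{i} - s)} v(s) \, ds\) by \(\sum _{j} c_{j}^{(r)} v_{i+j}\), so that constant data requires the weights on each stencil to sum to \(1 - e^{-\nu }\), and linear data fixes their first moment. The exponential term of \(c_{-1}^{(0)}\) is taken as \(6 - 4\nu - \nu ^{2} + 2\nu ^{3}\), the form for which both conditions hold. The function name `fixed_stencil_coefficients` is hypothetical.

```python
import numpy as np

def fixed_stencil_coefficients(nu):
    """Return the three sets of fixed-stencil coefficients for J_R.
    Offsets j: stencil 0 -> (-3..0), stencil 1 -> (-2..1), stencil 2 -> (-1..2)."""
    e = np.exp(-nu)
    n2, n3 = nu**2, nu**3
    s0 = np.array([
        (6 - 6*nu + 2*n2 - (6 - n2)*e) / (6*n3),
        -(6 - 8*nu + 3*n2 - (6 - 2*nu - 2*n2)*e) / (2*n3),
        (6 - 10*nu + 6*n2 - (6 - 4*nu - n2 + 2*n3)*e) / (2*n3),
        -(6 - 12*nu + 11*n2 - 6*n3 - (6 - 6*nu + 2*n2)*e) / (6*n3),
    ])
    s1 = np.array([
        (6 - n2 - (6 + 6*nu + 2*n2)*e) / (6*n3),
        -(6 - 2*nu - 2*n2 - (6 + 4*nu - n2 - 2*n3)*e) / (2*n3),
        (6 - 4*nu - n2 + 2*n3 - (6 + 2*nu - 2*n2)*e) / (2*n3),
        -(6 - 6*nu + 2*n2 - (6 - n2)*e) / (6*n3),
    ])
    s2 = np.array([
        (6 + 6*nu + 2*n2 - (6 + 12*nu + 11*n2 + 6*n3)*e) / (6*n3),
        -(6 + 4*nu - n2 - 2*n3 - (6 + 10*nu + 6*n2)*e) / (2*n3),
        (6 + 2*nu - 2*n2 - (6 + 8*nu + 3*n2)*e) / (2*n3),
        -(6 - n2 - (6 + 6*nu + 2*n2)*e) / (6*n3),
    ])
    return s0, s1, s2

# Consistency checks for constant and linear data (offsets measured from x_i).
nu = 0.5
offsets = [np.arange(-3, 1), np.arange(-2, 2), np.arange(-1, 3)]
for coeffs, j in zip(fixed_stencil_coefficients(nu), offsets):
    assert abs(coeffs.sum() - (1 - np.exp(-nu))) < 1e-10
    assert abs((coeffs * j).sum() - (np.exp(-nu) - (1 - np.exp(-nu)) / nu)) < 1e-10
```

For very small \(\nu \) the closed forms suffer from cancellation in floating point, so a production code would likely switch to a series expansion there; that refinement is outside the scope of this sketch.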

The corresponding linear weights are

$$\begin{aligned} d_{0}&= \frac{6 - \nu ^{2} - (6 + 6\nu + 2\nu ^{2})e^{-\nu } }{3\nu ( 2 - \nu - (2 + \nu )e^{-\nu } ) }, \\ d_{2}&= \frac{ 60 - 60\nu + 15\nu ^{2} + 5\nu ^{3} - 3\nu ^{4} - (60 - 15\nu ^{2} + 2\nu ^{4})e^{-\nu } }{ 10\nu ^{2} (6 - \nu ^{2} - (6 + 6\nu + 2\nu ^{2})e^{-\nu } ) }, \\ d_{1}&= 1 - d_{0} - d_{2}. \end{aligned}$$

To obtain the analogous expressions for \(J_{L}^{(r)}\), we exploit the “mirror-symmetry” property of WENO reconstructions. That is, one can keep the left side of each of the expressions, then reverse the order of the expressions on the right. Expressions for one particular choice of smoothness indicator can be found in [2].

Cite this article

Christlieb, A.J., Guthrey, P.T., Sands, W.A. et al. Parallel Algorithms for Successive Convolution. J Sci Comput 86, 1 (2021). https://doi.org/10.1007/s10915-020-01359-x

  • DOI: https://doi.org/10.1007/s10915-020-01359-x
