
Performance Evaluation for a PETSc Parallel-in-Time Solver Based on the MGRIT Algorithm

  • Valeria Mele
  • Diego Romano
  • Emil M. Constantinescu
  • Luisa Carracciuolo
  • Luisa D’Amore
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11339)

Abstract

We describe the performance evaluation of a modular implementation of the MGRIT (MultiGrid Reduction In Time) algorithm within the PETSc (Portable, Extensible Toolkit for Scientific Computation) library. Our aim is to give PETSc users the opportunity to test the MGRIT parallel-in-time approach as an alternative to the Time Stepping (TS) integrator when solving problems that arise from the discretization of linear evolutionary models. To this end, we analyze the performance parameters of the algorithm to highlight the relationship between its configuration factors and the problem characteristics, intentionally setting aside accuracy issues and spatial parallelism.

Keywords

Parallelism-in-time, Performance evaluation, Multigrid reduction, MGRIT, Linear systems, PETSc

Notes

Acknowledgments

The research was carried out during a collaboration between the University of Naples Federico II (Naples, Italy) and the Argonne National Laboratory (Chicago, Illinois, USA).

It received funding from the European Commission under the H2020-MSCA-RISE NASDAC project (grant agreement no. 691184).

This work was also supported by GNCS INdAM.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Naples Federico II, Naples, Italy
  2. Italian National Research Council - CNR, Rome, Italy
  3. Mathematics and Computer Science Division, Argonne National Laboratory, Chicago, USA
