The Journal of Supercomputing, Volume 75, Issue 3, pp 1654–1669

Analytical Communication Performance Models as a metric in the partitioning of data-parallel kernels on heterogeneous platforms

  • Juan A. Rico-Gallego
  • Juan C. Díaz-Martín
  • Carmen Calvo-Jurado
  • Sergio Moreno-Álvarez
  • Juan L. García-Zapata

Abstract

Data partitioning on heterogeneous HPC platforms is formulated as an optimization problem. The algorithm starts from the communication performance models of the processes representing their speeds and outputs a data tiling that minimizes the communication cost. Traditionally, communication volume is the metric used to guide the partitioning, but such a metric cannot capture the complexities introduced by uneven communication channels and the variety of patterns in kernel communications. We discuss Analytical Communication Performance Models as a new metric in partitioning algorithms. They have not been considered in the past for two reasons: prediction inaccuracy and the lack of tools to automatically build and solve formal expressions of kernel communications. We show how communication performance models fit the specific kernel and platform, and we present results that match or even improve on those of previous volume-based strategies.
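The contrast between the two metrics can be illustrated with a small sketch. It is not the model or the algorithm used in the article: it assumes a simple linear cost per message (start-up latency plus bytes divided by bandwidth), assumes transfers do not overlap, and uses hypothetical channel parameters and hypothetical candidate partitions chosen only to show that the two metrics can rank the same candidates differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    latency_s: float       # start-up cost per message, in seconds (assumed value)
    bandwidth_bps: float   # sustained bandwidth, in bytes per second (assumed value)


# Hypothetical uneven channels: shared memory inside a node, network between nodes.
CHANNELS = {
    "shm": Channel(latency_s=1e-6, bandwidth_bps=10e9),
    "net": Channel(latency_s=50e-6, bandwidth_bps=1e9),
}


def volume(transfers):
    """Volume metric: total bytes moved, regardless of the channel used."""
    return sum(nbytes for nbytes, _ in transfers)


def modeled_time(transfers):
    """Toy analytical metric: sum of latency + bytes / bandwidth per transfer.

    Assumes transfers are serialized; a real model would also account for
    concurrency and contention.
    """
    total = 0.0
    for nbytes, channel_name in transfers:
        ch = CHANNELS[channel_name]
        total += ch.latency_s + nbytes / ch.bandwidth_bps
    return total


# Hypothetical candidate partitions, each described by its (bytes, channel) transfers.
candidates = {
    "A": [(4_000_000, "net")],   # less data, but all of it crosses the network
    "B": [(6_000_000, "shm")],   # more data, but all of it stays in shared memory
}

best_by_volume = min(candidates, key=lambda name: volume(candidates[name]))
best_by_model = min(candidates, key=lambda name: modeled_time(candidates[name]))

print("best by volume:", best_by_volume)
print("best by modeled time:", best_by_model)
```

With these assumed parameters the volume metric prefers the partition that moves fewer bytes, while the time-based metric prefers the one that keeps its traffic on the faster channel.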

Keywords

  • Partitioning algorithms
  • Communication performance models
  • Communication optimization
  • Hybrid data-parallel kernels

Notes

Acknowledgements

This work was supported by the European Regional Development Fund ‘A way to achieve Europe’ (ERDF) and the Extremadura Local Government (Ref. IB16118). It was also partially supported by the computing facilities of Extremadura Research Center for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF).

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Juan A. Rico-Gallego (1), email author
  • Juan C. Díaz-Martín (2)
  • Carmen Calvo-Jurado (3)
  • Sergio Moreno-Álvarez (1)
  • Juan L. García-Zapata (3)

  1. Department of Computer Systems Engineering and Telematics, University of Extremadura, Cáceres, Spain
  2. Department of Computer Technology and Communications, University of Extremadura, Cáceres, Spain
  3. Department of Mathematics, University of Extremadura, Badajoz, Spain
