A framework for efficient performance prediction of distributed applications in heterogeneous systems


Abstract

Predicting the performance of distributed applications is a long-standing challenge for researchers, and heterogeneous systems make it harder still. Existing approaches are restricted by application type, programming language, or target system; their models grow overly complex and their prediction cost rises significantly. We propose dPerf, a new performance prediction tool. dPerf extends existing methods from the Rose and SimGrid frameworks, and adds newly proposed methods so that it performs (i) static code analysis and (ii) trace-based simulation. Based on these two phases, dPerf predicts the performance of C, C++, and Fortran applications communicating over MPI or P2PSAP. Since neither underlying framework was developed explicitly for performance prediction, dPerf is a novel tool. Its accuracy is validated on a sequential Laplace code and a parallel NAS benchmark; dPerf yields accurate results at a low prediction cost and a high gain.
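To make the two-phase idea concrete, here is a minimal illustrative sketch, not dPerf's actual code: assume phase 1 (static analysis plus block benchmarking) yields a trace of compute blocks with operation counts and messages with byte sizes, and phase 2 replays that trace against a simple machine model in the spirit of SimGrid's latency-plus-bandwidth communication model. The `Machine` class, the trace shape, and all numbers are hypothetical.

```python
# Hypothetical sketch of trace-based performance prediction (illustrative
# only; the trace format and Machine parameters are assumptions, not dPerf's).

from dataclasses import dataclass

@dataclass
class Machine:
    flops: float      # floating-point operations per second
    latency: float    # per-message network latency, in seconds
    bandwidth: float  # network bandwidth, in bytes per second

# Assumed phase-1 output: each traced event is either a compute block
# carrying an operation count, or a message carrying a byte size.
trace = [
    ("compute", 2.0e9),     # 2 GFLOP of work in this block
    ("send", 8 * 1024**2),  # an 8 MiB MPI-style message
    ("compute", 5.0e8),
]

def predict(trace, m: Machine) -> float:
    """Replay the trace on the machine model; return predicted seconds."""
    t = 0.0
    for kind, amount in trace:
        if kind == "compute":
            t += amount / m.flops               # time = ops / speed
        else:
            t += m.latency + amount / m.bandwidth  # affine comm. model
    return t

target = Machine(flops=1e10, latency=1e-5, bandwidth=1e9)
print(f"predicted runtime: {predict(trace, target):.4f} s")
```

Changing only the `Machine` parameters re-predicts the same trace on a different target, which is the appeal of separating application characterization from machine modeling.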



References

  1. Adve VS, Bagrodia R, Browne JC, Deelman E, Dube A, Houstis EN, Rice JR, Sakellariou R, Sundaram-Stukel DJ, Teller PJ, Vernon MK (2000) POEMS: end-to-end performance design of large parallel adaptive computational systems. IEEE Trans Softw Eng 26:1027–1048
  2. ANR CIP project web page. http://spiderman-2.laas.fr/CIS-CIP
  3. Badia RM, Escalé F, Gabriel E, Gimenez J, Keller R, Labarta J, Müller MS (2004) Performance prediction in a grid environment. In: Grid computing. Lecture notes in computer science, vol 2970. Springer, Berlin/Heidelberg, pp 257–264
  4. Bailey DH, Barszcz E, Barton JT, Browning DS, Carter RL, Dagum L, Fatoohi RA, Frederickson PO, Lasinski TA, Schreiber RS, Simon HD, Venkatakrishnan V, Weeratunga SK (1991) The NAS parallel benchmarks—summary and preliminary results. In: SC’91: proceedings of the 1991 ACM/IEEE conference on supercomputing. ACM Press, New York, pp 158–165
  5. Bourgeois J, Spies F (2000) Performance prediction of an NAS benchmark program with ChronosMix environment. In: Euro-Par’00: the 6th international Euro-Par conference on parallel processing. Springer, Berlin, pp 208–216
  6. Casanova H, Legrand A, Quinson M (2008) SimGrid: a generic framework for large-scale distributed experiments. In: UKSIM’08: proceedings of the 10th international conference on computer modeling and simulation. IEEE Computer Society, Los Alamitos, pp 126–131
  7. Cornea BF, Bourgeois J (2011) Performance prediction of distributed applications using block benchmarking methods. In: PDP’11: 19th international Euromicro conference on parallel, distributed and network-based processing. IEEE Computer Society, Los Alamitos
  8. Cornea B, Bourgeois J (2012) http://lifc.univ-fcomte.fr/page_personnelle/recherche/136
  9. Culler D, Karp R, Patterson D, Sahay A, Schauser KE, Santos E, Subramonian R, von Eicken T (1993) LogP: towards a realistic model of parallel computation. ACM Press, New York, pp 1–12
  10. El Baz D, Nguyen TT (2010) A self-adaptive communication protocol with application to high performance peer-to-peer distributed computing. In: PDP’10: proceedings of the 18th Euromicro conference on parallel, distributed and network-based processing. IEEE Computer Society, Los Alamitos, pp 327–333
  11. Ernst-Desmulier JB, Bourgeois J, Spies F, Verbeke J (2005) Adding new features in a peer-to-peer distributed computing framework. In: PDP’05: proceedings of the 13th Euromicro conference on parallel, distributed and network-based processing. IEEE Computer Society, Los Alamitos, pp 34–41
  12. Ernst-Desmulier JB, Bourgeois J, Spies F (2008) P2PPerf: a framework for simulating and optimizing peer-to-peer distributed computing applications. Concurr Comput 20(6):693–712
  13. Fahringer T (1996) On estimating the useful work distribution of parallel programs under the P3T: a static performance estimator. Concurr Pract Exp 8:28–32
  14. Fahringer T, Zima HP (1993) A static parameter based performance prediction tool for parallel programs. In: ICS’93: proceedings of the 7th international conference on supercomputing. ACM Press, New York, pp 207–219
  15. Finney SA (2001) Real-time data collection in Linux: a case study. Behav Res Methods Instrum Comput 33:167–173
  16. Laplace transform instrumented with dPerf; simple block benchmarking method. http://bogdan.cornea.perso.neuf.fr/files/journal_files/laplace_dperf.c
  17. Laplace transform. http://www.physics.ohio-state.edu/~ntg/780/c_progs/laplace.c
  18. Li J, Shi F, Deng N, Zuo Q (2009) Performance prediction based on hierarchy parallel features captured in multi-processing system. In: HPDC’09: proceedings of the 18th ACM international symposium on high performance distributed computing. ACM Press, New York, pp 63–64
  19. Livadas PE, Croll S (1994) System dependence graphs based on parse trees and their use in software maintenance. Inf Sci 76(3–4):197–232
  20. Marin G (2007) Application insight through performance modeling. In: IPCCC’07: proceedings of the performance, computing, and communications conference. IEEE Computer Society, Los Alamitos
  21. Marin G, Mellor-Crummey J (2004) Cross-architecture performance predictions for scientific applications using parameterized models. In: SIGMETRICS’04/Performance’04: proceedings of the joint international conference on measurement and modeling of computer systems. ACM Press, New York, pp 2–13
  22. NAS parallel benchmarks. http://www.nas.nasa.gov/Resources/Software/npb.html
  23. Nguyen TT, El Baz D, Spiteri P, Jourjon G, Chau M (2010) High performance peer-to-peer distributed computing with application to obstacle problem. In: IPDPSW’10: IEEE international symposium on parallel distributed processing, workshops and PhD forum, pp 1–8
  24. Noeth M, Marathe J, Mueller F, Schulz M, de Supinski B (2006) Scalable compression and replay of communication traces in massively parallel environments. In: SC’06: proceedings of the 2006 ACM/IEEE conference on supercomputing. ACM Press, New York, p 144
  25. PAPI project website. http://icl.cs.utk.edu/papi/
  26. PAPI SC2008 handout. http://icl.cs.utk.edu/graphics/posters/files/
  27. Perfmon project webpage. http://perfmon2.sourceforge.net/
  28. Pettersson M (2012) Perfctr project webpage. http://user.it.uu.se/~mikpe/linux/perfctr/
  29. Prakash S, Bagrodia RL (1998) MPI-SIM: using parallel simulation to evaluate MPI programs. In: WSC’98: proceedings of the 30th conference on winter simulation. IEEE Computer Society Press, Los Alamitos, pp 467–474
  30. Rose LD, Poxon H (2009) A paradigm change: from performance monitoring to performance analysis. In: SBAC-PAD, pp 119–126
  31. Saavedra RH, Smith AJ (1996) Analysis of benchmark characteristics and benchmark performance prediction. ACM Trans Comput Syst 14(4):344–384
  32. Schordan M, Quinlan D (2003) A source-to-source architecture for user-defined optimizations. In: Modular programming languages. Lecture notes in computer science, vol 2789. Springer, Berlin/Heidelberg, pp 214–223
  33. Skinner D, Kramer W (2005) Understanding the causes of performance variability in HPC workloads. In: IEEE workload characterization symposium, pp 137–149
  34. Snavely A, Wolter N, Carrington L (2001) Modeling application performance by convolving machine signatures with application profiles. In: WWC’01: IEEE international workshop on workload characterization. IEEE Computer Society, Los Alamitos, pp 149–156
  35. Sundaram-Stukel D, Vernon MK (1999) Predictive analysis of a wavefront application using LogGP. In: 7th ACM SIGPLAN symposium on principles and practice of parallel programming, vol 34(8). ACM Press, New York, pp 141–150
  36. The message passing interface standard. http://www-unix.mcs.anl.gov/mpi
  37. van Gemund AJC (2003) Symbolic performance modeling of parallel systems. IEEE Trans Parallel Distrib Syst 14(2):154–165
  38. Zaparanuks D, Jovic M, Hauswirth M (2009) Accuracy of performance counter measurements. In: ISPASS’09: IEEE international symposium on performance analysis of systems and software, pp 23–32
  39. Zhai J, Chen W, Zheng W (2010) Phantom: predicting performance of parallel applications on large-scale parallel machines using a single node. In: PPoPP’10: proceedings of the 15th ACM SIGPLAN symposium on principles and practice of parallel programming. ACM Press, New York, pp 305–314


Acknowledgements

This work was funded by the French National Agency for Research under contract ANR-07-CIS7-011-01 [2].

Author information


Corresponding author

Correspondence to Bogdan Florin Cornea.


Cite this article

Cornea, B.F., Bourgeois, J. A framework for efficient performance prediction of distributed applications in heterogeneous systems. J Supercomput 62, 1609–1634 (2012). https://doi.org/10.1007/s11227-012-0823-5


Keywords

  • Performance prediction
  • Distributed applications
  • Automatic static analysis
  • Block benchmarking
  • Trace-based simulation
  • dPerf