The Journal of Supercomputing, Volume 71, Issue 1, pp 162–201

Subjective versus objective: classifying analytical models for productive heterogeneous performance prediction

  • Vivek K. Pallipuram
  • Melissa C. Smith
  • Nilim Sarma
  • Ranajeet Anand
  • Edwin Weill
  • Karan Sapra

Abstract

Heterogeneous analytical models are valuable tools that facilitate optimal application tuning via runtime prediction; however, they require many hours of effort to understand and employ for meaningful performance prediction. Consequently, developers face the challenge of selecting performance models that best fit their design goals and level of system knowledge. In this research, we present a classification that enables users to select a set of easy-to-use, reliable analytical models for quality performance prediction. These models, which target general-purpose graphics processing unit (GPGPU)-based systems, fall into two primary analytical classes: subjective-analytical and objective-analytical. The subjective-analytical models predict the computation and communication components of an application by describing the system with a minimal set of qualitative relations among the system parameters, whereas the objective-analytical models predict these components by measuring pertinent hardware events with micro-benchmarks. We categorize, enhance, and characterize existing analytical models for GPGPU computation, network-level communication, and interconnect communication to facilitate fast and reliable application performance prediction. We also explore a suitable combination of the two analytical classes, the hybrid approach, for high-quality performance prediction and report prediction accuracy of up to 95% for several tested GPGPU cluster configurations. The research ultimately aims to provide a collection of easy-to-select analytical models that promote straightforward and accurate performance prediction prior to large-scale implementation.
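The hybrid approach described above can be sketched in miniature: a subjective-analytical term estimates GPU compute time from qualitative relations among system parameters (thread count, resident threads per multiprocessor), while an objective-analytical term uses micro-benchmarked communication parameters in a LogGP-style cost. All function names, parameters, and numeric values below are illustrative assumptions, not the paper's actual models.

```python
def compute_time(cycles_per_thread, n_threads, threads_per_sm, n_sms, cycle_time):
    """Subjective-analytical sketch: estimate GPU compute time by counting
    the 'waves' of threads the device must execute."""
    concurrent = threads_per_sm * n_sms           # threads resident at once
    waves = -(-n_threads // concurrent)           # ceiling division
    return waves * cycles_per_thread * cycle_time # seconds

def comm_time(msg_bytes, latency, gap_per_byte):
    """Objective-analytical sketch: LogGP-style message cost with
    latency (L) and per-byte gap (G) measured by micro-benchmarks."""
    return latency + msg_bytes * gap_per_byte     # seconds

def predict_runtime(iterations, t_compute, t_comm):
    """Synchronous iterative algorithm: each iteration computes,
    then exchanges boundary data before the next step."""
    return iterations * (t_compute + t_comm)

# Illustrative (made-up) numbers: 1M threads, 2048 resident threads/SM,
# 13 SMs, 1.4 ns cycle, 4 MB exchange per iteration, 100 iterations.
tc = compute_time(cycles_per_thread=200, n_threads=1_000_000,
                  threads_per_sm=2048, n_sms=13, cycle_time=1.4e-9)
tm = comm_time(msg_bytes=4_000_000, latency=2e-6, gap_per_byte=2e-10)
print(predict_runtime(100, tc, tm))  # ≈ 0.081 s with these made-up inputs
```

The division of labor mirrors the classification: the compute term needs only qualitative knowledge of the architecture, while the communication term plugs in hardware parameters that must be measured.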

Keywords

Performance prediction · Analytical modeling · Qualitative analysis · Quantitative analysis · High-level abstraction · Synchronous iterative algorithms · GPGPU clusters · Kepler K20


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Vivek K. Pallipuram¹
  • Melissa C. Smith¹
  • Nilim Sarma¹
  • Ranajeet Anand¹
  • Edwin Weill¹
  • Karan Sapra¹

  1. Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, USA
