
Subjective versus objective: classifying analytical models for productive heterogeneous performance prediction

The Journal of Supercomputing

Abstract

Heterogeneous analytical models are valuable tools that facilitate optimal application tuning via runtime prediction; however, they require several man-hours of effort to understand and employ for meaningful performance prediction. Consequently, developers face the challenge of selecting performance models that best fit their design goals and level of system knowledge. In this research, we present a classification that enables users to select a set of easy-to-use and reliable analytical models for quality performance prediction. These models, which target general-purpose graphics processing unit (GPGPU)-based systems, are categorized into two primary analytical classes: subjective-analytical and objective-analytical. The subjective-analytical models predict the computation and communication components of an application by describing the system with a minimal set of qualitative relations among the system parameters, whereas the objective-analytical models predict these components by measuring pertinent hardware events using micro-benchmarks. We categorize, enhance, and characterize the existing analytical models for GPGPU computation, network-level communication, and interconnect communication to facilitate fast and reliable application performance prediction. We also explore a suitable combination of the aforementioned analytical classes, the hybrid approach, for high-quality performance prediction and report prediction accuracy of up to 95% for several tested GPGPU cluster configurations. The research ultimately aims to provide a collection of easy-to-select analytical models that promote straightforward and accurate performance prediction prior to large-scale implementation.
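To make the two classes concrete, the sketch below (our illustration, not the paper's actual model) combines a subjective-style compute estimate, built from a qualitative relation between workload and derated peak throughput, with an objective-style, LogGP-like communication estimate whose latency, overhead, and per-byte gap would be measured by micro-benchmarks. All function names and numeric values are illustrative assumptions.

    # Minimal sketch of a hybrid prediction for one iteration on a GPGPU cluster.
    # Not the paper's model; every name and number here is an assumption.

    def compute_time_subjective(work_items, flops_per_item, peak_flops, efficiency=0.6):
        """Subjective-analytical style: relate workload to a derated peak
        throughput without measuring hardware events."""
        return (work_items * flops_per_item) / (peak_flops * efficiency)

    def comm_time_objective(message_bytes, latency, overhead, gap_per_byte):
        """Objective-analytical style: LogGP-like cost whose parameters are
        obtained from micro-benchmarks."""
        return latency + 2 * overhead + (message_bytes - 1) * gap_per_byte

    def hybrid_iteration_time(work_items, flops_per_item, peak_flops,
                              message_bytes, latency, overhead, gap_per_byte):
        """Hybrid approach: sum of the compute and communication estimates."""
        return (compute_time_subjective(work_items, flops_per_item, peak_flops)
                + comm_time_objective(message_bytes, latency, overhead, gap_per_byte))

    if __name__ == "__main__":
        # Hypothetical case: 1M work items at 200 FLOPs each on a ~1 TFLOP/s
        # device, exchanging 4 MB per iteration over the network.
        t = hybrid_iteration_time(work_items=1_000_000, flops_per_item=200,
                                  peak_flops=1.0e12, message_bytes=4 * 1024**2,
                                  latency=1.5e-6, overhead=0.5e-6,
                                  gap_per_byte=2.5e-10)
        print(f"Predicted iteration time: {t * 1e3:.3f} ms")

In this toy setting the compute term (~0.33 ms) comes purely from assumed qualitative parameters, while the communication term (~1.05 ms) depends on measured network constants; swapping either estimator for a measured or modeled alternative is what distinguishes the subjective, objective, and hybrid approaches.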



Author information


Corresponding author

Correspondence to Vivek K. Pallipuram.

About this article

Cite this article

Pallipuram, V.K., Smith, M.C., Sarma, N. et al. Subjective versus objective: classifying analytical models for productive heterogeneous performance prediction. J Supercomput 71, 162–201 (2015). https://doi.org/10.1007/s11227-014-1292-9
