Abstract
Heterogeneous analytical models are valuable tools that facilitate optimal application tuning via runtime prediction; however, they require several man-hours of effort to understand and employ for meaningful performance prediction. Consequently, developers face the challenge of selecting adequate performance models that best fit their design goals and level of system knowledge. In this research, we present a classification that enables users to select a set of easy-to-use and reliable analytical models for quality performance prediction. These models, which target general-purpose graphics processing unit (GPGPU)-based systems, are categorized into two primary analytical classes: subjective-analytical and objective-analytical. The subjective-analytical models predict the computation and communication components of an application by describing the system using minimum qualitative relations among the system parameters, whereas the objective-analytical models predict these components by measuring pertinent hardware events using micro-benchmarks. We categorize, enhance, and characterize the existing analytical models for GPGPU computations and for network-level and interconnect communications to facilitate fast and reliable application performance prediction. We also explore a suitable combination of the two analytical classes, the hybrid approach, for high-quality performance prediction and report prediction accuracy of up to 95% for several tested GPGPU cluster configurations. The research ultimately aims to provide a collection of easy-to-select analytical models that promote straightforward and accurate performance prediction prior to large-scale implementation.
Cite this article
Pallipuram, V.K., Smith, M.C., Sarma, N. et al. Subjective versus objective: classifying analytical models for productive heterogeneous performance prediction. J Supercomput 71, 162–201 (2015). https://doi.org/10.1007/s11227-014-1292-9
DOI: https://doi.org/10.1007/s11227-014-1292-9