Abstract
Even though parallel programs written in high-level languages are portable across different architectures, their parallelism does not necessarily scale after migration. Predicting a multicore application’s performance on the target platform in an early development phase can keep developers from pursuing unpromising optimizations and thus significantly reduce development time. However, the vast diversity and heterogeneity of system-design decisions across processor types, from HPC systems and desktop PCs to embedded MPSoCs, complicate the modeling because capabilities vary. Concurrency effects (caching, locks, or bandwidth bottlenecks) influence parallel runtime behavior as well. Complex performance prediction approaches have emerged, which can be grouped into virtual prototyping, analytical models, and statistical methods. In this work, we predict the performance of two algorithms from the field of advanced driver-assistance systems in a case study. We provide a comparative overview of state-of-the-art predictions using three methods: GEM5 (virtual prototype), IBM Exabounds (analytical model), and a statistical method developed in-house. We first describe the theoretical background, then present the experimental and model setup, and finally give a detailed evaluation of the predictions. In addition, we discuss the applicability of all three methods for predicting the performance of parallel and heterogeneous systems.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Lüders, M., Arndt, O.J., Blume, H. (2020). Multicore Performance Prediction – Comparing Three Recent Approaches in a Case Study. In: Schwardmann, U., et al. Euro-Par 2019: Parallel Processing Workshops. Euro-Par 2019. Lecture Notes in Computer Science, vol 11997. Springer, Cham. https://doi.org/10.1007/978-3-030-48340-1_22
DOI: https://doi.org/10.1007/978-3-030-48340-1_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-48339-5
Online ISBN: 978-3-030-48340-1
eBook Packages: Computer Science, Computer Science (R0)