Abstract
This chapter presents a comprehensive study of the techniques used to improve the performance, energy, or EDP of parallel applications. They are discussed considering the following:
-
Adaptability: when the adaptation of the number of threads and processor operating frequency happens and whether it is continuous or not.
-
Transparency: when the application tuning involves the need for special tools or compilers, programmer influence, and/or changes in the source or binary codes.
Therefore, in Sect. 4.1, we first discuss the design space exploration related to the way how the approaches that optimize parallel applications can achieve adaptability and transparency. In Sect. 4.2, we describe the works that aim to improve the execution of parallel applications by tuning the number of threads. Then, Sect. 4.3 presents the approaches that change the levels of voltage and frequency of the processor in order to deliver a better behavior of parallel applications. Finally, Sect. 4.4 discusses the approaches that exploit both DCT and DVFS for improving parallel applications execution.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Adya, A., Howell, J., Theimer, M., Bolosky, W.J., Douceur, J.R.: Cooperative task management without manual stack management. In: Annual Conference on USENIX, pp. 289–302. USENIX Association, Berkeley (2002)
Akram, S., Sartor, J.B., Eeckhout, L.: DVFS performance prediction for managed multithreaded applications. In: ISPASS, pp. 12–23. IEEE, Piscataway (2016). https://doi.org/10.1109/ISPASS.2016.7482070
Alessi, F., Thoman, P., Georgakoudis, G., Fahringer, T., Nikolopoulos, D.S.: Application-level energy awareness for openmp. In: International Workshop on OpenMP, pp. 219–232. Springer, Berlin (2015)
Barnes, B.J., Rountree, B., Lowenthal, D.K., Reeves, J., de Supinski, B., Schulz, M.: A regression-based approach to scalability prediction. In: Proceedings of the 22Nd Annual International Conference on Supercomputing, ICS ’08, pp. 368–377. ACM, New York (2008). https://doi.org/10.1145/1375527.1375580
Basmadjian, R., de Meer, H.: Evaluating and modeling power consumption of multi-core processors. In: 2012 Third International Conference on Future Systems: Where Energy, Computing and Communication Meet (e-Energy), pp. 1–10. IEEE, Piscataway (2012). https://doi.org/10.1145/2208828.2208840
Benedict, S., Rejitha, R.S., Gschwandtner, P., Prodan, R., Fahringer, T.: Energy prediction of openmp applications using random forest modeling approach. In: 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, pp. 1251–1260. IEEE, Piscataway (2015). https://doi.org/10.1109/IPDPSW.2015.12
Bhattacharjee, A., Martonosi, M.: Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors. SIGARCH Comput. Archit. News 37(3), 290–301 (2009). https://doi.org/10.1145/1555815.1555792
Cabrera, A., Almeida, F., Blanco, V., Giménez, D.: Analytical modeling of the energy consumption for the high performance linpack. In: 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp. 343–350. IEEE, Piscataway (2013). https://doi.org/10.1109/PDP.2013.56
Chadha, G., Mahlke, S., Narayanasamy, S.: When less is more (limo): controlled parallelism forimproved efficiency. In: Proceedings of the 2012 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, pp. 141–150. ACM, New York (2012)
Chen, Y.L., Chang, M.F., Liang, W.Y., Lee, C.H.: Performance and energy efficient dynamic voltage and frequency scaling scheme for multicore embedded system. In: IEEE ICCE, pp. 58–59. IEEE, Piscataway (2016). https://doi.org/10.1109/ICCE.2016.7430521
Chou, C.Y., Chang, H.Y., Wang, S.T., Huang, K.C., Shen, C.Y.: An improved model for predicting hpl performance. In: Cérin, C., Li, K.C. (eds.) Advances in Grid and Pervasive Computing, pp. 158–168. Springer, Berlin (2007)
Cochran, R., Hankendi, C., Coskun, A.K., Reda, S.: Pack & cap: adaptive DVFS and thread packing under power caps. In: IEEE/ACM MICRO, pp. 175–185 (2011). https://doi.org/10.1145/2155620.2155641
Curtis-Maury, M., Dzierwa, J., Antonopoulos, C.D., Nikolopoulos, D.S.: Online power-performance adaptation of multithreaded programs using hardware event-based prediction. In: Proceedings of the 20th Annual International Conference on Supercomputing, pp. 157–166. ACM, New York (2006)
Curtis-Maury, M., Shah, A., Blagojevic, F., Nikolopoulos, D.S., De Supinski, B.R., Schulz, M.: Prediction models for multi-dimensional power-performance optimization on many cores. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pp. 250–259. ACM, New York (2008)
Dimakopoulos, V.V., Leontiadis, E., Tzoumas, G.: A portable c compiler for openmp v. 2.0. In: Proceedings of the of the 5th European Workshop on OpenMP (EWOMP03) (2003)
Ding, Y., Kandemir, M., Raghavan, P., Irwin, M.J.: A helper thread based edp reduction scheme for adapting application execution in CMPS. In: 2008 IEEE International Symposium on Parallel and Distributed Processing, pp. 1–14. IEEE, Piscataway (2008). https://doi.org/10.1109/IPDPS.2008.4536297
dos Santos Marques, W., de Souza, P.S.S., Lorenzon, A.F., Beck, A.C.S., Beck Rutzig, M., Diniz Rossi, F.: Improving EDP in multi-core embedded systems through multidimensional frequency scaling. In: 2017 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–4. IEEE, Piscataway (2017). https://doi.org/10.1109/ISCAS.2017.8050515
Ge, R., Feng, X., Feng, W., Cameron, K.W.: CPU MISER: a performance-directed, run-time system for power-aware clusters. In: ICPP, pp. 18–18 (2007). https://doi.org/10.1109/ICPP.2007.29
Hankendi, C., Coskun, A.K.: Adaptive power and resource management techniques for multi-threaded workloads. In: 2013 IEEE International Symposium on Parallel Distributed Processing, Workshops and Phd Forum, pp. 2302–2305. IEEE, Picataway (2013). https://doi.org/10.1109/IPDPSW.2013.258
Hotta, Y., Sato, M., Kimura, H., Matsuoka, S., Boku, T., Takahashi, D.: Profile-based optimization of power performance by using dynamic voltage scaling on a pc cluster. In: IEEE IPDPS (2006). https://doi.org/10.1109/IPDPS.2006.1639597
Hsu, C.H., Feng, W.C.: A power-aware run-time system for high-performance computing. In: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, SC ’05, pp. 1–1 (2005). https://doi.org/10.1109/SC.2005.3
Hwang, Y., Chung, K.: Dynamic power management technique for multicore based embedded mobile devices. IEEE Trans. Ind. Inf. 9(3), 1601–1612 (2013). https://doi.org/10.1109/TII.2012.2232299
Ipek, E., de Supinski, B.R., Schulz, M., McKee, S.A.: An approach to performance prediction for parallel applications. In: Proceedings of the 11th International Euro-Par Conference on Parallel Processing, Euro-Par’05, pp. 196–205. Springer, Berlin (2005)
Jayakumar, A., Murali, P., Vadhiyar, S.: Matching application signatures for performance predictions using a single execution. In: 2015 IEEE International Parallel and Distributed Processing Symposium, pp. 1161–1170. IEEE, Picataway (2015). https://doi.org/10.1109/IPDPS.2015.20
Jordan, H., Thoman, P., Durillo, J.J., Pellegrini, S., Gschwandtner, P., Fahringer, T., Moritsch, H.: A multi-objective auto-tuning framework for parallel codes. In: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12. IEEE, Picataway (2012)
Ju, T., Wu, W., Chen, H., Zhu, Z., Dong, X.: Thread count prediction model: Dynamically adjusting threads for heterogeneous many-core systems. In: 2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS), pp. 456–464. IEEE, Picataway (2015). https://doi.org/10.1109/ICPADS.2015.64
Jung, C., Lim, D., Lee, J., Han, S.: Adaptive execution techniques for SMT multiprocessor architectures. In: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 236–246. ACM, New York (2005)
Lee, J., Wu, H., Ravichandran, M., Clark, N.: Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications. ACM SIGARCH Comput. Archit. News 38(3), 270–279 (2010)
Li, D., de Supinski, B.R., Schulz, M., Cameron, K., Nikolopoulos, D.S.: Hybrid MPI/openMP power-aware computing. In: IEEE IPDPS, pp. 1–12 (2010). https://doi.org/10.1109/IPDPS.2010.5470463
Li, D., de Supinski, B.R., Schulz, M., Nikolopoulos, D.S., Cameron, K.W.: Strategies for energy-efficient resource management of hybrid programming models. IEEE Trans. Parallel Distrib. Syst. 24(1), 44–157 (2013). https://doi.org/10.1109/TPDS.2012.95
Li, J., Martinez, J.F.: Dynamic power-performance adaptation of parallel computation on chip multiprocessors. In: The Twelfth International Symposium on High-Performance Computer Architecture, 2006, pp. 77–87 (2006). https://doi.org/10.1109/HPCA.2006.1598114
Lorenzon, A.F., Souza, J.D., Beck, A.C.S.: Laant: A library to automatically optimize edp for openMP applications. In: DATE, pp. 1229–1232 (2017). https://doi.org/10.23919/DATE.2017.7927176
Marathe, A., Bailey, P.E., Lowenthal, D.K., Rountree, B., Schulz, M., de Supinski, B.R.: A run-time system for power-constrained hpc applications. In: Kunkel, J.M., Ludwig, T. (eds.) High Performance Computing, pp. 394–408. Springer, Cham (2015)
Miftakhutdinov, R., Ebrahimi, E., Patt, Y.N.: Predicting performance impact of dvfs for realistic memory systems. In: 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 155–165 (2012). https://doi.org/10.1109/MICRO.2012.23
Miftakhutdinov, R.R.: Performance prediction for dynamic voltage and frequency scaling. Ph.D. thesis, The University of Texas (2014)
Palermo, G., Silvano, C., Zaccaria, V.: An efficient design space exploration methodology for on-chip multiprocessors subject to application-specific constraints. In: 2008 Symposium on Application Specific Processors, pp. 75–82 (2008). https://doi.org/10.1109/SASP.2008.4570789
Porterfield, A., Fowler, R., Neyer, M.: Maestro: Dynamic runtime power and concurrency adaptation. In: Proceedings Workshop Managed Many-Core System, pp. 1–8
Porterfield, A.K., Olivier, S.L., Bhalachandra, S., Prins, J.F.: Power measurement and concurrency throttling for energy reduction in openMP programs. In: Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International, pp. 884–891. IEEE, Piscataway (2013)
Pusukuri, K.K., Gupta, R., Bhuyan, L.N.: Thread reinforcer: Dynamically determining number of threads via os level monitoring. In: 2011 IEEE International Symposium on Workload Characterization (IISWC), pp. 116–125. IEEE, Piscataway (2011)
Quinlan, D., Liao, C.: The rose source-to-source compiler infrastructure. In: Cetus Users and Compiler Infrastructure Workshop, in conjunction with PACT 2011 (2011)
Raman, A., Zaks, A., Lee, J.W., August, D.I.: Parcae: A system for flexible parallel execution. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’12, pp. 133–144. ACM, New York (2012)
Rizvandi, N.B., Taheri, J., Zomaya, A.Y., Lee, Y.C.: Linear combinations of DVFS-enabled processor frequencies to modify the energy-aware scheduling algorithms. In: CCGRID, pp. 388–397 (2010). https://doi.org/10.1109/CCGRID.2010.38
Rossi, F.D., Storch, M., de Oliveira, I., Rose, C.A.F.D.: Modeling power consumption for dvfs policies. In: 2015 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1879–1882. IEEE, Piscataway (2015). https://doi.org/10.1109/ISCAS.2015.7169024
Rountree, B., Lowenthal, D.K., Schulz, M., de Supinski, B.R.: Practical performance prediction under dynamic voltage frequency scaling. In: 2011 International Green Computing Conference and Workshops, pp. 1–8 (2011). https://doi.org/10.1109/IGCC.2011.6008553
Sensi, D.D.: Predicting performance and power consumption of parallel applications. In: 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), pp. 200–207 (2016). https://doi.org/10.1109/PDP.2016.41
Sensi, D.D., Torquati, M., Danelutto, M.: A reconfiguration algorithm for power-aware parallel applications. TACO 13(4), 43:1–43:25 (2016). https://doi.org/10.1145/3004054
Shafik, R.A., Das, A., Yang, S., Merrett, G., Al-Hashimi, B.M.: Adaptive energy minimization of openMP parallel applications on many-core systems. In: Proceedings of the 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures, pp. 19–24. ACM, New York (2015)
Shafik, R.A., Das, A.K., Yang, S., Merrett, G.V., Al-Hashimi, B.: Thermal-aware adaptive energy minimization of open MP parallel applications (2015)
Sharkawi, S., DeSota, D., Panda, R., Indukuru, R., Stevens, S., Taylor, V., Wu, X.: Performance projection of HPC applications using spec cfp2006 benchmarks. In: 2009 IEEE International Symposium on Parallel Distributed Processing, pp. 1–12. IEEE, Piscataway (2009). https://doi.org/10.1109/IPDPS.2009.5161057
Singh, K., İpek, E., McKee, S.A., de Supinski, B.R., Schulz, M., Caruana, R.: Predicting parallel application performance via machine learning approaches: Research articles. Concurr. Comput. Pract. Exper. 19(17), 2219–2235 (2007). https://doi.org/10.1002/cpe.v19:17
Snowdon, D.C., Petters, S.M., Heiser, G.: Accurate on-line prediction of processor and memoryenergy usage under voltage scaling. In: Proceedings of the 7th ACM &Amp; IEEE International Conference on Embedded Software, EMSOFT ’07, pp. 84–93. ACM, New York (2007). https://doi.org/10.1145/1289927.1289945
Snowdon, D.C., Van Der Linden, G., Petters, S.M., Heiser, G.: Accurate run-time prediction of performance degradation under frequency scaling. In: Workshop on Operating Systems Platforms for Embedded Real-Time applications, p. 58 (2007)
Sodhi, S., Subhlok, J., Xu, Q.: Performance prediction with skeletons. Clust. Comput. 11(2), 151–165 (2008). https://doi.org/10.1007/s10586-007-0039-2
Song, S.L., Barker, K., Kerbyson, D.: Unified performance and power modeling of scientific workloads. In: Proceedings of the 1st International Workshop on Energy Efficient Supercomputing, E2SC ’13, pp. 4:1–4:8. ACM, New York (2013). https://doi.org/10.1145/2536430.2536435
Sridharan, S., Gupta, G., Sohi, G.S.: Holistic run-time parallelism management for time and energy efficiency. In: Proceedings of the 27th international ACM conference on International conference on supercomputing, pp. 337–348. ACM, New York (2013)
Sridharan, S., Gupta, G., Sohi, G.S.: Adaptive, efficient, parallel execution of parallel programs. ACM SIGPLAN Notices 49(6), 169–180 (2014)
Suleman, M.A., Qureshi, M.K., Patt, Y.N.: Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPS. SIGARCH Comput. Archit. News 36(1), 277–286 (2008). https://doi.org/10.1145/1353534.1346317
Taylor, V., Xu, X., Geisler, J., Li, X., Lan, Z., Hereld, M., Judson, I.R., Stevens, R.: Prophesy: automating the modeling process. In: Proceedings Third Annual International Workshop on Active Middleware Services, pp. 3–11 (2001). https://doi.org/10.1109/AMS.2001.993715
Taylor, V., Wu, X., Geisler, J., Stevens, R.: Using kernel couplings to predict parallel application performance. In: Proceedings 11th IEEE International Symposium on High Performance Distributed Computing, pp. 125–134 (2002). https://doi.org/10.1109/HPDC.2002.1029910
Tiwari, A., Laurenzano, M.A., Carrington, L., Snavely, A.: Modeling power and energy usage of hpc kernels. In: 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops PhD Forum, pp. 990–998 (2012). https://doi.org/10.1109/IPDPSW.2012.121
Wheeler, K.B., Murphy, R.C., Thain, D.: Qthreads: An api for programming with millions of lightweight threads. In: IEEE International Symposium on Parallel and Distributed Processing, pp. 1–8 (2008). https://doi.org/10.1109/IPDPS.2008.4536359
Witkowski, M., Oleksiak, A., Piontek, T., Weglarz, J.: Practical power consumption estimation for real life HPC applications. Futur. Gener. Comput. Syst. 29(1), 208–217 (2013). https://doi.org/10.1016/j.future.2012.06.003
Wu, Q., Martonosi, M., Clark, D.W., Reddi, V.J., Connors, D., Wu, Y., Lee, J., Brooks, D.: Dynamic-compiler-driven control for microprocessor energy and performance. IEEE Micro 26(1), 119–129 (2006). https://doi.org/10.1109/MM.2006.9
Yang, L.T., Ma, X., Mueller, F.: Cross-platform performance prediction of parallel applications using partial execution. In: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, SC ’05, p. 40. IEEE Computer Society, Washington (2005). https://doi.org/10.1109/SC.2005.20
Zhang, W., Cheng, A.M.K., Subhlok, J.: Dwarfcode: A performance prediction tool for parallel applications. IEEE Trans. Comput. 65(2), 495–507 (2016). https://doi.org/10.1109/TC.2015.2417526
Author information
Authors and Affiliations
Rights and permissions
Copyright information
© 2019 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Lorenzon, A.F., Schneider Beck Filho, A.C. (2019). Tuning Parallel Applications. In: Parallel Computing Hits the Power Wall. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-28719-1_4
Download citation
DOI: https://doi.org/10.1007/978-3-030-28719-1_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28718-4
Online ISBN: 978-3-030-28719-1
eBook Packages: Computer ScienceComputer Science (R0)