Tuning Parallel Applications

Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)


This chapter presents a comprehensive study of the techniques used to improve the performance, energy consumption, or energy-delay product (EDP) of parallel applications. The techniques are discussed along two dimensions:
  • Adaptability: when the number of threads and the processor operating frequency are adapted, and whether the adaptation is continuous.

  • Transparency: whether tuning the application requires special tools or compilers, programmer intervention, and/or changes to the source or binary code.

In Sect. 4.1, we first discuss the design space of how approaches that optimize parallel applications can achieve adaptability and transparency. In Sect. 4.2, we describe works that improve the execution of parallel applications by tuning the number of threads, a technique known as dynamic concurrency throttling (DCT). Section 4.3 then presents approaches that change the processor's voltage and frequency levels through dynamic voltage and frequency scaling (DVFS) to improve the behavior of parallel applications. Finally, Sect. 4.4 discusses approaches that combine DCT and DVFS to improve the execution of parallel applications.
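
As a minimal illustration of why the three objectives named above (performance, energy, and EDP) can call for different tuning decisions, the sketch below ranks a few configurations of thread count and operating frequency. EDP is taken here as energy multiplied by execution time. The `Sample` type, the helper names, and all measurements are hypothetical values made up for this example; they do not come from the chapter or from any of the surveyed works.

```python
from typing import Callable, NamedTuple, Sequence


class Sample(NamedTuple):
    """One hypothetical measurement of a parallel run."""
    threads: int       # number of threads used (the knob tuned by DCT)
    freq_ghz: float    # processor frequency (the knob tuned by DVFS)
    time_s: float      # execution time in seconds
    energy_j: float    # energy consumed in joules


def edp(sample: Sample) -> float:
    """Energy-delay product: energy multiplied by execution time."""
    return sample.energy_j * sample.time_s


def best_by(samples: Sequence[Sample], metric: Callable[[Sample], float]) -> Sample:
    """Return the sample that minimizes the given metric."""
    return min(samples, key=metric)


# Illustrative, invented numbers: not measurements from any real system.
SAMPLES = [
    Sample(threads=2, freq_ghz=2.0, time_s=10.0, energy_j=300.0),
    Sample(threads=4, freq_ghz=2.0, time_s=6.0, energy_j=330.0),
    Sample(threads=8, freq_ghz=2.0, time_s=5.0, energy_j=450.0),
    Sample(threads=4, freq_ghz=1.2, time_s=9.0, energy_j=252.0),
]

best_time = best_by(SAMPLES, lambda s: s.time_s)
best_energy = best_by(SAMPLES, lambda s: s.energy_j)
best_edp = best_by(SAMPLES, edp)

if __name__ == "__main__":
    print("fastest:       ", best_time)
    print("lowest energy: ", best_energy)
    print("lowest EDP:    ", best_edp)
```

With these invented numbers, each objective selects a different configuration, which is the kind of trade-off that approaches tuning the thread count and the frequency have to navigate.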



Copyright information

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, Federal University of Pampa (UNIPAMPA), Alegrete, Brazil
  2. Institute of Informatics, Campus do Vale, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
