Tuning Parallel Applications

Parallel Computing Hits the Power Wall

Part of the book series: SpringerBriefs in Computer Science ((BRIEFSCOMPUTER))

Abstract

This chapter presents a comprehensive study of the techniques used to improve the performance, energy consumption, or energy-delay product (EDP) of parallel applications. The techniques are discussed with respect to two properties:

  • Adaptability: when the number of threads and the processor operating frequency are adapted, and whether the adaptation is continuous.

  • Transparency: whether tuning the application requires special tools or compilers, programmer intervention, and/or changes to the source or binary code.

In Sect. 4.1, we first discuss the design space that approaches for optimizing parallel applications explore in order to achieve adaptability and transparency. In Sect. 4.2, we describe the works that improve the execution of parallel applications by tuning the number of threads, a technique known as dynamic concurrency throttling (DCT). Then, Sect. 4.3 presents the approaches that change the voltage and frequency levels of the processor, known as dynamic voltage and frequency scaling (DVFS), to improve the behavior of parallel applications. Finally, Sect. 4.4 discusses the approaches that exploit both DCT and DVFS to improve the execution of parallel applications.
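As a rough illustration of why the three objectives named in the abstract can lead to different choices, the sketch below computes the energy-delay product (energy multiplied by execution time) for a few configurations of thread count and frequency and picks the best one under each metric. It is not taken from the chapter: the `Sample` type, the helper names, and all measurement values are hypothetical and chosen only to make the trade-off visible.

```python
from typing import NamedTuple, Sequence


class Sample(NamedTuple):
    """One hypothetical measurement of an application run."""
    threads: int      # number of threads used
    freq_ghz: float   # processor operating frequency
    time_s: float     # execution time in seconds
    energy_j: float   # energy consumed in joules


def edp(sample: Sample) -> float:
    """Energy-delay product: energy multiplied by execution time."""
    return sample.energy_j * sample.time_s


def best_by_time(samples: Sequence[Sample]) -> Sample:
    return min(samples, key=lambda s: s.time_s)


def best_by_energy(samples: Sequence[Sample]) -> Sample:
    return min(samples, key=lambda s: s.energy_j)


def best_by_edp(samples: Sequence[Sample]) -> Sample:
    return min(samples, key=edp)


# Made-up values for illustration only; real numbers depend on the
# application, the processor, and the measurement setup.
SAMPLES = [
    Sample(threads=2, freq_ghz=2.0, time_s=10.0, energy_j=300.0),
    Sample(threads=4, freq_ghz=2.0, time_s=6.0, energy_j=330.0),
    Sample(threads=8, freq_ghz=2.0, time_s=5.0, energy_j=450.0),
    Sample(threads=8, freq_ghz=1.5, time_s=6.0, energy_j=360.0),
]

if __name__ == "__main__":
    print("fastest:      ", best_by_time(SAMPLES))
    print("lowest energy:", best_by_energy(SAMPLES))
    print("lowest EDP:   ", best_by_edp(SAMPLES))
```

With these made-up values, each metric selects a different configuration, which is the kind of trade-off that thread-count and frequency tuning has to navigate.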

Copyright information

© 2019 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Lorenzon, A.F., Schneider Beck Filho, A.C. (2019). Tuning Parallel Applications. In: Parallel Computing Hits the Power Wall. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-28719-1_4

  • DOI: https://doi.org/10.1007/978-3-030-28719-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28718-4

  • Online ISBN: 978-3-030-28719-1

  • eBook Packages: Computer Science (R0)
