Abstract
This chapter presents Aurora as a case study in optimizing the execution of parallel applications. Aurora is an OpenMP framework that is completely transparent to both the designer and the end user. Without any code transformation or recompilation, it automatically finds, at runtime and with minimal overhead, the optimal number of threads for each parallel loop region, and it readapts when the behavior of a region changes during execution. Section 5.1 discusses the importance of an approach that is both transparent to the user and adaptable to the execution environment. Aurora is then presented in Sect. 5.2 and evaluated in Sect. 5.3 through an extensive set of comparisons with well-known state-of-the-art solutions.
Notes
1. GOMP_parallel_start is also named GOMP_parallel.
Copyright information
© 2019 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Lorenzon, A.F., Schneider Beck Filho, A.C. (2019). Case Study: DCT with Aurora. In: Parallel Computing Hits the Power Wall. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-28719-1_5
DOI: https://doi.org/10.1007/978-3-030-28719-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28718-4
Online ISBN: 978-3-030-28719-1
eBook Packages: Computer Science (R0)