Case Study: DCT with Aurora

Lorenzon, Arthur Francisco; Schneider Beck Filho, Antonio Carlos

doi:10.1007/978-3-030-28719-1_5

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

This chapter presents Aurora as a case study in optimizing the execution of parallel applications. Aurora is an OpenMP framework that is completely transparent to both the designer and the end user. Without any code transformation or recompilation, it automatically finds, at runtime and with minimal overhead, the optimal number of threads for each parallel loop region, and it readapts when the behavior of a region changes during execution. Sect. 5.1 discusses the importance of an approach that is both transparent to the user and adaptable to the execution environment. Aurora is then presented in Sect. 5.2 and evaluated in Sect. 5.3 through an extensive set of comparisons with well-known state-of-the-art solutions.
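
To make the idea of per-region thread tuning concrete, the following is a minimal sketch, not Aurora's implementation. It assumes a simple incremental hill climb over thread counts and a synthetic cost model; the abstract does not state which search strategy Aurora uses. The names `synthetic_region_time`, `hill_climb`, and `RegionTuner`, and the 20% readaptation tolerance, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Tuple


def synthetic_region_time(threads: int, work: float = 100.0, overhead: float = 1.0) -> float:
    """Hypothetical cost model: compute time shrinks with more threads,
    while synchronization overhead grows with them."""
    return work / threads + overhead * threads


def hill_climb(measure: Callable[[int], float], max_threads: int) -> Tuple[int, float]:
    """Add one thread at a time and stop at the first count that does not improve the cost."""
    best_n, best_cost = 1, measure(1)
    for n in range(2, max_threads + 1):
        cost = measure(n)
        if cost >= best_cost:
            break
        best_n, best_cost = n, cost
    return best_n, best_cost


class RegionTuner:
    """Remembers the best thread count per parallel region and searches again
    when the observed cost drifts beyond a tolerance (assumed value: 20%)."""

    def __init__(self, max_threads: int, tolerance: float = 0.2) -> None:
        self.max_threads = max_threads
        self.tolerance = tolerance
        self.best: Dict[str, Tuple[int, float]] = {}  # region id -> (threads, cost)

    def threads_for(self, region: str, measure: Callable[[int], float]) -> int:
        if region in self.best:
            n, cost = self.best[region]
            # Reuse the stored choice while the region still behaves as it did before.
            if abs(measure(n) - cost) <= self.tolerance * cost:
                return n
        # First visit, or behavior changed: run the search again for this region.
        n, cost = hill_climb(measure, self.max_threads)
        self.best[region] = (n, cost)
        return n


tuner = RegionTuner(max_threads=16)
first = tuner.threads_for("loop_a", lambda n: synthetic_region_time(n, work=100.0))
# The region's workload shrinks, so the stored choice no longer fits and is re-searched.
second = tuner.threads_for("loop_a", lambda n: synthetic_region_time(n, work=16.0))
print(first, second)
```

In this sketch each region keeps its own result, so a memory-bound loop and a compute-bound loop in the same program can settle on different thread counts. A real runtime would measure actual execution time or energy rather than a formula.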

Notes

1. GOMP_parallel_start is also named GOMP_parallel.

Author information

Authors and Affiliations

Department of Computer Science, Federal University of Pampa (UNIPAMPA), Alegrete, Rio Grande do Sul, Brazil
Arthur Francisco Lorenzon
Institute of Informatics, Campus do Vale, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil
Antonio Carlos Schneider Beck Filho

About this chapter

Cite this chapter

Lorenzon, A.F., Schneider Beck Filho, A.C. (2019). Case Study: DCT with Aurora. In: Parallel Computing Hits the Power Wall. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-28719-1_5

DOI: https://doi.org/10.1007/978-3-030-28719-1_5
Published: 06 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28718-4
Online ISBN: 978-3-030-28719-1
eBook Packages: Computer Science, Computer Science (R0)
