Advertisement

Case Study: DCT with Aurora

  • Arthur Francisco Lorenzon
  • Antonio Carlos Schneider Beck Filho
Chapter
Part of the SpringerBriefs in Computer Science book series (BRIEFSCOMPUTER)

Abstract

This chapter presents Aurora as a case study to optimize the execution of parallel applications. Aurora is an OpenMP framework that is completely transparent to both the designer and end user. Without any code transformation or recompilation, it is capable of automatically finding, at runtime and with minimum overhead, the optimal number of threads for each parallel loop region and readapts in cases the behavior of a region changes during execution. Therefore, Sect. 5.1 discusses the importance of providing an approach that, at the same time, is transparent to the user and provides adaptability regarding the execution environment. Then, Aurora is presented in Sect. 5.2 and evaluated through an extensive set of comparisons with some well-known state-of-the-art solutions in Sect. 5.3

References

  1. 4.
    Bailey, D.H., Barszcz, E., Barton, J.T., Browning, D.S., Carter, R.L., Dagum, L., Fatoohi, R.A., Frederickson, P.O., Lasinski, T.A., Schreiber, R.S., Simon, H.D., Venkatakrishnan, V., Weeratunga, S.K.: The NAS parallel benchmarks—summary and preliminary results. In: ACM/IEEE Conference on Supercomputing, pp. 158–165. ACM, New York (1991).  https://doi.org/10.1145/125826.125925
  2. 10.
    Bhatt, S., Chen, M., Lin, C.Y., Liu, P.: Abstractions for parallel n-body simulations. In: Scalable High Performance Computing Conference, pp. 38–45. IEEE, Piscataway (1992).  https://doi.org/10.1109/SHPCC.1992.232690
  3. 12.
    Blake, G., Dreslinski, R.G., Mudge, T., Flautner, K.: Evolution of thread-level parallelism in desktop applications. SIGARCH Comput. Archit. News 38(3), 302–313 (2010)CrossRefGoogle Scholar
  4. 15.
    Browne, S., Dongarra, J., Garner, N., Ho, G., Mucci, P.: A portable programming interface for performance evaluation on modern processors. Int. J. High Perform. Comput. Appl. 14(3), 189–204 (2000).  https://doi.org/10.1177/109434200001400303 CrossRefGoogle Scholar
  5. 22.
    Chapman, B., Jost, G., Pas, R.v.d.: Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation). MIT Press, Cambridge, MA (2007)Google Scholar
  6. 23.
    Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J.W., Lee, S.H., Skadron, K.: Rodinia: a benchmark suite for heterogeneous computing. In: IEEE International Symposium on Workload Characterization, pp. 44–54. IEEE Computer Society, Washington (2009).  https://doi.org/10.1109/IISWC.2009.5306797
  7. 26.
    Christmann, C., Hebisch, E., Weisbecker, A.: Oversubscription of computational resources on multicore desktop systems. In: International Conference on Multicore Software Engineering, Performance, and Tools, MSEPT’12, pp. 18–29. Springer, Berlin (2012)CrossRefGoogle Scholar
  8. 32.
    Dongarra, J., Heroux, M.A., Luszczek, P.: HPCG benchmark: a new metric for ranking high performance computing systems. Knoxville, Tennessee (2015)Google Scholar
  9. 39.
    Hackenberg, D., Ilsche, T., Schone, R., Molka, D., Schmidt, M., Nagel, W.E.: Power measurement techniques on standard compute nodes: a quantitative comparison. In: IEEE International Symposium on Performance Analysis of Systems and Software, pp. 194–204. IEEE, Picataway (2013).  https://doi.org/10.1109/ISPASS.2013.6557170
  10. 40.
    Hähnel, M., Döbel, B., Völp, M., Härtig, H.: Measuring energy consumption for short code paths using RAPL. SIGMETRICS Perform. Eval. Rev. 40(3), 13–17 (2012).  https://doi.org/10.1145/2425248.2425252 CrossRefGoogle Scholar
  11. 52.
    Johnson, A., Jacobson, S.: On the convergence of generalized hill climbing algorithms. Discret. Appl. Math. 119(1), 37–57 (2002). Special Issue devoted to Foundation of Heuristics in Combinatorial OptimizationMathSciNetCrossRefGoogle Scholar
  12. 68.
    Lorenzon, A.F., Cera, M.C., Beck, A.C.S.: On the influence of static power consumption in multicore embedded systems. In: 2015 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1374–1377. IEEE, Piscataway (2015)Google Scholar
  13. 70.
    Lorenzon, A.F., Sartor, A.L., Cera, M.C., Beck, A.C.S.: Optimized use of parallel programming interfaces in multithreaded embedded architectures. In: 2015 IEEE Computer Society Annual Symposium on VLSI, pp. 410–415. IEEE, Piscataway (2015)Google Scholar
  14. 71.
    Lorenzon, A.F., Cera, M.C., Beck, A.C.S.: Investigating different general-purpose and embedded multicores to achieve optimal trade-offs between performance and energy. J. Parallel Distrib. Comput. 95(C), 107–123 (2016).  https://doi.org/10.1016/j.jpdc.2016.04.003 CrossRefGoogle Scholar
  15. 73.
    Lorenzon, A.F., de Oliveira, C.C., Souza, J.D., Beck, A.C.S.: Aurora: seamless optimization of openMP applications. IEEE Trans. Parallel Distrib. Syst. 30(5), 1007–1021 (2019).  https://doi.org/10.1109/TPDS.2018.2872992 CrossRefGoogle Scholar
  16. 78.
    McCalpin, J.D.: Memory bandwidth and machine balance in current high performance computers. In: IEEE Computer Society Technical Committee on Computer Architecture Newsletter, pp. 19–25 (1995)Google Scholar
  17. 89.
    Petersen, W., Arbenz, P.: Introduction to parallel computing: a practical guide with examples in C. Oxford Texts in Applied and Engineering Mathematics. Oxford University Press, Oxford (2004)zbMATHGoogle Scholar
  18. 94.
    Quinn, M.: Parallel Programming in C with MPI and OpenMP. McGraw-Hill Higher Education (2004)Google Scholar
  19. 103.
    Seo, S., Jo, G., Lee, J.: Performance characterization of the nas parallel benchmarks in opencl. In: IEEE International Symposium on Workload Characterization, pp. 137–148 (2011).  https://doi.org/10.1109/IISWC.2011.6114174
  20. 113.
    Sridharan, S., Gupta, G., Sohi, G.S.: Adaptive, efficient, parallel execution of parallel programs. ACM SIGPLAN Notices 49(6), 169–180 (2014)CrossRefGoogle Scholar
  21. 115.
    Suleman, M.A., Qureshi, M.K., Patt, Y.N.: Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPS. SIGARCH Comput. Archit. News 36(1), 277–286 (2008).  https://doi.org/10.1145/1353534.1346317 CrossRefGoogle Scholar
  22. 116.
    Taborda, D., Zdravkovic, L.: Application of a hill-climbing technique to the formulation of a new cyclic nonlinear elastic constitutive model. Comput. Geotech. 43, 80—91 (2012)CrossRefGoogle Scholar
  23. 124.
    Willhalm, T., Dementiev, R., Fay, P.: Intel performance counter monitor—a better way to measure cpu utilization. Tech. rep., Intel (2017)Google Scholar

Copyright information

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Arthur Francisco Lorenzon
    • 1
  • Antonio Carlos Schneider Beck Filho
    • 2
  1. 1.Department of Computer ScienceFederal University of Pampa (UNIPAMPA)AlegreteBrazil
  2. 2.Institute of Informatics, Campus do ValeFederal University of Rio Grande do Sul (UFRGS)Porto AlegreBrazil

Personalised recommendations