Case Study: DCT with Aurora

Chapter in Parallel Computing Hits the Power Wall

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

This chapter presents Aurora as a case study in optimizing the execution of parallel applications. Aurora is an OpenMP framework that is completely transparent to both the designer and the end user. Without any code transformation or recompilation, it automatically finds, at runtime and with minimal overhead, the optimal number of threads for each parallel loop region, and it readapts whenever the behavior of a region changes during execution. Sect. 5.1 discusses the importance of providing an approach that is, at the same time, transparent to the user and adaptive to the execution environment. Aurora is then presented in Sect. 5.2 and evaluated through an extensive set of comparisons against well-known state-of-the-art solutions in Sect. 5.3.
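
Since Aurora requires no changes to application source code, the loops it tunes are ordinary OpenMP regions. The fragment below is a hedged illustration, not code taken from the chapter: a plain #pragma omp parallel for region of the kind whose thread count a transparent framework such as Aurora would select at runtime; the array names and workload are purely illustrative.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    /* Illustrative OpenMP loop region. The source stays exactly as written;
     * a transparent framework such as Aurora decides, at runtime, how many
     * threads this region actually uses instead of the default of one
     * thread per available core. */
    int main(void) {
        static double a[N], b[N], c[N];

        for (int i = 0; i < N; i++) {   /* initialize inputs sequentially */
            a[i] = i * 0.5;
            b[i] = i * 2.0;
        }

    #pragma omp parallel for
        for (int i = 0; i < N; i++)     /* the parallel loop region being tuned */
            c[i] = a[i] + b[i];

        printf("c[42] = %f (up to %d threads available)\n",
               c[42], omp_get_max_threads());
        return 0;
    }

Because the thread count is decided by the OpenMP runtime rather than hard-coded, a tool that sits between the program and libgomp can change it per region without recompiling the binary.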

Notes

  1. GOMP_parallel_start is also named GOMP_parallel in newer versions of libgomp; a minimal interception sketch is shown below.
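
Intercepting the libgomp entry point named in this note is one plausible way to gain control over the thread count without recompilation. The sketch below is an assumption: it shows generic dynamic-linker interposition of GOMP_parallel (the combined entry point exposed by newer GCC versions) from a preloaded shared library, not Aurora's actual implementation; choose_num_threads is a hypothetical placeholder for the runtime search.

    /* Hedged sketch: interpose libgomp's GOMP_parallel from a preloaded library.
     * Build: gcc -shared -fPIC -o libinterpose.so interpose.c -ldl
     * Run:   LD_PRELOAD=./libinterpose.so ./omp_app
     * The signature matches recent libgomp versions (GCC >= 4.9); older
     * versions expose GOMP_parallel_start/GOMP_parallel_end instead. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>

    typedef void (*gomp_parallel_fn)(void (*fn)(void *), void *data,
                                     unsigned num_threads, unsigned flags);

    /* Hypothetical policy hook: a real tool would search, at runtime, for the
     * thread count that optimizes its target metric (performance, energy, or both). */
    static unsigned choose_num_threads(void) {
        return 4;
    }

    void GOMP_parallel(void (*fn)(void *), void *data,
                       unsigned num_threads, unsigned flags) {
        /* Locate the real libgomp implementation further down the link chain. */
        gomp_parallel_fn real =
            (gomp_parallel_fn)dlsym(RTLD_NEXT, "GOMP_parallel");

        unsigned chosen = choose_num_threads();
        fprintf(stderr, "[interpose] parallel region -> %u threads\n", chosen);

        /* Forward the call unchanged except for the chosen thread count. */
        real(fn, data, chosen, flags);
    }

With this interposition pattern both the application and libgomp remain unmodified; only the process environment changes.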

Copyright information

© 2019 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Lorenzon, A.F., Schneider Beck Filho, A.C. (2019). Case Study: DCT with Aurora. In: Parallel Computing Hits the Power Wall. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-28719-1_5

  • DOI: https://doi.org/10.1007/978-3-030-28719-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28718-4

  • Online ISBN: 978-3-030-28719-1

  • eBook Packages: Computer Science, Computer Science (R0)
