Skip to main content

Cache-Aware Roofline Model and Medical Image Processing Optimizations in GPUs

  • Conference paper
  • First Online:
  • 1309 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11203))

Abstract

When optimizing or porting applications to new architectures, a preliminary characterization is necessary to exploit the maximum computing power of the employed devices. Profiling tools are available for numerous architectures and programming models, making it easier to spot possible bottlenecks. However, for a better interpretation of the collected results, current profilers rely on insightful performance models. In this paper, we describe the Cache Aware Roofline Model (CARM) and tools for its generation to enable the performance characterization of GPU architectures and workloads. We use CARM to characterize two kernels that are part of a 3D iterative reconstruction application for Computed Tomography (CT). These two kernels take most of the execution time of the whole method, being therefore suitable for a deeper analysis. By exploring the model and the methodology proposed, the overall performance of the kernels has been improved up to two times compared to the previous implementations.

This work has been partially supported through grant TIN2016-79637-P “Towards unification of HPC and Big Data Paradigms” from the Spanish Ministry of Economy and Competitiveness, the COST Action IC1305 “Network for Sustainable Ultrascale Computing Platforms” (NESUS), grant FPU14/03875 from the Spanish Ministry of Education and by the Portuguese national funds through FCT under the projects UID/CEC/50021/2013 and LISBOA-01-0145-FEDER-031901 (PTDC/CCI-COM/31901/2017).

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    Using the compilation flag –maxrregcount=32.

References

  1. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)

    Article  Google Scholar 

  2. Ilic, A., Pratas, F., Sousa, L.: Cache-aware Roofline model: upgrading the loft. IEEE Comput. Archit. Lett. 13(1), 21–24 (2014)

    Article  Google Scholar 

  3. Ilic, A., Pratas, F., Sousa, L.: Beyond the roofline: cache-aware power and energy-efficiency modeling for multi-cores. IEEE Trans. Comput. 66(1), 52–58 (2017)

    Article  MathSciNet  Google Scholar 

  4. Shinsel, A.: Intel Advisor Roofline (2017). https://software.intel.com/en-us/articles/intel-advisor-roofline. Accessed 02 Mar 2017

  5. Marques, D., et al.: Performance analysis with cache-aware roofline model in intel advisor. In: 2017 International Conference on High Performance Computing and Simulation (HPCS), pp. 898–907. IEEE (2017)

    Google Scholar 

  6. Lopes, A., Pratas, F., Sousa, L., Ilic, A.: Exploring GPU performance, power and energy-efficiency bounds with Cache-aware Roofline Modeling. In: 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 259–268. IEEE (2017)

    Google Scholar 

  7. Feldkamp, L., Davis, L., Kress, J.: Practical cone-beam algorithm. JOSA A 1(6), 612–619 (1984)

    Article  Google Scholar 

  8. de Molina, C., Serrano, E., Garcia-Blas, J., Carretero, J., Desco, M., Abella, M.: Gpu-accelerated iterative reconstruction for limited-data tomography in CBCT systems. BMC Bioinform. 19(1), 171 (2018)

    Article  Google Scholar 

  9. Abella, M., et al.: FUX-Sim: implementation of a fast universal simulation/reconstruction framework for X-ray systems. PLOS ONE 12(7), 1–22 (2017)

    Article  Google Scholar 

  10. Weaver, V.M.: Linux perf\_event features and overhead. In: The 2nd International Workshop on Performance Analysis of Workload Optimized Systems, FastPath, vol. 13 (2013)

    Google Scholar 

  11. Dongarra, J., et al.: Performance application programming interface

    Google Scholar 

  12. NVidia, “NVidia Profiler.” http://docs.nvidia.com/cuda/profiler-users-guide/index.html

  13. Kim, K.-H., Kim, K., Park, Q.-H.: Performance analysis and optimization of three-dimensional FDTD on GPU using roofline model. Comput. Phys. Commun. 182(6), 1201–1207 (2011)

    Article  Google Scholar 

  14. Carvalho, P., Drummond, L.M.A., Bentes, C., Clua, E., Cataldo, E., Marzulo, L.A.J.: Analysis and characterization of GPU benchmarks for kernel concurrency efficiency. In: Mocskos, E., Nesmachnow, S. (eds.) CARLA 2017. CCIS, vol. 796, pp. 71–86. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73353-1_5

    Chapter  Google Scholar 

  15. Ryoo, J.H., Quirem, S.J., Lebeane, M., Panda, R., Song, S., John, L.K.: GPGPU benchmark suites: how well do they sample the performance spectrum? In: 2015 44th International Conference on Parallel Processing (ICPP), pp. 320–329. IEEE (2015)

    Google Scholar 

  16. Che, S., Skadron, K.: BenchFriend: correlating the performance of GPU benchmarks. Int. J. High Perform. Comput. Appl. 28(2), 238–250 (2014)

    Article  Google Scholar 

  17. Lopez-Novoa, U., Mendiburu, A., Miguel-Alonso, J.: A survey of performance modeling and simulation techniques for accelerator-based computing. IEEE Trans. Parallel Distrib. Syst. 26(1), 272–281 (2015)

    Article  Google Scholar 

  18. Jia, H., Zhang, Y., Long, G., Xu, J., Yan, S., Li, Y.: GPURoofline: a model for guiding performance optimizations on GPUs. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds.) Euro-Par 2012. LNCS, vol. 7484, pp. 920–932. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32820-6_90

    Chapter  Google Scholar 

  19. Konstantinidis, E., Cotronis, Y.: A practical performance model for compute and memory bound GPU kernels. In: 2015 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 651–658. IEEE (2015)

    Google Scholar 

  20. Nugteren, C., van den Braak, G.-J., Corporaal, H.: Roofline-aware DVFS for GPUs. In: Proceedings of International Workshop on Adaptive Self-tuning Computing Systems, p. 8. ACM (2014)

    Google Scholar 

  21. Wong, H., Papadopoulou, M.-M., Sadooghi-Alvandi, M., Moshovos, A.: Demystifying GPU microarchitecture through microbenchmarking. In: 2010 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 235–246. IEEE (2010)

    Google Scholar 

  22. Mei, X., Chu, X.: Dissecting GPU memory hierarchy through microbenchmarking. IEEE Trans. Parallel Distrib. Syst. 28(1), 72–86 (2017)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Estefania Serrano .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Serrano, E., Ilic, A., Sousa, L., Garcia-Blas, J., Carretero, J. (2018). Cache-Aware Roofline Model and Medical Image Processing Optimizations in GPUs. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science(), vol 11203. Springer, Cham. https://doi.org/10.1007/978-3-030-02465-9_36

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-02465-9_36

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02464-2

  • Online ISBN: 978-3-030-02465-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics