Frameworks for Multi-core Architectures: A Comprehensive Evaluation Using 2D/3D Image Registration

  • Richard Membarth
  • Frank Hannig
  • Jürgen Teich
  • Mario Körner
  • Wieland Eckert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6566)


Over the last few years, the development of standard processors has shifted from bigger, more complex, and faster cores to placing several simpler cores on one chip. This has also changed how programs must be written to exploit the processing power of multiple cores on the same processor. Initially, programmers had to partition the work and distribute it to the available cores by hand, and to manage threads themselves, in order to use more than one core. Today, several frameworks exist that relieve the programmer of such tasks. In this paper, we present five such frameworks for parallelization on shared-memory multi-core architectures: OpenMP, Cilk++, Threading Building Blocks, RapidMind, and OpenCL. To evaluate these frameworks, we investigate a real-world application from medical imaging, 2D/3D image registration. In an empirical study, a fine-grained data-parallel approach and a coarse-grained task-parallel approach are used to evaluate and estimate aspects such as the usability, performance, and overhead of each framework.
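
To make the distinction between the two parallelization approaches concrete, the following is a minimal sketch using OpenMP, one of the five frameworks named above. It is not the implementation evaluated in the paper. The similarity measure (a sum of squared differences), the `Image` alias, and all function names are assumptions chosen purely for illustration. The fine-grained variant splits the pixels of a single image comparison across cores, while the coarse-grained variant assigns one whole image comparison to each unit of work.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical image type: a flat array of pixel intensities.
using Image = std::vector<float>;

// Serial sum of squared differences between two equally sized images.
double ssd_serial(const Image& a, const Image& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - b[i];
        sum += d * d;
    }
    return sum;
}

// Fine-grained data parallelism: the pixel loop of ONE comparison is
// split across threads; partial sums are combined by the reduction.
double ssd_data_parallel(const Image& a, const Image& b) {
    double sum = 0.0;
    const long n = static_cast<long>(a.size());
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        const double d = static_cast<double>(a[i]) - b[i];
        sum += d * d;
    }
    return sum;
}

// Coarse-grained parallelism: each iteration is a complete, independent
// comparison against one candidate image, executed serially by one thread.
std::vector<double> ssd_coarse_parallel(const Image& reference,
                                        const std::vector<Image>& candidates) {
    std::vector<double> scores(candidates.size());
    const long m = static_cast<long>(candidates.size());
    #pragma omp parallel for schedule(dynamic)
    for (long k = 0; k < m; ++k) {
        scores[k] = ssd_serial(reference, candidates[k]);
    }
    return scores;
}
```

When compiled without OpenMP support, the pragmas are ignored and both variants run serially with the same results, which makes it easy to compare against the serial baseline. The fine-grained version pays synchronization overhead on every comparison, whereas the coarse-grained version needs enough independent comparisons to keep all cores busy.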


Parallelization, Frameworks, Evaluation, Medical Imaging, 2D/3D Image Registration, OpenMP, Cilk++, Threading Building Blocks, RapidMind, OpenCL




Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Richard Membarth (1)
  • Frank Hannig (1)
  • Jürgen Teich (1)
  • Mario Körner (2)
  • Wieland Eckert (2)

  1. Hardware/Software Co-Design, Department of Computer Science, University of Erlangen-Nuremberg, Germany
  2. Siemens Healthcare Sector, H IM AX, Forchheim, Germany
