Fast Wavelet Transform Utilizing a Multicore-Aware Framework

Stürmer, Markus; Köstler, Harald; Rüde, Ulrich

doi:10.1007/978-3-642-28145-7_31

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7134)

Included in the following conference series:

International Workshop on Applied Parallel Computing

Abstract

The move to multicore processors places new demands on software development if applications are to profit from the improved capabilities. Most importantly, algorithms and code must be parallelized wherever possible, and the growing memory wall must also be taken into account. In addition, high computational performance can only be reached if architecture-specific features are exploited. To address this complexity, we developed a C++ framework that simplifies the development of performance-optimized, parallel, memory-efficient, stencil-based codes on standard multicore processors and on the heterogeneous Cell processor developed jointly by Sony, Toshiba, and IBM. We illustrate the implementation and optimization of the Fast Wavelet Transform and its inverse for Haar wavelets in three variants: within our hybrid framework, with OpenMP, and with the Open Computing Language (OpenCL). We then analyze performance results for different platforms.
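
For orientation, the transform named in the abstract can be sketched in a few lines of C++. This is not the authors' framework code, which is not shown here. It is a minimal one-dimensional illustration that assumes the orthonormal Haar normalization (factor 1/sqrt(2)) and a signal length that is a power of two. The function names `haar_step`, `inverse_haar_step`, `haar_fwt`, and `haar_ifwt` are hypothetical. The OpenMP pragma only marks the loops whose iterations are independent; without OpenMP support the compiler ignores it and the code runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Helper (hypothetical name): true if n is a nonzero power of two.
inline bool is_power_of_two(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// One analysis level on the first n entries (n even): scaled sums go to the
// front half, scaled differences (details) to the back half.
void haar_step(std::vector<double>& data, std::size_t n) {
    const double s = 1.0 / std::sqrt(2.0);  // assumed orthonormal scaling
    const std::ptrdiff_t half = static_cast<std::ptrdiff_t>(n / 2);
    std::vector<double> tmp(n);
    // Each output pair reads exactly one input pair, so iterations are independent.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < half; ++i) {
        tmp[i]        = s * (data[2 * i] + data[2 * i + 1]);
        tmp[half + i] = s * (data[2 * i] - data[2 * i + 1]);
    }
    std::copy(tmp.begin(), tmp.end(), data.begin());
}

// One synthesis level: exact inverse of haar_step on the first n entries.
void inverse_haar_step(std::vector<double>& data, std::size_t n) {
    const double s = 1.0 / std::sqrt(2.0);
    const std::ptrdiff_t half = static_cast<std::ptrdiff_t>(n / 2);
    std::vector<double> tmp(n);
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < half; ++i) {
        tmp[2 * i]     = s * (data[i] + data[half + i]);
        tmp[2 * i + 1] = s * (data[i] - data[half + i]);
    }
    std::copy(tmp.begin(), tmp.end(), data.begin());
}

// Full forward transform: repeat the analysis step on the shrinking front part.
void haar_fwt(std::vector<double>& data) {
    assert(is_power_of_two(data.size()));
    for (std::size_t n = data.size(); n >= 2; n /= 2) haar_step(data, n);
}

// Full inverse transform: undo the levels in reverse order.
void haar_ifwt(std::vector<double>& data) {
    assert(is_power_of_two(data.size()));
    for (std::size_t n = 2; n <= data.size(); n *= 2) inverse_haar_step(data, n);
}
```

Calling `haar_fwt` followed by `haar_ifwt` on the same vector should reproduce the input up to rounding. Each level streams through the data once and performs very little arithmetic per element, which suggests why memory bandwidth, the concern raised in the abstract, is likely to dominate the run time of such a code.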

Author information

Authors and Affiliations

System Simulation Group, University of Erlangen-Nuremberg, Germany
Markus Stürmer, Harald Köstler & Ulrich Rüde

Editor information

Kristján Jónasson

About this paper

Cite this paper

Stürmer, M., Köstler, H., Rüde, U. (2012). Fast Wavelet Transform Utilizing a Multicore-Aware Framework. In: Jónasson, K. (eds) Applied Parallel and Scientific Computing. PARA 2010. Lecture Notes in Computer Science, vol 7134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28145-7_31

DOI: https://doi.org/10.1007/978-3-642-28145-7_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28144-0
Online ISBN: 978-3-642-28145-7
eBook Packages: Computer Science, Computer Science (R0)