
Parallel implementations of the 3D fast wavelet transform on a Raspberry Pi 2 cluster



Low-cost general-purpose single-board computing (SDC) devices are attracting increasing interest in research computing because of their very low cost/performance ratio and low energy consumption. Among the SDCs available today, Raspberry Pi devices are perhaps the best-known representatives. Meanwhile, the wavelet transform plays an important role in contemporary standards for image compression (such as JPEG-2000) and video compression (MPEG-4). In this work, we present and evaluate three parallelization strategies for the 3D fast wavelet transform (3D-FWT) on a cluster of Raspberry Pi 2 SDCs. Each strategy has been implemented using both POSIX Threads (shared memory) and MPI (message passing). The POSIX Threads implementations are restricted to a single board, whereas the MPI versions can use multiple boards. We find that noticeable speed-ups are obtained when all MPI processes or POSIX threads run on the cores of a single Raspberry Pi 2 SDC. For the MPI versions, however, performance drops drastically when the MPI processes are spread across several boards. The cause is the limited bandwidth of the onboard LAN port, which proves insufficient for the fine-grained, high-volume communication requirements of the studied parallelization strategies. Finally, we have also run the POSIX Threads and MPI versions on a very high-performance but power-hungry 4-core Intel Xeon E5606 CPU, and find that the Raspberry Pi 2 SDC completes the task with much lower total energy consumption (up to 4 times lower).
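To make the structure of the computation concrete, the following is a minimal sketch of a one-level 3D wavelet transform whose work is split across threads. It is not one of the three strategies evaluated in the article, whose details are not given in this abstract. Several things here are assumptions made for illustration: the unnormalized Haar filter, the `video[t][y][x]` layout, the function names (`haar_step`, `transform_frame`, `transform_time`, `fwt3d`), and the use of a Python thread pool as a stand-in for POSIX Threads. Python threads will not reproduce the reported speed-ups; the sketch only shows which parts of the work are independent.

```python
from concurrent.futures import ThreadPoolExecutor


def haar_step(v):
    """One level of the unnormalized Haar transform: averages, then differences."""
    half = len(v) // 2
    low = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    high = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def transform_frame(frame):
    """Spatial pass on one frame: filter every row, then every column."""
    rows = [haar_step(r) for r in frame]
    cols = [haar_step(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to frame[y][x]


def transform_time(video, y):
    """Temporal pass for row y: filter each pixel's values across all frames."""
    width = len(video[0][0])
    return [haar_step([frame[y][x] for frame in video]) for x in range(width)]


def fwt3d(video, workers=4):
    """One-level 3D transform of video[t][y][x]; every dimension must be even."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Phase 1: frames are independent, so each task owns one frame.
        video = list(pool.map(transform_frame, video))
        # Phase 2: each task reads one row from every frame. In a
        # message-passing version this is where data would be exchanged.
        per_row = list(pool.map(lambda y: transform_time(video, y),
                                range(len(video[0]))))
    # per_row is indexed [y][x][t]; reorder to [t][y][x].
    return [[[per_row[y][x][t] for x in range(len(per_row[y]))]
             for y in range(len(per_row))]
            for t in range(len(video))]
```

In this sketch the spatial pass needs no data from other frames, while the temporal pass touches every frame. With shared memory that access is cheap. A distributed version would have to move the same data over the network between the two phases, which is consistent with the abstract's observation that the onboard LAN port limits the multi-board MPI runs.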




  1. A Pi Zero with a smaller footprint and limited I/O (GPIO) capabilities was released in November 2015 for US$5.




This work was supported by the Spanish MINECO, as well as by European Commission FEDER funds, under Grant TIN2015-66972-C5-3-R.

Author information

Correspondence to Gregorio Bernabé.


Cite this article

Bernabé, G., Hernández, R. & Acacio, M.E. Parallel implementations of the 3D fast wavelet transform on a Raspberry Pi 2 cluster. J Supercomput 74, 1765–1778 (2018).



  • General-purpose single-board computing (SDC)
  • Raspberry Pi 2
  • 3D fast wavelet transform (3D-FWT)
  • Parallelization strategies
  • MPI
  • POSIX Threads
  • Speed-up