International Journal of Parallel Programming, Volume 42, Issue 2, pp 239–264

Parallelization of Full Search Motion Estimation Algorithm for Parallel and Distributed Platforms

  • Eduarda Monteiro
  • Bruno Vizzotto
  • Cláudio Diniz
  • Marilena Maule
  • Bruno Zatt
  • Sergio Bampi

Abstract

This work presents an efficient method for mapping the Full Search algorithm for Motion Estimation (ME) onto General-Purpose Graphics Processing Unit (GPGPU) architectures using the Compute Unified Device Architecture (CUDA) programming model. Our method jointly exploits the massive parallelism available in current GPGPU devices and the inherent parallelism of the Full Search algorithm. Our main goal is to evaluate the feasibility of implementing video codecs on GPGPUs, along with the advantages and drawbacks relative to other platforms. For comparison, three solutions were developed using distinct programming paradigms for distinct underlying hardware architectures: (i) a sequential solution for a general-purpose processor (GPP); (ii) a parallel solution for multi-core GPPs using the OpenMP library; and (iii) a distributed solution for cluster/grid machines using the Message Passing Interface (MPI) library. The CUDA-based solution for GPGPUs achieves speed-ups consistent with those indicated by the theoretical model for different search areas. Our GPGPU Full Search Motion Estimation provides 2×, 20× and 1664× speed-ups when compared to the MPI, OpenMP and sequential implementations, respectively. Compared to the state of the art, our solution reaches up to 17× speed-up.
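
To make the computation behind the abstract concrete, the following is a minimal sequential sketch of Full Search block matching in Python. It is not the authors' implementation: the abstract does not state the matching criterion, so the sum of absolute differences (SAD) is assumed here, and the function names (`sad`, `full_search`, `estimate_frame`), the block size `n` and the search radius `r` are hypothetical. Frames are plain lists of rows of integer luma samples.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur`
    at (bx, by) and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, bx, by, n, r):
    """Test every displacement (dx, dy) with |dx|, |dy| <= r and return
    (dx, dy, cost) for the candidate with the lowest SAD (assumed metric).
    Candidates falling outside the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def estimate_frame(cur, ref, n, r):
    """Run full_search for every non-overlapping n x n block of `cur`.
    Each block's search is independent of the others, which is the
    property a parallel mapping can exploit."""
    height, width = len(cur), len(cur[0])
    return {
        (bx, by): full_search(cur, ref, bx, by, n, r)
        for by in range(0, height - n + 1, n)
        for bx in range(0, width - n + 1, n)
    }


# Usage: a 16 x 16 reference frame with unique sample values, and a
# current frame whose content is the reference shifted by (2, 1).
ref = [[16 * y + x for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)]
       for y in range(16)]
print(full_search(cur, ref, 4, 4, 4, 3))  # (2, 1, 0)
```

Because no block's result depends on another's, the loop in `estimate_frame` (and the candidate loop inside `full_search`) could in principle be distributed across threads, processes or GPU threads; how the paper actually partitions the work is not specified in the abstract.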

Keywords

Motion estimation, Block matching, GPU, CUDA, OpenMP, MPI

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Eduarda Monteiro (1)
  • Bruno Vizzotto (1)
  • Cláudio Diniz (1)
  • Marilena Maule (1)
  • Bruno Zatt (1)
  • Sergio Bampi (1)
  1. Informatics Institute, PPGC, PGMICRO, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
