High Performance Embedded Architectures and Compilers

Volume 5409 of the series Lecture Notes in Computer Science pp 404-418

Parallel H.264 Decoding on an Embedded Multicore Processor

  • Arnaldo Azevedo, Delft University of Technology
  • Cor Meenderinck, Delft University of Technology
  • Ben Juurlink, Delft University of Technology
  • Andrei Terechko, NXP
  • Jan Hoogerbrugge, NXP
  • Mauricio Alvarez, Technical University of Catalonia (UPC)
  • Alex Ramirez, Technical University of Catalonia (UPC) and Barcelona Supercomputing Center (BSC)


In previous work, the 3D-Wave parallelization strategy was proposed to increase the parallel scalability of H.264 video decoding. This strategy is based on the observation that inter-frame dependencies have a limited spatial range. The previous results, however, investigated application scalability only on an idealized multiprocessor. This work presents an implementation of the 3D-Wave strategy on a multicore architecture composed of NXP TriMedia TM3270 embedded processors. The results show that the parallel H.264 implementation scales very well, achieving a speedup of more than 54 on a 64-core processor. Potential drawbacks of the 3D-Wave strategy are that memory requirements increase, because many frames can be in flight at once, and that the latencies of some frames might increase. To address these drawbacks, policies that reduce the number of frames in flight and the frame latency are also presented. The results show that these policies mitigate the memory and latency issues with a negligible effect on performance scalability.
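
To put the reported scaling in perspective, the speedup can be converted into a parallel efficiency (speedup divided by core count). The sketch below is only illustrative arithmetic on the two figures quoted in the abstract; the function name `parallel_efficiency` is a hypothetical helper, not something taken from the paper, and because the abstract reports "more than 54", the result is a lower bound.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup divided by core count (1.0 means perfect linear scaling)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Figures from the abstract: a speedup of more than 54 on 64 cores,
# so this value is a lower bound on the achieved efficiency.
efficiency_lower_bound = parallel_efficiency(54, 64)
print(f"{efficiency_lower_bound:.3f}")  # 0.844
```

In other words, the reported speedup corresponds to roughly 84% or more of ideal linear scaling on 64 cores.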