Memory Bandwidth: The True Bottleneck of SIMD Multimedia Performance on a Superscalar Processor

This paper presents the performance of DSP, image, and 3D applications on recent general-purpose microprocessors using streaming SIMD ISA extensions (integer and floating point). The 9 benchmarks we use for this evaluation have been optimized for data-level parallelism (DLP) and cache use with SIMD extensions and data prefetching. The cumulative result of these optimizations is a speedup ranging from 1.9 to 7.1.

All the benchmarks were originally computation bound, and 7 of them become memory-bandwidth bound once SIMD and data prefetching are added. Quadrupling the memory bandwidth has no effect on the original kernels but improves the performance of the SIMD kernels by 15–55%.
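Why extra bandwidth helps only the SIMD kernels can be illustrated with a simple bottleneck model. This is a minimal sketch, not the paper's methodology. It assumes that computation and memory transfers overlap perfectly (the goal of data prefetching), so a kernel runs at the pace of whichever resource is slower. The function names `kernel_time` and `bandwidth_speedup` are hypothetical, and the numbers in the example are illustrative rather than measured.

```python
# Hypothetical bottleneck model (not taken from the paper): assumes perfect
# overlap of computation and memory transfers, so run time is set by the
# slower of the two.

def kernel_time(compute_s: float, bytes_moved: float, bandwidth: float) -> float:
    """Estimated run time (seconds) of a kernel that moves `bytes_moved`
    bytes through a memory system delivering `bandwidth` bytes per second."""
    memory_s = bytes_moved / bandwidth
    return max(compute_s, memory_s)


def bandwidth_speedup(compute_s: float, bytes_moved: float,
                      bandwidth: float, factor: float = 4.0) -> float:
    """Speedup obtained by multiplying the memory bandwidth by `factor`."""
    before = kernel_time(compute_s, bytes_moved, bandwidth)
    after = kernel_time(compute_s, bytes_moved, bandwidth * factor)
    return before / after


if __name__ == "__main__":
    # Illustrative numbers only: both versions move the same 1.5 GB of data,
    # but the SIMD version needs much less compute time.
    GB = 1e9
    original = bandwidth_speedup(compute_s=6.0, bytes_moved=1.5 * GB, bandwidth=1.0 * GB)
    simd = bandwidth_speedup(compute_s=1.25, bytes_moved=1.5 * GB, bandwidth=1.0 * GB)
    print(f"original kernel: {original:.2f}x")
    print(f"SIMD kernel:     {simd:.2f}x")
```

With these made-up inputs the script prints a 1.00x speedup for the compute-bound original kernel and 1.20x for the memory-bound SIMD kernel. In this model, once bandwidth is raised far enough the SIMD kernel becomes compute bound again, which caps the gain well below the 4x increase in bandwidth.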