Performance Analysis of Existing SIMD Architectures

  • Chao Cui
  • Xian Zhang
  • Zhicheng Jin
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1146)


SIMD (Single Instruction Multiple Data) architectures are widely used in application domains such as wireless communication, video and audio processing, and control engineering. The abundant data parallelism in these domains makes the SIMD architecture a natural match for data processing and performance improvement. However, current SIMD architectures also suffer from critical inefficiencies. To understand these inefficiencies, we carry out an in-depth investigation of the main components of the Long Term Evolution (LTE) protocol, an important wireless communication protocol. The performance investigation is carried out on a cycle-accurate simulator that features the main characteristics of existing SIMD architectures. Based on the investigation, we locate the inefficiencies in two aspects: the data communication operations among different processing units, and the support for matrix-style computations. We have also carried out studies with SIMD architectures enhanced in these two aspects. The overall performance of SIMD architectures can be greatly improved.


Keywords: SIMD, Inefficiency, Communication



We thank the anonymous reviewers for their valuable work. We greatly improved our work based on their reviews. We also thank Xiaohui Yang and Dongdeng Tang of National University of Defense Technology for their kind help in building the experimental platform and their feedback on the performance evaluation.


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Beijing Institute of Control Engineering, Beijing, China
  2. National University of Defense Technology, Changsha, China
