An Efficient Vector Memory Unit for SIMD DSP

  • Conference paper
Computer Engineering and Technology (NCCET 2014)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 491)

Abstract

SIMD DSPs are highly efficient for embedded applications whose parallel data are aligned. However, typical embedded algorithms such as FFT and FIR contain many unaligned and irregular data accesses. On a SIMD architecture with alignment restrictions, vectorizing these algorithms requires many additional shuffle instructions, and the resulting loss of computational efficiency grows as the SIMD width increases. This paper proposes an efficient vector memory unit (VMU) with 16 memory blocks for M-DSP, a 16-way SIMD DSP. Each memory block contains four groups of multi-bank memory with low-order-bit interleaved addressing and supplies double bandwidth as needed to reduce parallel vector access conflicts. A high-bandwidth data shuffle unit, capable of aligning two vector accesses at once, is integrated into the vector access pipeline; it efficiently supports both unaligned accesses and the special vector access patterns of FFT. Experimental results show that the VMU provides conflict-free parallel accesses between DMA and vector load/store operations with no more than 10% area overhead, and that M-DSP achieves an ideal speedup for the FFT and FIR algorithms.
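As a rough illustration of the low-order-bit interleaving and alignment ideas mentioned in the abstract, the following Python sketch models a generic banked memory. It is not the VMU design from the paper: the single group of 16 banks, the word-granular addresses, and the names `bank_and_row`, `max_bank_load`, and `unaligned_load` are all assumptions made for the example.

```python
NUM_BANKS = 16  # assumption: one bank per SIMD lane, a single bank group


def bank_and_row(addr, num_banks=NUM_BANKS):
    """Low-order interleaving: the low bits of the word address pick the
    bank, the remaining high bits pick the row inside that bank."""
    return addr % num_banks, addr // num_banks


def max_bank_load(base, stride, lanes=16, num_banks=NUM_BANKS):
    """Largest number of lanes that hit the same bank in one vector access.
    A result of 1 means the access is conflict-free in this model."""
    counts = [0] * num_banks
    for lane in range(lanes):
        bank, _ = bank_and_row(base + lane * stride, num_banks)
        counts[bank] += 1
    return max(counts)


def unaligned_load(memory, base, num_banks=NUM_BANKS):
    """Unit-stride load of num_banks words starting at any base address.
    Each bank reads one word (possibly from a different row), then a
    rotation plays the role of the shuffle network and puts the word for
    lane 0 first. Assumes the vector length equals the number of banks."""
    per_bank = [None] * num_banks
    for lane in range(num_banks):
        bank, row = bank_and_row(base + lane, num_banks)
        per_bank[bank] = memory[row * num_banks + bank]
    shift = base % num_banks
    return per_bank[shift:] + per_bank[:shift]
```

In this model, `max_bank_load(5, 1)` is 1, so a unit-stride access stays conflict-free even when the base address is unaligned, while `max_bank_load(0, 16)` is 16 because every lane maps to bank 0. Calling `unaligned_load(list(range(64)), 5)` returns the words 5 through 20 in lane order, which is the effect that a hardware shuffle stage would achieve without extra shuffle instructions.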

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, H., Liu, Z., Liu, S., Ma, S. (2015). An Efficient Vector Memory Unit for SIMD DSP. In: Xu, W., Xiao, L., Li, J., Zhang, C., Zhu, Z. (eds) Computer Engineering and Technology. NCCET 2014. Communications in Computer and Information Science, vol 491. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45815-0_1

  • DOI: https://doi.org/10.1007/978-3-662-45815-0_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45814-3

  • Online ISBN: 978-3-662-45815-0

  • eBook Packages: Computer Science (R0)
