A Family of Modular QRD-Accelerator Architectures and Circuits Cross-Layer Optimized for High Area- and Energy-Efficiency

Journal of Signal Processing Systems

Abstract

QR-decomposition (QRD) accelerators are attractive SoC components for many applications with a wide range of specifications. A new family of highly area- and energy-efficient, modular two-way linear-array QRD architectures based on the Givens algorithm and CORDIC rotations is proposed. The template architecture allows for implementations of real- or complex-valued and integer or floating-point QRDs. An accurate algebraic cost model enables cross-layer optimization across the architecture, micro-architecture, and circuit levels using a rich set of parameters. Quantitative results for exemplary applications are presented for implementations in 40-nm CMOS, demonstrating a significant improvement in efficiency.
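To make the underlying computation concrete, the following is a minimal software sketch of a real-valued Givens QRD in which every rotation is built from CORDIC micro-rotations. It is not the architecture proposed in the article: the function names (`cordic_vectoring`, `cordic_rotate`, `givens_qrd`), the iteration count `N_ITER = 24`, and the use of Python floats in place of fixed-point shift-and-add hardware are all assumptions made for illustration, and the two-way linear-array mapping and the cost model are not represented.

```python
import math

N_ITER = 24  # assumed number of CORDIC micro-rotations (illustrative choice)

# Constant gain accumulated by N_ITER unscaled micro-rotations.
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(N_ITER))


def cordic_vectoring(x, y):
    """Rotate (x, y) onto the positive x-axis.

    Returns the vector length and the control word (pre-flip flag plus
    micro-rotation directions) needed to replay the same rotation.
    """
    flip = x < 0  # pre-rotate by 180 degrees so the iteration converges
    if flip:
        x, y = -x, -y
    dirs = []
    for i in range(N_ITER):
        d = -1 if y > 0 else 1  # choose the direction that drives y to zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        dirs.append(d)
    return x / GAIN, (flip, dirs)


def cordic_rotate(x, y, ctrl):
    """Apply the rotation recorded in ctrl to another pair (x, y)."""
    flip, dirs = ctrl
    if flip:
        x, y = -x, -y
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
    return x / GAIN, y / GAIN


def givens_qrd(a):
    """Return the upper-triangular factor R of a real matrix a (list of rows)."""
    r = [row[:] for row in a]
    m, n = len(r), len(r[0])
    for col in range(n):
        # Zero the column bottom-up by rotating pairs of adjacent rows.
        for row in range(m - 1, col, -1):
            norm, ctrl = cordic_vectoring(r[row - 1][col], r[row][col])
            # The small residual left after N_ITER steps is treated as zero.
            r[row - 1][col], r[row][col] = norm, 0.0
            for k in range(col + 1, n):
                r[row - 1][k], r[row][k] = cordic_rotate(
                    r[row - 1][k], r[row][k], ctrl)
    return r


if __name__ == "__main__":
    A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    for row in givens_qrd(A):
        print(["%.6f" % v for v in row])
```

In this sketch the vectoring step determines the rotation directions from the leading pair of elements, and the rotation step replays those directions on the remaining elements of the two rows, which is the usual division of work between vectoring and rotation modes in CORDIC-based Givens schemes. Because each micro-rotation needs only shifts and additions in hardware, the accuracy of the result is governed by the number of iterations and the word length, both of which are fixed arbitrarily here.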

Acknowledgments

The authors would like to thank their colleague Jos Huisken for many discussions and helpful comments as well as Eqbal Maraqa for his highly valuable contributions in the validation of the cost model.

Author information

Corresponding author

Correspondence to Upasna Vishnoi.

About this article

Cite this article

Vishnoi, U., Meixner, M. & Noll, T.G. A Family of Modular QRD-Accelerator Architectures and Circuits Cross-Layer Optimized for High Area- and Energy-Efficiency. J Sign Process Syst 83, 329–356 (2016). https://doi.org/10.1007/s11265-015-0976-6

  • DOI: https://doi.org/10.1007/s11265-015-0976-6
