
A Reconfigurable Hardware Architecture for Principal Component Analysis


Principal component analysis (PCA) is one of the most widely used techniques for dimensionality reduction in multivariate statistical analysis. This article presents an efficient architecture design and implementation of the PCA algorithm on a field-programmable gate array (FPGA). The reconfigurable architecture is modeled in both floating-point and fixed-point representations using our custom-developed library of numerical operations, and it supports input dataset matrices with parameterizable dimensions. The synthesizable model of the PCA architecture is written in the Verilog hardware description language, and its cycle-accurate and bit-true simulation results are verified against its software simulation models. The characteristics and implementation results of the PCA architecture on a Xilinx Virtex-7 FPGA and in a standard 45-nm CMOS technology are presented. The execution times of the implemented PCA system on the FPGA for different data sizes are compared with those of graphics processing unit (GPU) and general-purpose processor implementations. To the best of our knowledge, this work is the first high-throughput design and implementation of a reconfigurable PCA architecture, including both the learning and mapping phases, on a single FPGA.
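For readers unfamiliar with the two phases named in the abstract, the following is a minimal software sketch of covariance-based PCA, not the article's hardware design. It assumes an unshifted QR iteration built from Givens rotations with a fixed iteration count, and it uses Python's built-in sort as a stand-in for a hardware sorting network. The function names (`pca_learn`, `pca_map`, `givens_qr`, `eigen_qr`) and the `iters` parameter are hypothetical and chosen only for illustration.

```python
import math

def matmul(A, B):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def givens_qr(A):
    """QR decomposition by Givens rotations; returns (Q, R) with A = Q R."""
    d = len(A)
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(d)] for i in range(d)]
    for j in range(d):
        for i in range(d - 1, j, -1):
            a, b = R[i - 1][j], R[i][j]
            r = math.hypot(a, b)
            if r == 0.0:
                continue  # nothing to zero out
            c, s = a / r, b / r
            for k in range(d):
                # Rotate rows i-1 and i of R so that R[i][j] becomes zero.
                t1, t2 = R[i - 1][k], R[i][k]
                R[i - 1][k], R[i][k] = c * t1 + s * t2, -s * t1 + c * t2
                # Apply the transposed rotation to columns i-1 and i of Q.
                u1, u2 = Q[k][i - 1], Q[k][i]
                Q[k][i - 1], Q[k][i] = c * u1 + s * u2, -s * u1 + c * u2
    return Q, R

def eigen_qr(C, iters=200):
    """Eigenvalues and eigenvectors of a symmetric matrix by unshifted QR iteration."""
    d = len(C)
    A = [row[:] for row in C]
    V = [[float(i == j) for j in range(d)] for i in range(d)]
    for _ in range(iters):
        Q, R = givens_qr(A)
        A = matmul(R, Q)   # similarity transform: same eigenvalues as C
        V = matmul(V, Q)   # accumulate eigenvectors as columns of V
    vals = [A[i][i] for i in range(d)]
    vecs = [[V[r][i] for r in range(d)] for i in range(d)]
    return vals, vecs

def pca_learn(X, k, iters=200):
    """Learning phase: column means, top-k eigenvalues, top-k principal directions."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    vals, vecs = eigen_qr(cov, iters)
    order = sorted(range(d), key=lambda i: vals[i], reverse=True)
    return means, [vals[i] for i in order[:k]], [vecs[i] for i in order[:k]]

def pca_map(X, means, components):
    """Mapping phase: project mean-centered samples onto the learned components."""
    return [[sum((x - m) * w for x, m, w in zip(row, means, comp))
             for comp in components] for row in X]
```

Calling `pca_learn(X, k)` on an n-by-d list of samples returns the column means, the k largest eigenvalues of the sample covariance matrix, and the corresponding directions; `pca_map(X, means, components)` then reduces each sample to k coordinates. The fixed iteration count is a simplification that keeps the control flow static; a production solver would typically add shifts and a convergence test.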






We would like to thank Mr. Ian Schofield for providing the execution times of the GPU implementation of PCA.

Author information



Corresponding author

Correspondence to Uday A. Korat.

Additional information

This work was supported (in part) by the Center for Sensorimotor Neural Engineering (CSNE), a National Science Foundation Engineering Research Center (EEC-1028725).


About this article


Cite this article

Korat, U.A., Alimohammad, A. A Reconfigurable Hardware Architecture for Principal Component Analysis. Circuits Syst Signal Process 38, 2097–2113 (2019).



  • Reconfigurable hardware
  • Vector processor
  • Principal component analysis
  • Dimensionality reduction
  • Covariance-based PCA
  • EigenSolver
  • QR decomposition
  • Givens rotation
  • Odd–even merge sort
  • FPGA and ASIC implementation