
Online feature extraction based on accelerated kernel principal component analysis for data stream

Abstract

Kernel principal component analysis (KPCA) is a well-known nonlinear feature extraction method. Takeuchi et al. have proposed an incremental type of KPCA (IKPCA) that can update an eigenspace incrementally for a sequence of data. However, IKPCA must carry out an eigenvalue decomposition for every single datum, even when a chunk of data is given at one time. To reduce the computational cost of learning chunk data, this paper proposes an extended IKPCA called Chunk IKPCA (CIKPCA), in which a chunk of multiple data is learned with a single eigenvalue decomposition. To further reduce the computation time and memory usage for a large data chunk, the chunk is first divided into several smaller chunks, and only useful data are selected based on the accumulation ratio. In the proposed CIKPCA, a small set of independent data is first selected from the reduced data set so that the eigenvectors in a high-dimensional feature space can be represented as a linear combination of these independent data. The eigenvectors are then updated incrementally by keeping only an eigenspace model that consists of a sextuplet of quantities, including the independent data, coefficients, eigenvalues, and mean information. The proposed CIKPCA can augment an eigen-feature space based on the accumulation ratio, which can also be updated without keeping all the past data, and the eigen-feature space is rotated by solving an eigenvalue problem once per data chunk. The experimental results show that the learning time of the proposed CIKPCA is greatly reduced compared with that of KPCA and IKPCA without sacrificing recognition accuracy.
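
A minimal sketch of the chunk-wise learning flow described above is given below as Python code. All helper names (accumulation_ratio, update_independent_data, update_eigenspace) and the threshold value are hypothetical placeholders for illustration only, not the paper's actual interface.

    import numpy as np

    def learn_stream(model, chunks, small_chunk_size=50, threshold=0.999):
        # Hedged sketch of the CIKPCA control flow (illustrative only).
        for chunk in chunks:
            # Divide a large data chunk into several smaller chunks.
            n_splits = max(1, len(chunk) // small_chunk_size)
            for sub_chunk in np.array_split(chunk, n_splits):
                # Keep only useful data, i.e., data that are not yet well
                # represented by the current eigen-feature space according to
                # the accumulation ratio.
                useful = [y for y in sub_chunk
                          if model.accumulation_ratio(y) < threshold]
                if not useful:
                    continue
                # Add linearly independent data so that the eigenvectors stay
                # expressible as a linear combination of the stored set.
                model.update_independent_data(useful)
                # Augment and rotate the eigen-feature space with a single
                # eigenvalue decomposition for the whole sub-chunk.
                model.update_eigenspace(useful)
        return model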


Notes

  1. For the last small chunk, the number of data could be smaller than \(n\).

References

  1. Abe S (2010) Support vector machines for pattern classification, advances in pattern recognition. Springer, London


  2. Aoki D, Omori T, Ozawa S (2013) A robust incremental principal component analysis for feature extraction from stream data with missing values. In: Proceedings of international joint conference on neural networks, pp 1–8

  3. Asuncion A, Newman DJ (2007) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences


  4. Babcock B, Babu S, Datar M, Motwani R, Widom J (2002) Models and issues in data stream systems. In: Proceedings of 21st ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM Press, New York

  5. Baudat G, Anouar F (2003) Feature vector selection and projection using kernels. Neurocomputing 55:21–38


  6. Case J, Jain S, Lange S, Zeugmann T (1999) Incremental concept learning for bounded data mining. Inf Comput 152:74–110


  7. Chin TJ, Suter D (2007) Incremental kernel principal component analysis. IEEE Trans Image Process 16:1662–1674


  8. Domingos P, Hulten G (2001) Catching up with the data: research issues in mining data streams. In: ACM SIGMOD workshop on research issues in data mining and knowledge discovery

  9. Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments. IEEE Trans Neural Netw 22:1517–1531


  10. Honeine P (2012) Online kernel principal component analysis: a reduced-order model. IEEE Trans Pattern Anal Mach Intell 34:1814–1826


  11. Jang YM, Lee M, Ozawa S (2011) A real-time personal authentication system based on incremental feature extraction and classification of audiovisual information. Evolv Syst 2:261–272


  12. Jolliffe I (1986) Principal component analysis. Springer, New York


  13. Joseph AA, Jang YM, Ozawa S, Lee M (2012) Extension of incremental linear discriminant analysis to online feature extraction under nonstationary environments. In: Proceedings of 19th international conference on neural information processing, pp 640–647

  14. Kasabov N (2003) Evolving connectionist systems: methods and applications in bioinformatics, brain study and intelligent machines. Springer, London


  15. Kim K, Franz M, Schölkopf B (2005) Iterative kernel principal component analysis for image modeling. IEEE Trans Pattern Anal Mach Intell 27:1351–1366


  16. Minku FL, Inoue H, Yao X (2009) Negative correlation in incremental learning. Nat Comput 8:289–320 (special issue on nature-inspired learning and adaptive systems)


  17. Oja E (1982) A simplified neuron model as a principal component analyzer. J Math Biol 15:267–273


  18. Oja E (1992) Principal components, minor components, and linear neural networks. Neural Netw 5:927–935


  19. Ozawa S, Toh SL, Abe S, Pang S, Kasabov N (2005) Incremental learning of feature space and classifier for face recognition. Neural Netw 18:575–584


  20. Ozawa S, Pang S, Kasabov N (2008) Incremental learning of chunk data for on-line pattern classification systems. IEEE Trans Neural Netw 19:1061–1074


  21. Sanger TD (1989) Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Netw 2:459–473


  22. Schölkopf B, Mika S, Burges CJC, Knirsch P, Müller KR, Rätsch G, Smola AJ (1999) Input space vs. feature space in Kernel-based methods. IEEE Trans Neural Netw 10:1000–1017


  23. Schölkopf B, Smola A, Müller KR (1997) Kernel principal component analysis. In: Gerstner W, Hasler M, Germond A, Nicoud J-D (eds) ICANN 1997. LNCS, vol 1327. Springer, Berlin, pp 583–588

  24. Takeuchi Y, Ozawa S, Abe S (2007) An efficient incremental kernel principal component analysis for online feature selection. In: Proceedings of international joint conference on neural networks, pp 2346–2351

  25. Tokumoto T, Ozawa S (2012) A property of learning chunk data using incremental kernel principal component analysis. In: Proceedings of IEEE workshop on evolving and adaptive intelligent systems (EAIS2012, Madrid), pp 7–10

  26. Weng J, Zhang Y, Hwang WS (2003) Candid covariance-free incremental principal component analysis. IEEE Trans Pattern Anal Mach Intell 25:1034–1040


  27. Weng J, Evans CH, Hwang WS (2000) An incremental learning method for face recognition under continuous video stream. In: Proceedings of fourth IEEE international conference on automatic face and gesture recognition, pp 251–256

  28. Xu Y, Yang JY, Lu JF (2005) An efficient kernel-based nonlinear regression method for two-class classification. In: Proceedings of international conference on machine learning and cybernetics, pp 4442–4445

  29. Yang MH (2002) Kernel eigenfaces vs. kernel fisherfaces: face recognition using kernel methods. In: Proceedings of fifth IEEE international conference on automatic face and gesture recognition, pp 215–220

  30. Zhang Z, Tian Z, Duan X, Fu X (2013) Adaptive kernel subspace method for speeding up feature extraction. Neurocomputing 113:58–66


  31. Zhao H, Yuen PC, Kwok JT (2006) A novel incremental principal component analysis and its application for face recognition. IEEE Trans Syst Man Cybernet Part B 36:873–886



Author information


Corresponding author

Correspondence to Annie Anak Joseph.

Appendix: derivation of updated accumulation ratio \(C^{\prime }(d)\) in Eq. (12)


Since the \(i\)th eigenvalue \(\lambda _i\) is equivalent to the variance of the data projections onto the \(i\)th eigenvector, the numerator of \(C^{\prime }(d)\) in Eq. (12) is calculated by

$$\begin{aligned} \sum _{i=1}^d \lambda ^{\prime }_i = \frac{1}{N + n} \sum _{i=1}^d \left\{ \sum _{j=1}^N \Vert {\varvec{z}}_i^T \left( \phi ({\varvec{x}}_j) - {\varvec{c}}^{\prime } \right) \Vert ^2 + \sum _{j=1}^n \Vert {\varvec{z}}_i^T \left( \phi ({\varvec{y}}_j) - {\varvec{c}}^{\prime } \right) \Vert ^2 \right\} . \end{aligned}$$
(42)

Using Eq. (33) and the equation \(\lambda _i = \frac{1}{N} \sum _{j=1}^N \Vert {\varvec{z}}_i^T (\phi ({\varvec{x}}_j) - {\varvec{c}}) \Vert ^2\), the first and the second terms in Eq. (42) are respectively given by

$$\begin{aligned} \sum _{j=1}^N \Vert {\varvec{z}}_i^T \left( \phi ({\varvec{x}}_j) - {\varvec{c}}^{\prime } \right) \Vert ^2 = N \lambda _i + \frac{N n^2}{(N+n)^2} \Vert {\varvec{z}}_i^T ({\varvec{c}} - {\varvec{c}}_y) \Vert ^2 \end{aligned}$$
(43)
$$\begin{aligned} \sum _{j=1}^n \Vert {\varvec{z}}_i^T \left( \phi ({\varvec{y}}_j) - {\varvec{c}}^{\prime } \right) \Vert ^2 = \sum _{j=1}^n \Vert {\varvec{z}}_i^T \left( \phi ({\varvec{y}}_j) - {\varvec{c}}_y \right) \Vert ^2 + \frac{N^2 n}{(N+n)^2} \Vert {\varvec{z}}_i^T \left( {\varvec{c}} - {\varvec{c}}_y \right) \Vert ^2. \end{aligned}$$
(44)

Then, the numerator of \(C^{\prime }(d)\) is reduced to

$$\begin{aligned} \sum _{i=1}^d \lambda ^{\prime }_i = \frac{1}{N+n} \sum _{i=1}^d \left\{ N \lambda _i + \frac{N n}{N + n} \Vert {\varvec{z}}_i^T ( {\varvec{c}} - {\varvec{c}}_y ) \Vert ^2 + \sum _{j=1}^n \Vert {\varvec{z}}_i^T \left( \phi ({\varvec{y}}_j) - {\varvec{c}}_y \right) \Vert ^2 \right\} . \end{aligned}$$
(45)

On the other hand, the denominator of \(C^{\prime }(d)\) is defined by

$$\begin{aligned} \sum _{i=1}^m \lambda ^{\prime }_i = \frac{1}{N + n} \left\{ \sum _{j=1}^N \Vert \phi ({\varvec{x}}_j) - {\varvec{c}}^{\prime } \Vert ^2 + \sum _{j=1}^n \Vert \phi ({\varvec{y}}_j) - {\varvec{c}}^{\prime } \Vert ^2 \right\} . \end{aligned}$$
(46)

Then, the first and the second terms in Eq. (46) are respectively given by

$$\begin{aligned} \sum _{j=1}^N \Vert \left( \phi ({\varvec{x}}_j) - {\varvec{c}}^{\prime } \right) \Vert ^2 = N \sum _{i=1}^m \lambda _i + \frac{N n^2}{(N + n)^2} \Vert {\varvec{c}} - {\varvec{c}}_y \Vert ^2. \end{aligned}$$
(47)
$$\begin{aligned} \sum _{j=1}^n \Vert \left( \phi ({\varvec{y}}_j) - {\varvec{c}}^{\prime } \right) \Vert ^2 = \sum _{j=1}^n \Vert \phi ({\varvec{y}}_j) - {\varvec{c}}_y \Vert ^2 + \frac{N^2 n}{(N + n)^2} \Vert {\varvec{c}} - {\varvec{c}}_y \Vert ^2. \end{aligned}$$
(48)

Then, the denominator of \(C^{\prime }(d)\) is reduced to

$$\begin{aligned} \sum _{i=1}^m \lambda ^{\prime }_i = \frac{1}{N + n} \Biggl \{ N \sum _{i=1}^m \lambda _i + \frac{N n}{N+n} \Vert {\varvec{c}} - {\varvec{c}}_y \Vert ^2 + \sum _{j=1}^n \Vert \phi ({\varvec{y}}_j) - {\varvec{c}}_y \Vert ^2 \Biggr \}. \end{aligned}$$
(49)
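
Combining Eqs. (45) and (49) (the common factor \(1/(N+n)\) cancels), the updated accumulation ratio of Eq. (12) is thus obtained as

$$\begin{aligned} C^{\prime }(d) = \frac{\sum _{i=1}^d \lambda ^{\prime }_i}{\sum _{i=1}^m \lambda ^{\prime }_i} = \frac{\sum _{i=1}^d \left\{ N \lambda _i + \frac{N n}{N + n} \Vert {\varvec{z}}_i^T ( {\varvec{c}} - {\varvec{c}}_y ) \Vert ^2 + \sum _{j=1}^n \Vert {\varvec{z}}_i^T \left( \phi ({\varvec{y}}_j) - {\varvec{c}}_y \right) \Vert ^2 \right\} }{N \sum _{i=1}^m \lambda _i + \frac{N n}{N + n} \Vert {\varvec{c}} - {\varvec{c}}_y \Vert ^2 + \sum _{j=1}^n \Vert \phi ({\varvec{y}}_j) - {\varvec{c}}_y \Vert ^2}. \end{aligned}$$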

Since Eqs. (45) and (49) cannot be computed directly in the high-dimensional feature space, the kernel trick is applied to each term. The second term of Eq. (45) is rewritten as follows:

$$\begin{aligned} \Vert {\varvec{z}}_i^T \left( {\varvec{c}} - {\varvec{c}}_y \right) \Vert ^2 = \left\{ {\varvec{\alpha }}_i^T \left( {\varvec{\varPhi }}_m^T {\varvec{c}} - {\varvec{\varPhi }}_m^T {\varvec{c}}_y \right) \right\} ^2. \end{aligned}$$
(50)

The first and second terms of Eq. (50) are shown in Eqs. (11) and (17), respectively. The third term of Eq. (45) is shown as follows:

$$\begin{aligned} \Vert {\varvec{z}}_i^T \left( \phi ({\varvec{y}}_j) - {\varvec{c}}_y \right) \Vert ^2 = \left\{ {\varvec{\alpha }}_i^T \left( {\varvec{\varPhi }}_m^T \phi ({\varvec{y}}_j) - {\varvec{\varPhi }}_m^T {\varvec{c}}_y \right) \right\} ^2 \end{aligned}$$
(51)

and the first term of Eq. (51) is shown in Eq. (18).

On the other hand, the second term of Eq. (49) is calculated as follows:

$$\begin{aligned} \Vert {\varvec{c}} - {\varvec{c}}_y \Vert ^2= & {} \Vert {\varvec{c}} \Vert ^2 + \Vert {\varvec{c}}_y \Vert ^2 - 2 {\varvec{c}}_y^T {\varvec{c}} \nonumber \\= & {} \frac{1}{N^2} \sum _{i=1}^N \sum _{j=1}^N k({\varvec{x}}_i, {\varvec{x}}_j) + \frac{1}{n^2} \sum _{i=1}^n \sum _{j=1}^n k({\varvec{y}}_i, {\varvec{y}}_j) - \frac{2}{n} \sum _{i=1}^n \phi ({\varvec{y}}_i)^T {\varvec{c}} \end{aligned}$$
(52)

The third term still needs to be expressed in a computable form. Since the principal components of \({\varvec{c}}\) lie in the \(d\)-dimensional eigenspace, only the components of \(\phi ({\varvec{y}}_i)\) within the eigenspace need to be considered when calculating the inner product \(\phi ({\varvec{y}}_i)^T {\varvec{c}}\). Hence, let us approximate \(\phi ({\varvec{y}}_i)\) by the following linear combination of the \(m\) independent data \({\varvec{\varPhi }}_m\):

$$\begin{aligned} \phi ({\varvec{y}}_i) \approx \sum _{j=1}^m \beta _{ij} \phi (\hat{{\varvec{x}}}_j) = {\varvec{\varPhi }}_m {\varvec{\beta }}_i, \end{aligned}$$
(53)

Then, the third term can be approximated by

$$\begin{aligned} \phi ({\varvec{y}}_i)^T {\varvec{c}} \approx {\varvec{\beta }}_i^T ({\varvec{\varPhi }}_m^T {\varvec{c}}), \end{aligned}$$
(54)

where

$$\begin{aligned} {\varvec{\beta }}_i \approx ({\varvec{\varPhi }}_m^T {\varvec{\varPhi }}_m)^{-1} {\varvec{\varPhi }}_m^T \phi ({\varvec{y}}_i). \end{aligned}$$
(55)
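
In kernel form, \({\varvec{\varPhi }}_m^T {\varvec{\varPhi }}_m\) is the \(m \times m\) Gram matrix of the stored independent data and \({\varvec{\varPhi }}_m^T \phi ({\varvec{y}}_i)\) is a vector of kernel evaluations, so Eq. (55) amounts to solving a small linear system. The following is a minimal sketch, assuming a Gaussian kernel and NumPy; the function and variable names are illustrative and not taken from the paper.

    import numpy as np

    def rbf_kernel(a, b, gamma=1.0):
        # Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2).
        return np.exp(-gamma * np.sum((a - b) ** 2))

    def beta_coefficients(X_ind, y, gamma=1.0):
        # Eq. (55): solve (Phi_m^T Phi_m) beta = Phi_m^T phi(y), where both
        # sides consist only of kernel evaluations.
        m = len(X_ind)
        K_mm = np.array([[rbf_kernel(X_ind[i], X_ind[j], gamma)
                          for j in range(m)] for i in range(m)])  # Gram matrix of independent data
        k_my = np.array([rbf_kernel(X_ind[i], y, gamma) for i in range(m)])  # k(x_hat_i, y)
        # A least-squares solve (np.linalg.lstsq) could be used instead if K_mm is ill-conditioned.
        return np.linalg.solve(K_mm, k_my)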

Finally, the third term of Eq. (49) can be reduced to

$$\begin{aligned} \Vert \phi ({\varvec{y}}_j) - {\varvec{c}}_y \Vert ^2& = k({\varvec{y}}_j, {\varvec{y}}_j) - 2 \phi ( {\varvec{y}}_j )^T {\varvec{c}}_y + \Vert {\varvec{c}}_y \Vert ^2 \nonumber \\ &= k({\varvec{y}}_j, {\varvec{y}}_j) - \frac{2}{n} \sum _{i=1}^n k({\varvec{y}}_j, {\varvec{y}}_i) + \frac{1}{n^2} \sum _{i=1}^n \sum _{l=1}^n k({\varvec{y}}_i, {\varvec{y}}_l). \end{aligned}$$
(56)
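
Note that Eq. (56) involves only kernel evaluations among the data of the current chunk, so it can be evaluated without the mapped features themselves. A minimal sketch, again assuming a Gaussian kernel and NumPy with illustrative names:

    import numpy as np

    def chunk_scatter_terms(Y, gamma=1.0):
        # Compute ||phi(y_j) - c_y||^2 of Eq. (56) for every y_j in the chunk Y (n x d).
        n = len(Y)
        # Gram matrix of the chunk: K[i, j] = k(y_i, y_j).
        K = np.exp(-gamma * np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2))
        norm_cy = K.sum() / n ** 2       # ||c_y||^2 = (1/n^2) * sum_{i,l} k(y_i, y_l)
        proj_cy = K.sum(axis=1) / n      # phi(y_j)^T c_y = (1/n) * sum_i k(y_j, y_i)
        return np.diag(K) - 2.0 * proj_cy + norm_cy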


Cite this article

Joseph, A.A., Tokumoto, T. & Ozawa, S. Online feature extraction based on accelerated kernel principal component analysis for data stream. Evolving Systems 7, 15–27 (2016). https://doi.org/10.1007/s12530-015-9131-7


Keywords

  • Online learning
  • Incremental learning
  • Feature extraction
  • Kernel principal component analysis