Applied Intelligence, Volume 48, Issue 2, pp 390–415

Sparse kernel minimum squared error using Householder transformation and Givens rotation

  • Yong-Ping Zhao
  • Peng-Peng Xi
  • Bing Li
  • Zhi-Qiang Li


Abstract

Baseline kernel minimum squared error (KMSE) suffers from two obvious limitations: the solution lacks sparseness, and the problem is ill-posed. Previous sparse methods for KMSE have addressed the second limitation with a regularization strategy, which increases the computational cost because the regularization parameter must be determined. Hence, in this paper, a constructive sparse algorithm for KMSE (CS-KMSE) and its improved version (ICS-KMSE) are proposed, which address both limitations simultaneously. CS-KMSE selects as significant nodes the training samples that yield the largest reductions in the objective function, on the basis of the Householder transformation. In contrast to CS-KMSE, ICS-KMSE adds a replacement mechanism based on the Givens rotation, which gives it better sparseness than CS-KMSE. Neither CS-KMSE nor ICS-KMSE requires a regularization parameter before it begins to choose significant nodes, which saves model selection time. More importantly, both algorithms terminate with an early stopping strategy that acts as implicit regularization, avoiding overfitting and controlling the sparsity of the solution to the baseline KMSE. Finally, extensive comparisons with other algorithms confirm the superior sparseness, effectiveness, and feasibility of CS-KMSE and ICS-KMSE.
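To make the selection procedure concrete, below is a minimal Python sketch of the forward greedy node selection in the spirit of CS-KMSE. The predictor is taken as f(x) = Σ_i β_i k(x_i, x) (the bias term is omitted for brevity), and the names cs_kmse_select, max_nodes, and tol are illustrative assumptions, not the paper's notation. The sketch scores candidates with explicit Gram–Schmidt orthogonalization, whereas the paper maintains an incremental QR factorization via Householder reflections; the selection criterion, the exact reduction in the squared-error objective, is the same.

```python
import numpy as np

def cs_kmse_select(K, y, max_nodes=50, tol=1e-4):
    """Greedy forward selection of significant nodes for sparse KMSE.

    K : (n, n) kernel matrix of the training set, K[i, j] = k(x_i, x_j)
    y : (n,) target vector
    At each step, the unselected kernel column that yields the largest
    exact reduction in ||K[:, S] @ beta - y||^2 is added.  Early
    stopping (`tol`) plays the role of implicit regularization, so no
    explicit regularization parameter is needed.
    """
    n = K.shape[1]
    selected = []
    residual = y.astype(float).copy()
    Q = np.zeros((K.shape[0], 0))  # orthonormal basis of the selected columns
    for _ in range(max_nodes):
        best_j, best_gain = -1, 0.0
        for j in range(n):
            if j in selected:
                continue
            # Orthogonalize the candidate column against the current basis
            # (the paper does this implicitly via Householder reflections).
            v = K[:, j] - Q @ (Q.T @ K[:, j])
            nv2 = v @ v
            if nv2 < 1e-12:
                continue  # numerically dependent on the selected columns
            gain = (v @ residual) ** 2 / nv2  # exact drop in the objective
            if gain > best_gain:
                best_gain, best_j = gain, j
        # Early stopping: quit when the best relative gain is negligible.
        if best_j < 0 or best_gain < tol * (y @ y):
            break
        v = K[:, best_j] - Q @ (Q.T @ K[:, best_j])
        q = v / np.linalg.norm(v)
        Q = np.column_stack([Q, q])
        residual -= (q @ residual) * q
        selected.append(best_j)
    # Least-squares coefficients over the selected nodes only.
    beta, *_ = np.linalg.lstsq(K[:, selected], y, rcond=None)
    return selected, beta
```

ICS-KMSE extends this loop with a replacement step: when swapping an already selected node for a candidate improves the objective, the exchange is made and the triangular factor is restored with a sequence of Givens rotations rather than recomputing the factorization from scratch, which is what makes the extra sparseness affordable.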


Keywords

Kernel method · Kernel minimum squared error · Householder transformation · Givens rotation · Sparseness



Acknowledgments

This research was supported by the Fundamental Research Funds for the Central Universities under Grant No. NJ20160021. The authors also wish to thank the anonymous reviewers for their constructive comments and great help during the writing process, which improved the manuscript significantly.



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Yong-Ping Zhao (1)
  • Peng-Peng Xi (1)
  • Bing Li (1)
  • Zhi-Qiang Li (1)

  1. Jiangsu Province Key Laboratory of Aerospace Power Systems, College of Energy and Power Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
