Statistical Learning Theory and Kernel-Based Methods

  • Chris Aldrich
  • Lidia Auret
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)


The basics of kernel methods and their position in the generalized data-driven fault diagnostic framework are reviewed. The review begins with statistical learning theory, covering concepts such as loss functions, overfitting, and empirical and structural risk minimization. This is followed by linear margin classifiers, kernels and support vector machines. Transductive support vector machines are then discussed and illustrated with an example concerning multivariate image analysis of coal particles on conveyor belts. Finally, unsupervised kernel methods, such as kernel principal component analysis, are considered in detail, analogous to the application of linear principal component analysis in multivariate statistical process control. Fault diagnosis in a simulated nonlinear system by means of kernel principal component analysis is included as a worked example to illustrate the concepts.
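As a rough illustration of the unsupervised kernel methods the chapter covers, the sketch below implements kernel principal component analysis in plain NumPy: compute a kernel matrix, double-centre it in feature space, and project onto its leading eigenvectors. This is a minimal sketch of the standard algorithm, not the chapter's own code; the function names (`rbf_kernel`, `kernel_pca`) and the `gamma` parameter are illustrative choices.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X**2, axis=1)
    # Pairwise squared Euclidean distances via the expansion ||a-b||^2
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    """Project training samples onto the leading kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Double-centre the kernel matrix so the (implicit) feature-space
    # mapping has zero mean: Kc = K - 1K - K1 + 1K1, with 1 = ones/n
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one
    # Symmetric eigendecomposition; eigh returns eigenvalues in ascending order
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Normalise eigenvectors by sqrt(eigenvalue) so that the projection
    # Kc @ alphas gives the feature-space principal component scores
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))
    return Kc @ alphas
```

For monitoring applications such as those the chapter describes, the resulting scores would play the role that linear PCA scores play in multivariate statistical process control, with statistics such as Hotelling's T² computed on them.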


Keywords: Support vector machine, kernel function, support vector regression, input space, kernel principal component analysis



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Chris Aldrich 1, 2
  • Lidia Auret 2
  1. Western Australian School of Mines, Curtin University, Perth, Australia
  2. Department of Process Engineering, University of Stellenbosch, Stellenbosch, South Africa
