Statistics and Computing, Volume 25, Issue 6, pp 1143–1162

Kernel discriminant analysis and clustering with parsimonious Gaussian process models

Abstract

This work presents a family of parsimonious Gaussian process models which make it possible to build, from a finite sample, a model-based classifier in an infinite-dimensional space. The proposed parsimonious models are obtained by constraining the eigen-decomposition of the Gaussian processes modeling each class. In particular, this allows the use of non-linear mapping functions which project the observations into infinite-dimensional spaces. It is also demonstrated that the classifier can be built directly from the observation space through a kernel function. The proposed classification method is therefore able to classify data of various types, such as categorical data, functional data or networks. Furthermore, mixed data can be classified by combining different kernels. The methodology is also extended to the unsupervised classification case, and an EM algorithm is derived for the inference. Experimental results on various data sets demonstrate the effectiveness of the proposed method. A Matlab toolbox implementing the proposed classification methods is provided as supplementary material.
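The abstract's central ingredient, constraining the eigen-decomposition of each class's Gaussian process while working only through a kernel function, can be illustrated with a small sketch. The snippet below (a simplified illustration in Python, not the authors' Matlab toolbox; the function names, the RBF kernel choice, and the retained dimension `d` are assumptions for the example) computes a centered Gram matrix for one class sample and keeps only its leading eigenpairs, which is the kind of parsimony constraint the abstract describes.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and the rows of Y
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def class_eigendecomposition(Xk, d=2, gamma=1.0):
    """Leading eigenpairs of the centered Gram matrix of one class sample.

    Retaining only the top-d eigenvalues (and summarizing the remainder
    by a single noise term, not shown here) mirrors the parsimony
    constraint on the Gaussian process of each class; everything is done
    from the n x n Gram matrix, never in the mapped feature space.
    """
    n = Xk.shape[0]
    K = rbf_kernel(Xk, Xk, gamma)
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # centered Gram matrix (PSD)
    vals, vecs = np.linalg.eigh(Kc)          # ascending eigenvalues
    order = np.argsort(vals)[::-1]           # sort descending
    return vals[order][:d], vecs[:, order[:d]]

# Toy usage on a 5-observation class sample in R^3
rng = np.random.default_rng(0)
Xk = rng.normal(size=(5, 3))
vals, vecs = class_eigendecomposition(Xk, d=2)
print(vals.shape, vecs.shape)  # -> (2,) (5, 2)
```

Because only the Gram matrix enters the computation, swapping `rbf_kernel` for a string, graph, or functional-data kernel, or for a combination of kernels in the mixed-data case, leaves the rest of the procedure unchanged.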

Keywords

Model-based classification · Kernel methods · Gaussian process parsimonious models · Mixed data

Acknowledgments

The authors warmly thank the associate editor and the referee for their helpful remarks and comments on the manuscript.

Supplementary material

Supplementary material 1: 11222_2014_9505_MOESM1_ESM.tar (80 KB)


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Laboratoire MAP5, UMR 8145, Université Paris Descartes & Sorbonne Paris Cité, Paris, France
  2. Laboratoire DYNAFOR, UMR 1201, INRA & Université de Toulouse, Toulouse, France
  3. Equipe MISTIS, INRIA Grenoble Rhône-Alpes & LJK, Grenoble Cedex, France