On the Combination of Information-Theoretic Kernels with Generative Embeddings

  • Pedro M. Q. Aguiar
  • Manuele Bicego
  • Umberto Castellani
  • Mário A. T. Figueiredo
  • André T. Martins
  • Vittorio Murino
  • Alessandro Perina
  • Aydın Ulaş
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)

Abstract

Classical methods to obtain classifiers for structured objects (e.g., sequences, images) are based on generative models and adopt a generative Bayesian framework. To use discriminative approaches (namely, support vector machines), the objects have to be mapped (embedded) into a Hilbert space; one way that has been proposed to carry out such an embedding is via generative models (possibly learned from data). This type of hybrid discriminative/generative approach has recently been shown to outperform classifiers obtained directly from the generative model upon which the embedding is built.

Discriminative approaches based on generative embeddings involve two key components: a generative model used to define the embedding, and a discriminative learning algorithm used to obtain a (possibly kernel-based) classifier. The literature on generative embeddings has focused mainly on defining the embedding, and a standard off-the-shelf kernel and learning algorithm are usually adopted. Recently, we have proposed a different approach that exploits the probabilistic nature of generative embeddings by using information-theoretic kernels defined on probability distributions. In this chapter, we review this approach and its building blocks, and we illustrate its performance on two medical applications.
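
As a rough illustration of what a kernel defined on probability distributions can look like, the sketch below computes the Jensen-Shannon kernel, k(p, q) = ln 2 - JS(p, q), between discrete distributions and builds a Gram matrix from a set of embedded objects. It is a minimal sketch under stated assumptions, not the chapter's implementation: the function names, the use of NumPy, and the idea that each embedded object is a nonnegative vector that can be normalized to sum to one (for example, posterior proportions from some generative model) are all assumptions made here for illustration.

```python
import numpy as np


def shannon_entropy(p):
    """Shannon entropy in nats; terms with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))


def jensen_shannon_kernel(p, q):
    """Jensen-Shannon kernel k(p, q) = ln 2 - JS(p, q) for two distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # equal-weight mixture of the two distributions
    js = shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))
    return float(np.log(2.0) - js)


def gram_matrix(embeddings):
    """Gram matrix of the kernel over rows of a nonnegative array.

    Assumption (hypothetical setup): each row is a generative embedding
    with nonnegative entries and a positive sum; rows are normalized here
    so that they can be treated as probability distributions.
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / x.sum(axis=1, keepdims=True)
    n = x.shape[0]
    k = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            k[i, j] = k[j, i] = jensen_shannon_kernel(x[i], x[j])
    return k


if __name__ == "__main__":
    # Three made-up embeddings, for illustration only.
    toy = np.array([[2.0, 1.0, 1.0], [1.0, 1.0, 2.0], [0.0, 0.0, 3.0]])
    print(gram_matrix(toy))
```

A Gram matrix of this kind could then be handed to any kernel classifier that accepts precomputed kernels, such as a support vector machine; which kernel and which classifier are actually used in the experiments is described in the chapter itself, not here.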

Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Pedro M. Q. Aguiar (1)
  • Manuele Bicego (2)
  • Umberto Castellani (2)
  • Mário A. T. Figueiredo (3)
  • André T. Martins (3)
  • Vittorio Murino (2)
  • Alessandro Perina (4)
  • Aydın Ulaş (2)

  1. Instituto de Sistemas e Robótica, Instituto Superior Técnico, Lisboa, Portugal
  2. Dipartimento di Informatica, University of Verona, Verona, Italy
  3. Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal
  4. Microsoft Research, Redmond, USA
