PAC-Bayesian Analysis for a Two-Step Hierarchical Multiview Learning Approach

  • Anil Goyal
  • Emilie Morvant
  • Pascal Germain
  • Massih-Reza Amini
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10535)

Abstract

We study two-level multiview learning with more than two views under the PAC-Bayesian framework. This approach, sometimes referred to as late fusion, consists of sequentially learning multiple view-specific classifiers at the first level and then combining these view-specific classifiers at the second level. Our main theoretical result is a generalization bound on the risk of the majority vote that exhibits a term measuring the diversity of the view-specific classifiers' predictions. This result shows that controlling the trade-off between diversity and accuracy is a key element of multiview learning, which complements other results in the field. Finally, we evaluate this principle experimentally on multiview datasets extracted from the Reuters RCV1/RCV2 collection.
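The two-level scheme described above can be illustrated with a minimal Python sketch. The assumptions are ours, not the paper's: each view is reduced to a single scalar feature, the first-level view-specific classifiers are toy decision stumps (hypothetical helpers `fit_stump`, `fit_views`, `view_predictions`), and the second-level weights `rho` over views are fixed by hand rather than learned. To make the diversity/accuracy trade-off concrete, the hypothetical `vote_statistics` function computes the empirical Gibbs risk (expected error of a voter drawn from `rho`), the expected disagreement d between two voters drawn independently from `rho`, the majority-vote risk, and the value of the classical C-bound of Lacasse et al. (2006), 1 - (1 - 2·Gibbs risk)² / (1 - 2·d). This is a well-known relation for weighted majority votes; it is not claimed to be the exact bound proved in this paper.

```python
from typing import Dict, List, Sequence, Tuple

# A "stump" is (threshold, polarity): a toy stand-in for a view-specific classifier.
Stump = Tuple[float, int]


def fit_stump(xs: Sequence[float], ys: Sequence[int]) -> Stump:
    """First level (toy learner): fit a one-feature decision stump on a single view."""
    best_err, best = len(ys) + 1, (0.0, 1)
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            err = sum((pol if x >= thr else -pol) != y for x, y in zip(xs, ys))
            if err < best_err:
                best_err, best = err, (thr, pol)
    return best


def predict_stump(stump: Stump, x: float) -> int:
    thr, pol = stump
    return pol if x >= thr else -pol


def fit_views(X: Sequence[Sequence[float]], ys: Sequence[int]) -> List[Stump]:
    """Learn one classifier per view; X[i][v] is the scalar feature of example i in view v."""
    return [fit_stump([row[v] for row in X], ys) for v in range(len(X[0]))]


def view_predictions(stumps: Sequence[Stump], X: Sequence[Sequence[float]]) -> List[List[int]]:
    """Matrix of +1/-1 outputs: one row per example, one column per view-specific classifier."""
    return [[predict_stump(s, row[v]) for v, s in enumerate(stumps)] for row in X]


def vote_statistics(preds: Sequence[Sequence[int]], ys: Sequence[int],
                    rho: Sequence[float]) -> Dict[str, float]:
    """Second level: rho-weighted majority vote over the view-specific predictions."""
    if abs(sum(rho) - 1.0) > 1e-9 or min(rho) < 0:
        raise ValueError("rho must be a probability distribution over the views")
    n = len(ys)
    gibbs = disagreement = mv_errors = 0.0
    for row, y in zip(preds, ys):
        p_pos = sum(w for w, h in zip(rho, row) if h == 1)  # rho-mass voting +1
        gibbs += sum(w for w, h in zip(rho, row) if h != y)  # error mass of a random voter
        disagreement += 2.0 * p_pos * (1.0 - p_pos)  # P(h(x) != h'(x)), h, h' i.i.d. from rho
        margin = y * sum(w * h for w, h in zip(rho, row))
        if margin <= 0:  # ties are counted as majority-vote errors
            mv_errors += 1
    gibbs, disagreement, mv_risk = gibbs / n, disagreement / n, mv_errors / n
    mu1, mu2 = 1.0 - 2.0 * gibbs, 1.0 - 2.0 * disagreement  # first and second margin moments
    c_bound = 1.0 - mu1 ** 2 / mu2 if mu1 > 0 else 1.0  # uninformative when gibbs >= 1/2
    return {"gibbs": gibbs, "disagreement": disagreement, "mv_risk": mv_risk, "c_bound": c_bound}


if __name__ == "__main__":
    # Crafted predictions: each example is misclassified by exactly one of the three views.
    ys = [1, 1, -1, -1]
    preds = [[1, 1, -1], [1, -1, 1], [-1, -1, 1], [1, -1, -1]]
    print(vote_statistics(preds, ys, [1 / 3] * 3))
    # gibbs ~ 1/3, disagreement ~ 4/9, mv_risk 0.0, c_bound ~ 0
```

In the crafted example, a randomly drawn view-specific classifier errs a third of the time, yet the uniform majority vote makes no mistake because the views never err on the same example: high disagreement (diversity) compensates for the individual inaccuracy, which is the trade-off the abstract refers to. With less diverse voters (lower d), the same Gibbs risk would give a larger C-bound value.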

Keywords

PAC-Bayesian theory · Multiview learning

Acknowledgments

This work was partially funded by the French ANR project LIVES ANR-15-CE23-0026-03, the “Région Rhône-Alpes”, and the CIFAR program in Learning in Machines & Brains.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Anil Goyal (1, 2)
  • Emilie Morvant (1)
  • Pascal Germain (3, 4)
  • Massih-Reza Amini (2)
  1. Univ Lyon, UJM-Saint-Etienne, CNRS, Institut d’Optique Graduate School, Laboratoire Hubert Curien UMR 5516, Saint-Etienne, France
  2. Univ. Grenoble Alps, Laboratoire d’Informatique de Grenoble, AMA, Centre Equation 4, Grenoble Cedex 9, France
  3. Département d’informatique de l’ENS, École Normale Supérieure, CNRS, PSL Research University, Paris, France
  4. INRIA, Paris, France

Personalised recommendations