Abstract
We introduce a new Large Margin Gaussian Process (LMGP) model by formulating a pseudo-likelihood for a generalised multi-class hinge loss. We derive a highly scalable training objective for the proposed model using variational inference and an inducing-point approximation. Additionally, we consider joint learning of an LMGP-DNN, which combines the proposed model with traditional deep learning methods to enable learning from unstructured data. We demonstrate the effectiveness of the Large Margin GP with respect to both training time and accuracy in extensive classification experiments on 68 structured and two unstructured data sets. Finally, we highlight our model's key capability of yielding prediction uncertainty for classification by demonstrating its effectiveness on the tasks of large-scale active learning and detection of adversarial images.
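For a concrete anchor, the construction the abstract describes can be sketched as follows. This is a hedged reconstruction from the standard Crammer–Singer multi-class hinge loss and the pseudo-likelihood device used for Bayesian SVMs; the paper's exact generalisation may differ. With latent GP values \(\mathbf{f}(x) = (f_1(x), \dots, f_C(x))\), one per class, a multi-class hinge loss and its induced pseudo-likelihood take the form

\[
\ell(\mathbf{f}, y) \;=\; \max\!\Big(0,\; 1 + \max_{c \neq y} f_c - f_y\Big),
\qquad
p(y \mid \mathbf{f}) \;\propto\; \exp\!\big(-2\,\ell(\mathbf{f}, y)\big).
\]

Placing independent GP priors on each \(f_c\) and applying inducing-point variational inference then yields an evidence lower bound that decomposes over data points and can be optimised with stochastic gradients, which is what makes the training objective scalable.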