Abstract
The Joint Energy-based Model (JEM) of [11] shows that a standard softmax classifier can be reinterpreted as an energy-based model (EBM) for the joint distribution \(p(\boldsymbol{x}, y)\); the resulting model can be optimized to improve calibration, robustness and out-of-distribution detection, while generating samples that rival the quality of recent GAN-based approaches. However, the softmax classifier that JEM exploits is inherently discriminative, and its latent feature space is not explicitly modeled as a probabilistic distribution, which may limit its potential for image generation and cause training instability. We hypothesize that generative classifiers, such as Linear Discriminant Analysis (LDA), are better suited for image generation, since generative classifiers model the data generation process explicitly. This paper therefore investigates an LDA classifier for image classification and generation. In particular, the Max-Mahalanobis Classifier (MMC) [30], a special case of LDA, fits our goal very well. We show that our Generative MMC (GMMC) can be trained discriminatively, generatively, or jointly for image classification and generation. Extensive experiments on multiple datasets show that GMMC achieves state-of-the-art discriminative and generative performance, while outperforming JEM in calibration, adversarial robustness and out-of-distribution detection by a significant margin. Our source code is available at https://github.com/sndnyang/GMMC.
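To make the reinterpretation above concrete, the following is a minimal sketch of the identities involved. The first two equations are the standard JEM formulation of [11]; the last line is only our hedged paraphrase of the LDA/MMC view (class-conditional Gaussian features with the preset max-Mahalanobis centers \(\boldsymbol{\mu}_y\) of [30] and identity covariance), not the paper's exact objective.

```latex
% JEM [11]: the logits f_\theta(x)[y] of a K-way softmax classifier
% define an energy-based model over the joint distribution (x, y).
\begin{align*}
  p_\theta(\boldsymbol{x}, y)
    &= \frac{\exp\big(f_\theta(\boldsymbol{x})[y]\big)}{Z(\theta)}, \\
  p_\theta(\boldsymbol{x})
    &= \sum_{y} p_\theta(\boldsymbol{x}, y)
     = \frac{\exp\big(-E_\theta(\boldsymbol{x})\big)}{Z(\theta)},
  \qquad
  E_\theta(\boldsymbol{x})
     = -\log\sum_{y}\exp\big(f_\theta(\boldsymbol{x})[y]\big), \\
% Hedged LDA/MMC view: with class-conditional Gaussian features
% N(\phi(x); \mu_y, I) and fixed max-Mahalanobis centers \mu_y,
% the discriminative logits are replaced (up to a constant) by
  f_{\boldsymbol{\phi}}(\boldsymbol{x})[y]
    &\approx -\tfrac{1}{2}\,\big\lVert \boldsymbol{\phi}(\boldsymbol{x})
        - \boldsymbol{\mu}_y \big\rVert_2^2 .
\end{align*}
```

The marginal energy in the second line is a plain LogSumExp over the logits, which can be computed directly from any classifier's output; the sketch below assumes a hypothetical backbone `f_theta` producing a `(batch, num_classes)` logit tensor and is not the paper's training code.

```python
import torch

def jem_marginal_energy(logits: torch.Tensor) -> torch.Tensor:
    """Marginal energy E(x) = -logsumexp_y f(x)[y], as in JEM [11].

    `logits` is assumed to be a (batch, num_classes) tensor produced by
    any classifier backbone (a hypothetical `f_theta`); lower energy
    means higher unnormalized density p(x).
    """
    return -torch.logsumexp(logits, dim=-1)
```

See Sect. 3 of the paper and the supplementary material for the exact GMMC formulation and training objectives.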
Notes
- 1.
To avoid notational clutter in later derivations, we use \(\boldsymbol{\phi }\) to denote both a CNN feature extractor and its parameters; the intended meaning is clear from the context.
- 2.
We can treat \(\gamma \) as a tunable hyperparameter or estimate it by post-processing. In this work, we take the latter approach, as discussed in Sect. 3.1.
- 3.
Supplementary material: https://arxiv.org/abs/2101.00122.
References
Ardizzone, L., Mackowiak, R., Rother, C., Köthe, U.: Training normalizing flows with the information bottleneck for competitive generative classification. In: Neural Information Processing Systems (NeurIPS) (2020)
Behrmann, J., Grathwohl, W., Chen, R.T., Duvenaud, D., Jacobsen, J.H.: Invertible residual networks. arXiv preprint arXiv:1811.00995 (2018)
Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. In: International Conference on Learning Representations (ICLR) (2019)
Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: IEEE Symposium on Security and Privacy (S&P) (2017)
Chapelle, O., Scholkopf, B., Zien, A.: Semi-supervised learning. IEEE Trans. Neural Netw. 20(3), 542–542 (2009)
Chen, R.T., Behrmann, J., Duvenaud, D., Jacobsen, J.H.: Residual flows for invertible generative modeling. arXiv preprint arXiv:1906.02735 (2019)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc.: Ser. B (Methodol.) 39(1), 1–22 (1977)
Du, Y., Mordatch, I.: Implicit generation and generalization in energy-based models. arXiv preprint arXiv:1903.08689 (2019)
Gao, R., Nijkamp, E., Kingma, D.P., Xu, Z., Dai, A.M., Wu, Y.N.: Flow contrastive estimation of energy-based models. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Goodfellow, I., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: International Conference on Learning Representations (ICLR) (2015)
Grathwohl, W., Wang, K.C., Jacobsen, J.H., Duvenaud, D., Norouzi, M., Swersky, K.: Your classifier is secretly an energy based model and you should treat it like one. In: International Conference on Learning Representations (ICLR) (2020)
Grathwohl, W., Wang, K.C., Jacobsen, J.H., Duvenaud, D., Zemel, R.: Learning the Stein discrepancy for training and evaluating energy-based models without sampling. In: Proceedings of the 37th International Conference on Machine Learning (ICML) (2020)
Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1321–1330. JMLR.org (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: International Conference on Learning Representations (ICLR) (2016)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems, pp. 6626–6637 (2017)
Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)
Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. Technical report, Citeseer (2009)
Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial machine learning at scale. In: International Conference on Learning Representations (ICLR) (2017)
LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., Huang, F.: A tutorial on energy-based learning. Predicting Structured Data 1(0) (2006)
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. (JMLR) 9(Nov), 2579–2605 (2008)
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: International Conference on Learning Representations (ICLR) (2018)
Mnih, A., Teh, Y.W.: A fast and simple algorithm for training neural probabilistic language models. In: International Conference on Machine Learning (2012)
Nalisnick, E., Matsukawa, A., Teh, Y.W., Gorur, D., Lakshminarayanan, B.: Do deep generative models know what they don’t know? arXiv preprint arXiv:1810.09136 (2018)
Nalisnick, E., Matsukawa, A., Teh, Y.W., Lakshminarayanan, B.: Detecting out-of-distribution inputs to deep generative models using a test for typicality. arXiv preprint arXiv:1906.02994 (2019)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2011)
Nijkamp, E., Hill, M., Han, T., Zhu, S.C., Wu, Y.N.: On the anatomy of MCMC-based maximum likelihood learning of energy-based models. arXiv preprint arXiv:1903.12370 (2019)
Nijkamp, E., Zhu, S.C., Wu, Y.N.: On learning non-convergent short-run MCMC toward energy-based model. arXiv preprint arXiv:1904.09770 (2019)
Pang, T., Du, C., Zhu, J.: Max-Mahalanobis linear discriminant analysis networks. In: International Conference on Machine Learning (ICML) (2018)
Pang, T., Xu, K., Dong, Y., Du, C., Chen, N., Zhu, J.: Rethinking softmax cross-entropy loss for adversarial robustness. In: International Conference on Learning Representations (ICLR) (2020)
Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. In: Neural Information Processing Systems (NeurIPS) (2016)
Santurkar, S., Ilyas, A., Tsipras, D., Engstrom, L., Tran, B., Madry, A.: Image synthesis with a single (robust) classifier. In: Advances in Neural Information Processing Systems (2019)
Szegedy, C., et al.: Intriguing properties of neural networks. In: International Conference on Learning Representations (ICLR) (2014)
Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152 (2018)
Vahdat, A., Kautz, J.: NVAE: a deep hierarchical variational autoencoder. In: Neural Information Processing Systems (NeurIPS) (2020)
Wan, W., Zhong, Y., Li, T., Chen, J.: Rethinking feature distribution for loss functions in image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9117–9126 (2018)
Welling, M., Teh, Y.W.: Bayesian learning via stochastic gradient Langevin dynamics. In: International Conference on Machine Learning (ICML), pp. 681–688 (2011)
Xie, J., Lu, Y., Zhu, S.C., Wu, Y.: A theory of generative ConvNet. In: International Conference on Machine Learning, pp. 2635–2644 (2016)
Zagoruyko, S., Komodakis, N.: Wide residual networks. In: The British Machine Vision Conference (BMVC) (2016)
Zhao, S., Jacobsen, J.H., Grathwohl, W.: Joint energy-based models for semi-supervised classification. In: ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning (2020)
Acknowledgment
We thank the anonymous reviewers for their comments and suggestions, which helped improve the quality of this paper. We also gratefully acknowledge VMware Inc. for supporting this research through its university research fund.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, X., Ye, H., Ye, Y., Li, X., Ji, S. (2021). Generative Max-Mahalanobis Classifiers for Image Classification, Generation and More. In: Oliver, N., Pérez-Cruz, F., Kramer, S., Read, J., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Research Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol. 12976. Springer, Cham. https://doi.org/10.1007/978-3-030-86520-7_5
DOI: https://doi.org/10.1007/978-3-030-86520-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86519-1
Online ISBN: 978-3-030-86520-7
eBook Packages: Computer Science (R0)