Abstract
This chapter gives a general introduction to three families of probabilistic models (discriminative, descriptive, and generative models) and the connections among them. Most of the models studied in the previous chapters, as well as most models in the current machine learning and deep learning literature, belong to one of these three families.
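As a quick orientation, here is a minimal sketch of the conventional forms these families take (the notation is ours and illustrative, not necessarily the chapter's):

  p_\theta(y \mid x)   (discriminative: the conditional distribution of a label y given a signal x),

  p_\theta(x) = \frac{1}{Z(\theta)} \exp\{ f_\theta(x) \}   (descriptive, or energy-based: a density over the signal itself, with partition function Z(\theta)),

  z \sim \mathcal{N}(0, I), \quad x = g_\theta(z) + \epsilon   (generative: the signal as a transformation of a simple latent vector z).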