Abstract
This chapter gives a general introduction to three families of probabilistic models (discriminative, descriptive, and generative models) and the connections among them. Most of the models studied in the previous chapters, as well as most models in the current machine learning and deep learning literature, belong to one of these three families.
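As a quick orientation, here is a minimal sketch of the conventional forms these families take (the notation is ours and illustrative, not necessarily the chapter's):

  p_\theta(y \mid x)   (discriminative: the conditional distribution of a label y given a signal x),

  p_\theta(x) = \frac{1}{Z(\theta)} \exp\{ f_\theta(x) \}   (descriptive, or energy-based: a density over the signal itself, with partition function Z(\theta)),

  z \sim \mathcal{N}(0, I), \quad x = g_\theta(z) + \epsilon   (generative: the signal as a transformation of a simple latent vector z).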