
A Tale of Three Families: Discriminative, Descriptive, and Generative Models

Chapter in: Computer Vision

Abstract

This chapter gives a general introduction to three families of probabilistic models (discriminative, descriptive, and generative) and the connections among them. Most of the models studied in the previous chapters, as well as most models in the current machine learning and deep learning literature, belong to these three families.
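For orientation, the three families named in the title are commonly summarized by the canonical forms below. This is a sketch in generic notation; the symbols f_\theta, g_\theta, and Z(\theta) are placeholders for illustration, not quoted from the chapter:

\begin{align*}
\text{Discriminative:}\quad & p_\theta(y \mid x) = \frac{\exp f_\theta(x, y)}{\sum_{y'} \exp f_\theta(x, y')} && \text{(conditional model of a label } y \text{ given a signal } x\text{)}\\
\text{Descriptive:}\quad & p_\theta(x) = \frac{1}{Z(\theta)}\exp f_\theta(x), \quad Z(\theta) = \int \exp f_\theta(x)\, dx && \text{(energy-based model of the signal itself)}\\
\text{Generative:}\quad & z \sim \mathrm{N}(0, I_d), \quad x = g_\theta(z) + \epsilon && \text{(latent-variable model mapping } z \text{ to } x\text{)}
\end{align*}

Discriminative models condition on the input, while descriptive and generative models target the distribution of the signal itself, either through an unnormalized density or through a latent-variable generator; the chapter is concerned with how these viewpoints relate.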

Copyright information

© 2023 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Zhu, SC., Wu, Y.N. (2023). A Tale of Three Families: Discriminative, Descriptive, and Generative Models. In: Computer Vision. Springer, Cham. https://doi.org/10.1007/978-3-030-96530-3_12

  • DOI: https://doi.org/10.1007/978-3-030-96530-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-96529-7

  • Online ISBN: 978-3-030-96530-3
