An Introduction to Nonparametric Hierarchical Bayesian Modelling with a Focus on Multi-agent Learning

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3355)


In this chapter, we address the situation where agents need to learn from one another by exchanging learned knowledge. We employ hierarchical Bayesian modelling, which provides a powerful and principled solution. We point out some shortcomings of parametric hierarchical Bayesian modelling and therefore focus on a nonparametric approach. Nonparametric hierarchical Bayesian modelling has its roots in Bayesian statistics and, in the form of Dirichlet process mixture modelling, was recently introduced into the machine learning community. We hope to provide an accessible introduction to this particular branch of statistics. We present the standard sampling-based learning algorithms and introduce a particular EM learning approach that leads to efficient and plausible solutions. We illustrate the effectiveness of our approach in the context of a recommendation engine, where it allows a principled combination of content-based and collaborative filtering.
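The Dirichlet process mixture model mentioned above induces a clustering in which the number of components is not fixed in advance but grows with the data. As a rough, self-contained illustration (not the chapter's own algorithm), the sketch below draws cluster assignments from the equivalent Chinese restaurant process; the function name and the choice of concentration parameter `alpha` are our own illustrative assumptions:

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Sample cluster labels for n items from a Chinese restaurant process
    with concentration alpha: item i joins an existing cluster with
    probability proportional to its size, or opens a new cluster with
    probability proportional to alpha."""
    rng = random.Random(seed)
    counts = []   # counts[k] = number of items currently in cluster k
    labels = []
    for i in range(n):
        # Total unnormalized mass: i items at existing tables + alpha for a new one.
        r = rng.uniform(0.0, i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:          # join existing cluster k
                counts[k] += 1
                labels.append(k)
                break
        else:                    # open a new cluster
            counts.append(1)
            labels.append(len(counts) - 1)
    return labels

labels = crp_assignments(100, alpha=2.0)
print(len(set(labels)))  # number of clusters is random, roughly O(alpha * log n)
```

The key nonparametric property is visible directly: the number of distinct labels is not a model parameter but an outcome of the sampling, controlled only softly through `alpha`.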


Keywords: Prior Distribution, Expectation Maximization, Gibbs Sampling, Dirichlet Process, Dirichlet Distribution





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

Siemens AG, München, Germany
