Probabilistic Inference Using Function Factorization and Divergence Minimization



This chapter addresses modeling issues in statistical inference problems. We focus on factorization models, which generalize Markov random fields and Bayesian networks. For any positive function (for example, an estimated probability distribution), we present a mechanical procedure that approximates the function by one in a factorization model that is as simple as possible, subject to an upper bound on the approximation error. We also recast probabilistic inference as a divergence minimization (DM) problem and propose iterative algorithms to solve it. We prove that the well-known EM algorithm is a special case of the proposed iterative algorithm.
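The link between maximum likelihood and divergence minimization mentioned above can be illustrated with a small, generic sketch. It is not the chapter's algorithm. It relies on the standard fact that maximizing the likelihood is equivalent to minimizing the Kullback-Leibler divergence D(p_hat || q) from the empirical distribution p_hat to the model q, and on the standard EM update for the mixing weights of a mixture whose components are known. All names (`kl_divergence`, `mixture`, `em_step`) and all numbers are hypothetical and chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) log(p(x) / q(x)), with the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mixture(weights, components):
    """Mixture distribution q(x) = sum_k w_k f_k(x) over a finite alphabet."""
    size = len(components[0])
    return [sum(w * f[x] for w, f in zip(weights, components)) for x in range(size)]


def em_step(weights, components, p_hat):
    """One EM update of the mixing weights; the components stay fixed."""
    q = mixture(weights, components)
    new_weights = []
    for w, f in zip(weights, components):
        # E-step: responsibility of this component for symbol x is w * f(x) / q(x).
        # M-step: the new weight is that responsibility averaged under p_hat.
        new_weights.append(sum(p_hat[x] * w * f[x] / q[x] for x in range(len(p_hat))))
    return new_weights


# Hypothetical empirical distribution and two fixed, strictly positive components.
p_hat = [0.5, 0.3, 0.2]
components = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
weights = [0.5, 0.5]

divergences = []
for _ in range(50):
    divergences.append(kl_divergence(p_hat, mixture(weights, components)))
    weights = em_step(weights, components, p_hat)
```

In this toy setting the recorded values in `divergences` never increase from one iteration to the next, which is the usual monotonicity property of EM restated in terms of divergence rather than likelihood.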


Keywords: Divergence distance; Factorization; Hammersley–Clifford theorem; Markov random field; Maximum likelihood estimation



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Institute for Telecommunications Research, University of South Australia, Adelaide, Australia
  2. Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong, China
