Variational Bayes via propositionalized probability computation in PRISM

Article

Abstract

We propose a logic-based approach to variational Bayes (VB) via propositionalized probability computation in the symbolic-statistical modeling language PRISM. PRISM computes the probabilities of logical formulas by reducing them to AND/OR Boolean formulas, called explanation graphs, over probabilistic \({\tt msw/2}\) atoms. We place Dirichlet priors on the parameters of the \({\tt msw/2}\) atoms and derive a variational Bayesian EM (VB-EM) algorithm that learns their hyperparameters from data. The algorithm runs on explanation graphs deduced from a program and a goal, computing probabilities by dynamic programming in time linear in the size of the graphs. To demonstrate the generality and effectiveness of Bayesian modeling with the proposed approach, we conducted two learning experiments, one with a probabilistic right-corner grammar and the other with a profile HMM. To our knowledge, VB has not previously been applied to either of these models.
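To make the setting concrete, the following is a minimal PRISM sketch (our illustration, not a program from the paper) of a two-state hidden Markov model. Each \({\tt msw/2}\) call is a probabilistic switch whose outcome space is declared by \({\tt values/2}\); the explanation graph for a goal such as \({\tt hmm([a,b,a])}\) is the AND/OR structure of \({\tt msw/2}\) atoms that PRISM's tabled search derives for it.

    % Minimal two-state HMM in PRISM (illustrative sketch; all names are ours).
    % values/2 declares the outcome space of each probabilistic switch.
    values(init,   [s0, s1]).    % initial state
    values(tr(_),  [s0, s1]).    % state transition, one switch per state
    values(out(_), [a, b]).      % symbol emission, one switch per state

    % hmm(L): the HMM generates the length-3 string L.
    hmm(L) :- msw(init, S), hmm(1, S, L).

    hmm(T, _, []) :- T > 3.
    hmm(T, S, [O|Rest]) :-
        T =< 3,
        msw(out(S), O),          % emit a symbol from state S
        msw(tr(S), Next),        % choose the next state
        T1 is T + 1,
        hmm(T1, Next, Rest).

Built-ins such as \({\tt probf/1}\) display the explanation graph for a goal, and learning runs on that graph, so the cost of probability computation scales with the graph's size rather than with the (possibly exponential) number of explanations.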
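As for the updates themselves, VB-EM for discrete models with Dirichlet priors follows a standard pattern; the sketch below uses our own notation and only indicates the form of the computation, not the paper's derivation. For a switch \(i\) with outcomes \(v_1,\dots,v_{K_i}\), prior hyperparameters \(\alpha_{i,k}\), and variational posterior hyperparameters \(\alpha^{*}_{i,k}\), with \(\psi\) the digamma function:

\[
\tilde{\theta}_{i,k} \;=\; \exp\Bigl(\psi(\alpha^{*}_{i,k}) - \psi\bigl(\textstyle\sum_{k'}\alpha^{*}_{i,k'}\bigr)\Bigr)
\qquad \text{(VB-E step: pseudo-parameters)}
\]
\[
\alpha^{*}_{i,k} \;\leftarrow\; \alpha_{i,k} + \mathbb{E}_{q}\bigl[c_{i,k}\bigr]
\qquad \text{(VB-M step: add expected counts of } {\tt msw}(i,v_k)\text{)}
\]

The expected counts \(\mathbb{E}_{q}[c_{i,k}]\) are obtained from the explanation graph by inside-outside style dynamic programming, which is what keeps each iteration linear in the graph size.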

Keywords

Variational Bayes · Probabilistic models · PRISM

Mathematics Subject Classifications (2000)

68T37 · 62F15



Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • Taisuke Sato (1)
  • Yoshitaka Kameya (1)
  • Kenichi Kurihara (2)

  1. Tokyo Institute of Technology, Tokyo, Japan
  2. Google Japan Inc., Tokyo, Japan
