Probabilistic Models for Text Mining

  • Yizhou Sun
  • Hongbo Deng
  • Jiawei Han


A number of probabilistic methods, such as latent Dirichlet allocation (LDA), hidden Markov models, and Markov random fields, have arisen in recent years for the probabilistic analysis of text data. This chapter provides an overview of a variety of probabilistic models for text mining. It focuses on the fundamental probabilistic techniques and also covers their applications to different text mining problems, including topic modeling, language modeling, document classification, document clustering, and information extraction.


Keywords

Probabilistic models, mixture model, stochastic process, graphical model





Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, USA
