Constrained LDA for Grouping Product Features in Opinion Mining

  • Zhongwu Zhai
  • Bing Liu
  • Hua Xu
  • Peifa Jia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6634)


In opinion mining of product reviews, one often wants to produce a summary of opinions organized by product feature. However, people can describe the same feature with different words and phrases. To produce an effective summary, these words and phrases, which are domain synonyms, need to be grouped under the same feature. Topic modeling is a suitable method for this task. However, instead of simply letting topic modeling find groupings freely, we believe it is possible to do better by giving it some pre-existing knowledge in the form of automatically extracted constraints. In this paper, we first extend a popular topic modeling method, Latent Dirichlet Allocation (LDA), with the ability to process large-scale constraints. Then, two novel methods are proposed to extract two types of constraints automatically. Finally, the resulting constrained-LDA and the extracted constraints are applied to group product features. Experiments show that constrained-LDA outperforms the original LDA and the latest mLSA by a large margin.


Keywords: Opinion Mining, Feature Grouping, Constrained LDA





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Zhongwu Zhai (1)
  • Bing Liu (2)
  • Hua Xu (1)
  • Peifa Jia (1)
  1. State Key Lab of Intelligent Tech. & Sys., Dept. of Comp. Sci. & Tech., Tsinghua Univ., China
  2. Dept. of Comp. Sci., University of Illinois at Chicago, USA
