Hierarchical Latent Tree Analysis for Topic Detection

  • Tengfei Liu
  • Nevin L. Zhang
  • Peixian Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8725)

Abstract

In the LDA approach to topic detection, a topic is determined by identifying the words that are used with high frequency when writing about the topic. However, words with high frequency in one topic may also be used with high frequency in other topics, so they are not necessarily the best words to characterize the topic. In this paper, we propose a new method for topic detection, where a topic is determined by identifying words that appear with high frequency in the topic and low frequency in other topics. We model patterns of word co-occurrence, and co-occurrences of those patterns, using a hierarchy of discrete latent variables. The states of the latent variables represent clusters of documents, and these clusters are interpreted as topics. The words that best distinguish a cluster from the other clusters are selected to characterize the topic. Empirical results show that the new method yields topics with clearer thematic characterizations than the alternative approaches.
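The core selection idea in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the word-occurrence probabilities are made up, and scoring a word by its in-cluster frequency minus its out-of-cluster frequency is just one simple instantiation of "high frequency in the topic, low frequency in other topics" (the paper's actual method derives clusters from a latent tree model and selects words by how well they discriminate the cluster).

```python
# Sketch: pick words that characterize a topic (a cluster of documents) by
# appearing frequently inside the cluster and infrequently outside it.
# freq_in / freq_out are hypothetical per-word occurrence probabilities.

def characterizing_words(freq_in, freq_out, k=3):
    """Rank words by in-cluster frequency minus out-of-cluster frequency
    and return the k best-distinguishing words."""
    scores = {w: freq_in[w] - freq_out.get(w, 0.0) for w in freq_in}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Illustrative numbers: "the" is frequent everywhere, so it scores poorly
# even though its raw in-cluster frequency is the highest.
freq_in = {"learning": 0.90, "network": 0.80, "the": 0.95, "bayesian": 0.70}
freq_out = {"learning": 0.20, "network": 0.10, "the": 0.94, "bayesian": 0.05}

print(characterizing_words(freq_in, freq_out))
```

Note how a pure high-frequency criterion would rank "the" first, whereas the contrastive score demotes it, which is exactly the distinction the abstract draws against LDA-style word selection.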

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Tengfei Liu (1)
  • Nevin L. Zhang (1)
  • Peixian Chen (1)
  1. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong