Hierarchical Latent Tree Analysis for Topic Detection
In the LDA approach to topic detection, a topic is determined by identifying the words that are used with high frequency when writing about the topic. However, words that are used with high frequency in one topic may also be used with high frequency in other topics, so they may not be the best words to characterize the topic. In this paper, we propose a new method for topic detection, where a topic is determined by identifying words that appear with high frequency in the topic and low frequency in other topics. We model patterns of word co-occurrence, and co-occurrences of those patterns, using a hierarchy of discrete latent variables. The states of the latent variables represent clusters of documents and are interpreted as topics. The words that best distinguish a cluster from other clusters are selected to characterize the topic. Empirical results show that the new method yields topics with clearer thematic characterizations than alternative approaches.
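The selection idea above, preferring words that are frequent inside a cluster and rare outside it, can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `distinguishing_words`, the toy documents, and the scoring rule (in-cluster document frequency minus out-of-cluster document frequency) are all assumptions made for illustration, and the cluster membership is supplied by hand rather than inferred from latent variables.

```python
from typing import List, Set


def distinguishing_words(docs: List[Set[str]],
                         in_cluster: List[bool],
                         top_k: int = 3) -> List[str]:
    """Rank words by how much more often they occur inside a cluster.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    score(w) = fraction of in-cluster documents containing w
             - fraction of remaining documents containing w.
    """
    inside = [d for d, flag in zip(docs, in_cluster) if flag]
    outside = [d for d, flag in zip(docs, in_cluster) if not flag]
    if not inside or not outside:
        raise ValueError("need at least one document on each side")

    vocab = sorted(set().union(*docs))

    def doc_freq(word: str, group: List[Set[str]]) -> float:
        # Fraction of documents in the group that contain the word.
        return sum(word in d for d in group) / len(group)

    scores = {w: doc_freq(w, inside) - doc_freq(w, outside) for w in vocab}
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(vocab, key=lambda w: (-scores[w], w))[:top_k]


# Toy corpus: each document is the set of words it contains.
docs = [
    {"the", "game", "team", "score"},
    {"the", "team", "coach", "game"},
    {"the", "market", "stock", "price"},
    {"the", "stock", "trade", "market"},
]
in_cluster = [True, True, False, False]  # first two documents form the cluster

top_words = distinguishing_words(docs, in_cluster, top_k=2)
```

In this toy corpus the word "the" appears in every document, so its score is zero and it is not selected, even though it is the most frequent word inside the cluster. A ranking by raw in-cluster frequency alone would have placed it first.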
Keywords: Latent Variable; Latent Dirichlet Allocation; Font Size; Probabilistic Graphical Model; High Frequency Word