Tree-Structured Hierarchical Dirichlet Process
In many domains, document sets are hierarchically organized, such as message forums with multiple levels of sections. Analyzing latent topics in such content is crucial for tasks like trend and user interest analysis. Nonparametric topic models are a powerful approach, but traditional Hierarchical Dirichlet Processes (HDPs) cannot fully account for topic sharing across deep hierarchical structures. We propose the Tree-structured Hierarchical Dirichlet Process, which allows Dirichlet-process-based topic modeling over a given tree structure of arbitrary size and height, where documents can arise at any tree node. Experiments on a hierarchical social message forum and a product reviews forum demonstrate better generalization than traditional HDPs, both in modeling new data and in classifying documents into sections.
Keywords: Hierarchical Dirichlet Processes, Topic modeling, Message forum
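To make the idea of topic sharing down a tree concrete, the following is a minimal sketch and not the authors' model or inference procedure, which the abstract does not spell out. It assumes a common finite approximation of nested Dirichlet processes: the root draws topic weights by truncated stick-breaking, and each child node draws its weights from a Dirichlet distribution centered on its parent's weights. The function names, the truncation level `K`, the concentration parameters `gamma` and `alpha`, and the toy forum tree are all hypothetical.

```python
import random


def stick_breaking(gamma, K, rng):
    """Truncated stick-breaking weights over K topics (last stick takes the remainder)."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        b = rng.betavariate(1.0, gamma)
        weights.append(remaining * b)
        remaining *= 1.0 - b
    weights.append(remaining)
    return weights


def dirichlet(params, rng):
    """Sample from Dirichlet(params) by normalizing independent Gamma draws."""
    # Floor the parameters so gammavariate never receives a non-positive shape.
    draws = [rng.gammavariate(max(p, 1e-9), 1.0) for p in params]
    total = sum(draws)
    if total == 0.0:  # extremely unlikely; fall back to the normalized parameters
        total = sum(params)
        return [p / total for p in params]
    return [d / total for d in draws]


def sample_tree_topic_weights(children, root, gamma, alpha, K, seed=0):
    """Draw topic weights for every node of a tree (assumed construction).

    children: dict mapping a node name to the list of its child node names.
    Each child's weights are Dirichlet(alpha * parent_weights), so the same
    K topics are shared and reweighted along every root-to-leaf path.
    """
    rng = random.Random(seed)
    weights = {root: stick_breaking(gamma, K, rng)}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            weights[child] = dirichlet([alpha * w for w in weights[node]], rng)
            stack.append(child)
    return weights


# Hypothetical forum: documents could be attached to any of these nodes.
forum = {"forum": ["sports", "tech"], "sports": ["football"], "tech": []}
weights = sample_tree_topic_weights(forum, "forum", gamma=1.0, alpha=5.0, K=8)
```

In this sketch a larger `alpha` keeps a section's topic weights close to those of its parent, while a smaller `alpha` lets a section concentrate on a few of the shared topics.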