Modeling Community Structure and Topics in Dynamic Text Networks
- 72 Downloads
The last decade has seen great progress in both dynamic network modeling and topic modeling. This paper draws upon both areas to create a bespoke Bayesian model applied to a dataset consisting of the top 467 US political blogs in 2012, their posts over the year, and their links to one another. Our model allows dynamic topic discovery to inform the latent network model and the network structure to facilitate topic identification. Our results find complex community structure within this set of blogs, where community membership depends strongly upon the set of topics in which the blogger is interested. We examine the time varying nature of the Sensational Crime topic, as well as the network properties of the Election News topic, as notable and easily interpretable empirical examples.
KeywordsNetworks Natural language processing Topic modeling Political blogs Community detection
- Arun, R., Suresh, V., Madhavan, C.V., Murthy, M.N. (2010). On finding the natural number of topics with latent dirichlet allocation: Some observations. In Advances in knowledge discovery and data mining (pp. 391–402). Springer.Google Scholar
- Blei, D.M., & Lafferty, J.D. (2006). Dynamic topic models. In Proceedings of the 23rd international conference on machine learning (pp. 113–120). ACM.Google Scholar
- Brown, P.F., Desouza, P.V., Mercer, R.L., Pietra, V.J.D., Lai, J.C. (1992). Class-based n-gram models of natural language. Computational linguistics, 18 (4), 467–479.Google Scholar
- Chang, J., & Blei, D.M. (2009). Relational topic models for document networks. In International conference on artificial intelligence and statistics (pp. 81–88).Google Scholar
- Ho, Q., Eisenstein, J., Xing, E.P. (2012). Document hierarchies from text and links. In Proceedings of the 21st international conference on World Wide Web (pp. 739–748). ACM.Google Scholar
- Hoffman, M., Bach, F.R., Blei, D.M. (2010). Online learning for latent dirichlet allocation. In Advances in neural information processing systems (pp. 856–864).Google Scholar
- McNamee, P., & Mayfield, J. (2003). Jhu/apl experiments in tokenization and non-word translation. In Comparative evaluation of multilingual information access systems (pp. 85–97). Springer.Google Scholar
- Ramos, J. (2003). Using tf-idf to determine word relevance in document queries. In Proceedings of the first instructional conference on machine learning.Google Scholar
- Technorati. (2002). https://web.archive.org/web/20140420052710/http://technorati.com/.
- Wang, E., Silva, J., Willett, R., Carin, L. (2011). Dynamic relational topic model for social network analysis with noisy links. In 2011 IEEE, statistical signal processing workshop (SSP) (pp. 497–500). IEEE.Google Scholar
- Yin, J., & Wang, J. (2014). A dirichlet multinomial mixture model-based approach for short text clustering. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 233–242). ACM.Google Scholar