Abstract
This paper presents a topic model that captures the temporal dynamics of text data together with topical phrases. Previous approaches to modeling this property of a corpus have relied on the bag-of-words assumption, which has resulted in inferior performance and less interpretable topics. Our topic model not only captures how topic structure changes over time but also preserves important contextual information in the text data. Finding topical n-grams where the context supports them, instead of always presenting unigrams in topics, removes many of the ambiguities that individual words may carry. We derive a collapsed Gibbs sampler for posterior inference. Our experimental results show an improvement over the state-of-the-art Topics over Time model.
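The abstract does not give the sampler's update equations. As a point of reference only, the sketch below shows a collapsed Gibbs sampler for the Topics over Time (TOT) baseline of Wang and McCallum, in which each token's topic is resampled in proportion to its document-topic count, its topic-word probability, and a per-topic Beta density evaluated at the document's timestamp. It is not the authors' n-gram model: it omits the bigram status variables that topical n-gram models add. The function names (`tot_gibbs`, `fit_beta_moments`, `beta_logpdf`) are hypothetical, timestamps are assumed to be pre-normalized into the open interval (0, 1), and the Beta parameters are assumed to be refit by the method of moments after each sweep.

```python
import math
import random


def beta_logpdf(t, a, b):
    """Log density of a Beta(a, b) distribution at t, for t strictly in (0, 1)."""
    return ((a - 1) * math.log(t) + (b - 1) * math.log(1 - t)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def fit_beta_moments(ts):
    """Method-of-moments Beta(a, b) fit to a list of timestamps in (0, 1)."""
    if not ts:
        return 1.0, 1.0                      # empty topic: fall back to uniform
    mean = sum(ts) / len(ts)
    var = sum((t - mean) ** 2 for t in ts) / len(ts)
    var = max(var, 1e-6)                     # guard against zero variance
    common = mean * (1 - mean) / var - 1
    if common <= 0:
        return 1.0, 1.0
    return mean * common, (1 - mean) * common


def tot_gibbs(docs, times, K, V, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampler for a TOT-style model (hypothetical helper).

    docs:  list of documents, each a list of word ids in range(V)
    times: one timestamp per document, assumed pre-normalized into (0, 1)
    """
    rng = random.Random(seed)
    ndk = [[0] * K for _ in docs]            # document-topic counts
    nkw = [[0] * V for _ in range(K)]        # topic-word counts
    nk = [0] * K                             # tokens per topic
    z = []                                   # topic assignment per token
    for d, doc in enumerate(docs):           # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    psi = [(1.0, 1.0)] * K                   # per-topic Beta parameters

    for _ in range(iters):
        for d, doc in enumerate(docs):
            t = times[d]
            for i, w in enumerate(doc):
                k = z[d][i]                  # remove the current token's assignment
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # log p(z = j | rest) up to a constant; the (n_d + K*alpha)
                # denominator is the same for every j, so it is dropped
                logp = [math.log(ndk[d][j] + alpha)
                        + math.log(nkw[j][w] + beta) - math.log(nk[j] + V * beta)
                        + beta_logpdf(t, *psi[j])
                        for j in range(K)]
                top = max(logp)
                weights = [math.exp(lp - top) for lp in logp]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k                  # add the token back under its new topic
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
        # refit each topic's Beta over the timestamps of the tokens it owns
        for j in range(K):
            ts = [times[d] for d, zd in enumerate(z) for k in zd if k == j]
            psi[j] = fit_beta_moments(ts)
    return z, nkw, psi
```

On a toy corpus, `tot_gibbs(docs, times, K=2, V=6)` returns the token assignments, the topic-word counts, and one fitted (a, b) pair per topic; a topic whose tokens come mostly from early documents would typically end up with a Beta density concentrated near 0. The conditional is computed in log space because the Beta densities can become very sharp, and subtracting the maximum before exponentiating avoids underflow.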
The work described in this paper is substantially supported by grants from the Research Grant Council of the Hong Kong Special Administrative Region, China (Project Code: CUHK413510) and the Direct Grant of the Faculty of Engineering, CUHK (Project Codes: 2050476 and 2050522). This work is also affiliated with the CUHK MoE-Microsoft Key Laboratory of Human-centric Computing and Interface Technologies.
References
Blei, D.M., Lafferty, J.D.: Dynamic topic models. In: Proc. of ICML, pp. 113–120 (2006)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Bolelli, L., Ertekin, S., Giles, C.L.: Topic and Trend Detection in Text Collections Using Latent Dirichlet Allocation. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds.) ECIR 2009. LNCS, vol. 5478, pp. 776–780. Springer, Heidelberg (2009)
Griffiths, T.L., Steyvers, M.: Finding scientific topics. PNAS 101, 5228–5235 (2004)
Griffiths, T., Steyvers, M., Tenenbaum, J.: Topics in semantic representation. Psychological Review 114(2), 211–244 (2007)
Hong, L., Yin, D., Guo, J., Davison, B.D.: Tracking trends: incorporating term volume into temporal topic models. In: Proc. of KDD, pp. 484–492 (2011)
Jo, Y., Hopcroft, J.E., Lagoze, C.: The web of topics: discovering the topology of topic evolution in a corpus. In: Proc. of WWW, pp. 257–266 (2011)
Kawamae, N.: Trend analysis model: trend consists of temporal words, topics, and timestamps. In: Proc. of WSDM, pp. 317–326 (2011)
Kleinberg, J.: Bursty and hierarchical structure in streams. In: Proc. of KDD, pp. 91–101 (2002)
Knights, D., Mozer, M., Nicolov, N.: Detecting topic drift with compound topic models (2009)
Lindsey, R., Headden, W., Stipicevic, M.: A phrase-discovering topic model using hierarchical Pitman-Yor processes. In: Proc. of EMNLP-CoNLL, pp. 214–222 (2012)
Masada, T., Fukagawa, D., Takasu, A., Shibata, Y., Oguri, K.: Modeling Topical Trends over Continuous Time with Priors. In: Zhang, L., Lu, B.-L., Kwok, J. (eds.) ISNN 2010, Part II. LNCS, vol. 6064, pp. 302–311. Springer, Heidelberg (2010)
Nodelman, U., Shelton, C.R., Koller, D.: Continuous time Bayesian networks. In: Proc. of UAI, pp. 378–387 (2002)
Pruteanu-Malinici, I., Ren, L., Paisley, J., Wang, E., Carin, L.: Hierarchical Bayesian modeling of topics in time-stamped documents. TPAMI 32(6), 996–1011 (2010)
Rosen-Zvi, M., Griffiths, T., Steyvers, M., Smyth, P.: The author-topic model for authors and documents. In: Proc. of UAI, pp. 487–494 (2004)
Swan, R., Allan, J.: Extracting significant time varying features from text. In: Proc. of CIKM, pp. 38–45 (1999)
Wallach, H.M.: Topic modeling: Beyond bag-of-words. In: Proc. of ICML, pp. 977–984 (2006)
Wang, C., Blei, D.M., Heckerman, D.: Continuous time dynamic topic models. In: Proc. of UAI, pp. 579–586 (2008)
Wang, X., McCallum, A., Wei, X.: Topical n-grams: Phrase and topic discovery, with an application to information retrieval. In: Proc. of ICDM, pp. 697–702 (2007)
Wang, X., McCallum, A.: Topics over time: A non-Markov continuous-time model of topical trends. In: Proc. of KDD, pp. 424–433 (2006)
Wang, X., Mohanty, N., McCallum, A.: Group and topic discovery from relations and text. In: Proc. of LinkKDD, pp. 28–35 (2005)
Yin, Z., Cao, L., Han, J., Zhai, C., Huang, T.: LPTA: A probabilistic model for latent periodic topic analysis. In: Proc. of ICDM, pp. 904–913 (2011)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jameel, S., Lam, W. (2013). An N-Gram Topic Model for Time-Stamped Documents. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_25
DOI: https://doi.org/10.1007/978-3-642-36973-5_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36972-8
Online ISBN: 978-3-642-36973-5
eBook Packages: Computer Science, Computer Science (R0)