Bag of Timestamps: A Simple and Efficient Bayesian Chronological Mining

  • Tomonari Masada
  • Atsuhiro Takasu
  • Tsuyoshi Hamada
  • Yuichiro Shibata
  • Kiyoshi Oguri
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5446)

Abstract

In this paper, we propose a new probabilistic model, Bag of Timestamps (BoT), for chronological text mining. BoT is an extension of latent Dirichlet allocation (LDA) and has two notable features compared with the previously proposed Topics over Time (ToT) model, which is also an extension of LDA. First, BoT avoids overfitting to temporal data, because temporal data are modeled in a Bayesian manner, similarly to word frequencies. Second, the conditional probability used in BoT contains no functions that require time-consuming computation. Experiments on newswire documents show that BoT fits temporal data more moderately than ToT and runs in less time.
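The second point can be illustrated with a small sketch. ToT places a per-topic Beta density over normalized timestamps, so each evaluation involves log-gamma, logarithm, and exponential calls. The sketch below assumes, purely for illustration, that a Bayesian treatment of discrete timestamps yields a smoothed count ratio of the same form as the word term in collapsed LDA. The function names, the `gamma` hyperparameter, and the count-ratio form itself are assumptions and are not taken from the paper.

```python
import math


def beta_pdf(t, a, b):
    """Beta density at t in (0, 1), the kind of term ToT evaluates per token.

    Each call needs three log-gamma evaluations, two logs, and one exp.
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def smoothed_timestamp_prob(count_kt, count_k, num_stamps, gamma=0.1):
    """Hypothetical Dirichlet-smoothed probability of one discrete timestamp.

    count_kt:   timestamp tokens with this value assigned to topic k
    count_k:    all timestamp tokens assigned to topic k
    num_stamps: number of distinct timestamp values
    gamma:      symmetric Dirichlet hyperparameter (assumed name and value)

    Only additions and one division are needed, and the smoothing keeps the
    probability of an unseen timestamp above zero.
    """
    return (count_kt + gamma) / (count_k + num_stamps * gamma)


# Toy counts for one topic over three timestamp values (illustrative only).
counts = [3, 0, 1]
probs = [smoothed_timestamp_prob(c, sum(counts), len(counts)) for c in counts]
```

With a count-ratio term of this kind, a timestamp value that never co-occurs with a topic still receives nonzero probability, which is one way to read the abstract's claim of more moderate fitting.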

Keywords

  • Temporal Data
  • Beta Distribution
  • Semantical Content
  • Latent Dirichlet Allocation
  • Word Token
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  4. Blei, D., Lafferty, J.: Dynamic Topic Models. In: Proc. of ICML 2006, pp. 113–120 (2006)
  5. Blei, D., Ng, A., Jordan, M.: Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 993–1022 (2003)
  6. Erosheva, E., Fienberg, S., Lafferty, J.: Mixed-membership models of scientific publications. Proc. Natl. Acad. Sci. 101(suppl. 1), 5220–5227 (2004)
  7. Griffiths, T., Steyvers, M.: Finding Scientific Topics. Proc. Natl. Acad. Sci. 101(suppl. 1), 5228–5235 (2004)
  8. Nallapati, R., Cohen, W., Ditmore, S., Lafferty, J., Ung, K.: Multiscale Topic Tomography. In: Proc. of KDD 2007, pp. 520–529 (2007)
  9. Wang, X., McCallum, A.: Topics over Time: A Non-Markov Continuous-Time Model of Topical Trends. In: Proc. of KDD 2006, pp. 424–433 (2006)
  10. Wang, X., Mohanty, N., McCallum, A.: Group and Topic Discovery from Relations and Text. In: Proc. of LinkKDD 2005, pp. 28–35 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Tomonari Masada (1)
  • Atsuhiro Takasu (2)
  • Tsuyoshi Hamada (1)
  • Yuichiro Shibata (1)
  • Kiyoshi Oguri (1)

  1. Nagasaki University, Nagasaki, Japan
  2. National Institute of Informatics, Tokyo, Japan
