Visually Summarizing Semantic Evolution in Document Streams with Topic Table

  • Conference paper
Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2010)

Abstract

We propose a visualization technique for summarizing the contents of document streams, such as news or scientific archives. The content of streaming documents changes over time, and so do the themes the documents are about. Topic evolution is a relatively new research subject that encompasses the unsupervised discovery of thematic subjects in a document collection and the adaptation of these subjects as new documents arrive. While many powerful topic evolution methods exist, the combination of learning and visualizing the evolving topics has been less explored, although it is indispensable for understanding a dynamic document collection.

We propose Topic Table, a visualization technique that builds upon topic modeling to derive a condensed representation of a document collection. Topic Table captures important and intuitively comprehensible aspects of a topic over time: the importance of the topic within the collection, the words characterizing the topic, and the semantic changes of the topic from one timepoint to the next. As an example, we visualize the content of the NIPS proceedings from 1987 to 1999.
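
The three aspects named above (importance, characteristic words, change between timepoints) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how any of these quantities are computed. The sketch assumes a topic is given as one word distribution per timepoint, takes importance as a precomputed share, and uses cosine distance as a stand-in measure of semantic change. The function names `top_words`, `cosine_distance`, and `topic_table_rows` are hypothetical, and the numbers are invented toy values, not results from the NIPS data.

```python
from math import sqrt


def top_words(topic, k=3):
    """Return the k highest-probability words of a topic (dict: word -> prob).

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(topic.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def cosine_distance(p, q):
    """1 - cosine similarity between two sparse word distributions.

    Assumption: cosine distance is only one possible change measure.
    """
    dot = sum(p.get(w, 0.0) * q.get(w, 0.0) for w in set(p) | set(q))
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def topic_table_rows(topic_over_time, importance_over_time, k=3):
    """Build one summary row per timepoint for a single topic."""
    rows = []
    previous = None
    for t in sorted(topic_over_time):
        current = topic_over_time[t]
        # No change value exists for the first timepoint.
        change = None if previous is None else cosine_distance(previous, current)
        rows.append({
            "time": t,
            "importance": importance_over_time[t],
            "words": top_words(current, k),
            "change": change,
        })
        previous = current
    return rows


# Invented toy values for one topic at two timepoints.
topic = {
    1987: {"network": 0.5, "neuron": 0.3, "learning": 0.2},
    1988: {"network": 0.4, "learning": 0.4, "kernel": 0.2},
}
importance = {1987: 0.25, 1988: 0.30}

rows = topic_table_rows(topic, importance)
for row in rows:
    print(row)
```

Each row holds what one cell of such a table would need to display for a topic at a timepoint; how those values are laid out and drawn is left open here.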

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gohr, A., Spiliopoulou, M., Hinneburg, A. (2013). Visually Summarizing Semantic Evolution in Document Streams with Topic Table. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2010. Communications in Computer and Information Science, vol 272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29764-9_9

  • DOI: https://doi.org/10.1007/978-3-642-29764-9_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29763-2

  • Online ISBN: 978-3-642-29764-9

  • eBook Packages: Computer Science, Computer Science (R0)
