Temporal Dynamics of On-Line Information Streams

  • Jon Kleinberg
Chapter
Part of the Data-Centric Systems and Applications book series (DCSA)

Abstract

A number of recent computing applications involve information arriving continuously over time in the form of a data stream, and this has led to new ways of thinking about traditional problems in a variety of areas. In some cases, the rate and overall volume of data in the stream may be so great that it cannot all be stored for processing, and this leads to new requirements for efficiency and scalability. In other cases, the quantities of information may still be manageable, but the data stream perspective takes what has generally been a static view of a problem and adds a strong temporal dimension to it. Our focus here is on some of the challenges that this latter issue raises in the settings of text mining, on-line information, and information retrieval.

Keywords

News Article, News Story, Topic Detection, Information Stream, Internet Archive
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Department of Computer Science, Cornell University, Ithaca, USA
