Abstract
A number of recent computing applications involve information arriving continuously over time in the form of a data stream, and this has led to new ways of thinking about traditional problems in a variety of areas. In some cases, the rate and overall volume of data in the stream may be so great that it cannot all be stored for processing, and this leads to new requirements for efficiency and scalability. In other cases, the quantities of information may still be manageable, but the data stream perspective takes what has generally been a static view of a problem and adds a strong temporal dimension to it. Our focus here is on some of the challenges that this latter issue raises in the settings of text mining, on-line information, and information retrieval.
This survey was written in 2004 and circulated on-line as a preprint prior to its appearance in this volume.
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Kleinberg, J. (2016). Temporal Dynamics of On-Line Information Streams. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds) Data Stream Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28608-0_11
DOI: https://doi.org/10.1007/978-3-540-28608-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28607-3
Online ISBN: 978-3-540-28608-0
eBook Packages: Computer Science, Computer Science (R0)