Online and Offline Trend Cluster Discovery in Spatially Distributed Data Streams

Ciampi, Anna; Appice, Annalisa; Malerba, Donato

doi:10.1007/978-3-642-23599-3_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6904)

Abstract

Emerging real-life applications, such as environmental compliance, ecological studies and meteorology, are characterized by real-time data acquisition through remote sensor networks. The most important aspect of sensor readings is that they have both a spatial and a temporal dimension, and both carry information. Additionally, readings usually arrive at a rapid rate in a continuous, unbounded stream. Streaming prevents us from storing all readings and performing multiple scans of the entire data set. Drift in the data distribution poses the additional problem of mining patterns that may change over time. We address these challenges for trend cluster discovery. Trend cluster discovery finds clusters of spatially close sensors whose readings have similar temporal variation along the time horizon of a window; this temporal variation is called a trend polyline. We present a stream framework that segments the stream into equally sized windows, computes intra-window trend clusters online, and stores these trend clusters in a database. The stored trend clusters can be queried offline at any time to determine trend clusters along larger windows (i.e., windows of windows). Experiments with several streams demonstrate the effectiveness of the proposed framework in discovering trend clusters that are accurate and relevant to humans.
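The online/offline split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. All of the following are assumptions made for illustration:
- the helper names (`neighbours`, `similar`, `trend_clusters`, `windows`, `merge_windows`);
- a Euclidean `radius` as the definition of "spatially close";
- a point-wise tolerance `delta` as the similarity test;
- greedy region growing as the clustering strategy;
- the point-wise mean as the trend polyline;
- an offline step that re-clusters the concatenated stored polylines.

```python
from math import dist

# Hypothetical sketch: sensors are keyed by id; coords maps id -> (x, y).


def neighbours(coords, radius):
    """Assumed spatial closeness: sensors within `radius` (Euclidean) are neighbours."""
    return {
        s: [t for t in coords if t != s and dist(coords[s], coords[t]) <= radius]
        for s in coords
    }


def similar(a, b, delta):
    """Assumed similarity test: series differ by at most `delta` at every time point."""
    return all(abs(x - y) <= delta for x, y in zip(a, b))


def trend_clusters(series, nbrs, delta):
    """Greedy region growing over the neighbour graph (an assumption, not the paper's
    method). Returns (sorted members, polyline) pairs; the polyline is taken to be the
    point-wise mean of the members' series."""
    unassigned = set(series)
    clusters = []
    for seed in sorted(series):
        if seed not in unassigned:
            continue
        unassigned.discard(seed)
        members, frontier = [seed], [seed]
        while frontier:
            s = frontier.pop()
            for t in nbrs[s]:
                # Grow only through spatial neighbours whose series stays close to the seed's.
                if t in unassigned and similar(series[seed], series[t], delta):
                    unassigned.discard(t)
                    members.append(t)
                    frontier.append(t)
        polyline = [sum(v) / len(v) for v in zip(*(series[m] for m in members))]
        clusters.append((sorted(members), polyline))
    return clusters


def windows(snapshots, w):
    """Online step: cut a (possibly unbounded) iterable of snapshots (dict id -> reading)
    into windows of `w` snapshots; yields id -> series. A trailing partial window is dropped."""
    buf = []
    for snap in snapshots:
        buf.append(snap)
        if len(buf) == w:
            yield {s: [row[s] for row in buf] for s in buf[0]}
            buf = []


def merge_windows(stored, nbrs, delta):
    """Offline step (hypothetical): cluster a window of windows from stored trend clusters
    only. Each sensor's series is the concatenation of its clusters' polylines."""
    per_window = [{m: poly for members, poly in c for m in members} for c in stored]
    series = {s: [v for pw in per_window for v in pw[s]] for s in per_window[0]}
    return trend_clusters(series, nbrs, delta)
```

In a deployment, the online loop would be along the lines of `for win in windows(stream, w): store.append(trend_clusters(win, nbrs, delta))`. `merge_windows` would then be called on any run of consecutive stored windows. It never sees the raw readings, which mirrors the constraint that a stream cannot be rescanned.

Consider sensors a, b, c at x = 0, 1, 2 and d at x = 10, with `radius` 1.5. Sensor d can report exactly the same readings as a and still form its own cluster, because it is not spatially close to a.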

Author information

Authors and Affiliations

Dipartimento di Informatica, Università degli Studi di Bari Aldo Moro, via Orabona, 4, 70126, Bari, Italy
Anna Ciampi, Annalisa Appice & Donato Malerba

Editor information

Editors and Affiliations

Knowledge and Data Engineering Group, University of Kassel, Wilhelmshöher Allee 73, 34121, Kassel, Germany
Martin Atzmueller
Data Mining and Information Retrieval Group, University of Würzburg, Am Hubland, 97074, Würzburg, Germany
Andreas Hotho
Faculty of Computer Science, Graz University of Technology, Inffeldgasse 21a, 8010, Graz, Austria
Markus Strohmaier
Mobile Social Networking Group, Nokia Research Center, 100176, Beijing, China
Alvin Chin

About this paper

Cite this paper

Ciampi, A., Appice, A., Malerba, D. (2011). Online and Offline Trend Cluster Discovery in Spatially Distributed Data Streams. In: Atzmueller, M., Hotho, A., Strohmaier, M., Chin, A. (eds) Analysis of Social Media and Ubiquitous Data. MUSE MSM 2010. Lecture Notes in Computer Science, vol 6904. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23599-3_8

DOI: https://doi.org/10.1007/978-3-642-23599-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23598-6
Online ISBN: 978-3-642-23599-3
eBook Packages: Computer Science, Computer Science (R0)