Abstract
Emerging real-life applications, such as environmental compliance, ecological studies and meteorology, are characterized by real-time data acquisition through remote sensor networks. The most important aspect of the sensor readings is that they comprise a space dimension and a time dimension, both of which are information bearing. Additionally, they usually arrive at a rapid rate in a continuous, unbounded stream. Streaming prevents us from storing all readings and performing multiple scans of the entire data set. Drift in the data distribution poses the additional problem of mining patterns which may change over time. We address these challenges for trend cluster discovery, that is, the discovery of clusters of spatially close sensors which transmit readings whose temporal variation, called a trend polyline, is similar along the time horizon of a window. We present a stream framework which segments the stream into equally sized windows, computes intra-window trend clusters online and stores these trend clusters in a database. Trend clusters can be queried offline at any time to determine trend clusters along larger windows (i.e. windows of windows). Experiments with several streams demonstrate the effectiveness of the proposed framework in discovering trend clusters that are accurate and relevant to humans.
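The online phase described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the greedy seed-based grouping, the function names (windows, trend_clusters, process_stream), the parameters radius and delta, and the in-memory list standing in for the database are all assumptions made for illustration. The sketch also assumes that every snapshot carries a reading for every sensor.

```python
from math import dist


def windows(stream, size):
    """Yield consecutive, non-overlapping windows of `size` snapshots.

    A trailing incomplete window is discarded.
    """
    buffer = []
    for snapshot in stream:
        buffer.append(snapshot)
        if len(buffer) == size:
            yield buffer
            buffer = []


def trend_clusters(window, positions, radius, delta):
    """Group spatially connected sensors with similar readings in a window.

    Hypothetical criterion: a sensor joins a cluster if it lies within
    `radius` of a current member and its readings differ from the seed's
    readings by at most `delta` at every time point of the window.
    Returns a list of (members, trend_polyline) pairs.
    """
    sensors = sorted(window[0])
    series = {s: [snap[s] for snap in window] for s in sensors}

    def similar(a, b):
        return all(abs(x - y) <= delta for x, y in zip(series[a], series[b]))

    unassigned = set(sensors)
    clusters = []
    for seed in sensors:
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        members, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            for other in sorted(unassigned):
                near = dist(positions[current], positions[other]) <= radius
                if near and similar(seed, other):
                    unassigned.remove(other)
                    members.append(other)
                    frontier.append(other)
        # The trend polyline is taken here as the per-time-point mean reading.
        polyline = [
            sum(series[m][t] for m in members) / len(members)
            for t in range(len(window))
        ]
        clusters.append((sorted(members), polyline))
    return clusters


def process_stream(stream, positions, size, radius, delta):
    """Cluster each window as it completes; a list stands in for the database."""
    store = []
    for index, window in enumerate(windows(stream, size)):
        store.append((index, trend_clusters(window, positions, radius, delta)))
    return store
```

Here stream is any iterable of dictionaries mapping a sensor identifier to its reading at one time point, and positions maps each sensor identifier to its coordinates. Comparing candidates against the seed rather than against the nearest member keeps the spread of readings inside a cluster bounded by delta around the seed, at the cost of making the result depend on the order in which seeds are visited.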
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ciampi, A., Appice, A., Malerba, D. (2011). Online and Offline Trend Cluster Discovery in Spatially Distributed Data Streams. In: Atzmueller, M., Hotho, A., Strohmaier, M., Chin, A. (eds) Analysis of Social Media and Ubiquitous Data. MUSE 2010, MSM 2010. Lecture Notes in Computer Science, vol 6904. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23599-3_8
DOI: https://doi.org/10.1007/978-3-642-23599-3_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23598-6
Online ISBN: 978-3-642-23599-3
eBook Packages: Computer Science, Computer Science (R0)