Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams

Distributed and Parallel Databases

Abstract

Real-time surveillance systems, telecommunication systems, and other dynamic environments often generate a tremendous (potentially infinite) volume of stream data, too large to be scanned multiple times. Much of this data resides at a rather low level of abstraction, whereas most analysts are interested in relatively high-level dynamic changes (such as trends and outliers). To discover such high-level characteristics, one may need to perform on-line, multi-level, multi-dimensional analytical processing of stream data. In this paper, we propose an architecture, called stream_cube, to facilitate on-line, multi-dimensional, multi-level analysis of stream data.

For fast on-line, multi-dimensional analysis of stream data, three important techniques are proposed for efficient and effective computation of stream cubes. First, a tilted time frame model is proposed as a multi-resolution model to register time-related data: more recent data are registered at finer resolution, whereas more distant data are registered at coarser resolution. This design reduces the overall storage of time-related data and adapts nicely to the data analysis tasks commonly encountered in practice. Second, instead of materializing cuboids at all levels, we propose to maintain a small number of critical layers. Flexible analysis can be performed efficiently based on the concepts of an observation layer and a minimal interesting layer. Third, an efficient stream data cubing algorithm is developed that computes only the layers (cuboids) along a popular path and leaves the other cuboids for query-driven, on-line computation. Based on this design methodology, a stream data cube can be constructed and maintained incrementally with a reasonable amount of memory, computation cost, and query response time. This is verified by our substantial performance study.
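
To make the tilted time frame idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes an additive measure (a sum), and the class name `TiltedTimeFrame`, the level names, and the slot counts (4 quarters, 24 hours, 31 days) are illustrative choices that the abstract does not specify. Each level keeps only its most recent slots, and every full group of fine slots is merged into one slot at the next coarser level.

```python
from collections import deque


class TiltedTimeFrame:
    """Hypothetical multi-resolution register: fine slots for recent data,
    coarse slots for older data. Assumes the measure can be summed."""

    def __init__(self, levels):
        # levels: (name, capacity) pairs ordered from finest to coarsest.
        self.names = [name for name, _ in levels]
        self.caps = [cap for _, cap in levels]
        # A bounded deque silently drops the oldest slot once it is full.
        self.slots = [deque(maxlen=cap) for cap in self.caps]
        # Values waiting to be merged into the next coarser level.
        self.pending = [[] for _ in levels]

    def add(self, value, level=0):
        """Register one aggregate at the given level (finest by default)."""
        self.slots[level].append(value)
        if level + 1 == len(self.slots):
            return  # coarsest level: nothing further to roll up into
        self.pending[level].append(value)
        if len(self.pending[level]) == self.caps[level]:
            merged = sum(self.pending[level])
            self.pending[level].clear()
            self.add(merged, level + 1)

    def snapshot(self):
        """Return the current slots per level, oldest first."""
        return {name: list(s) for name, s in zip(self.names, self.slots)}

    def slots_used(self):
        return sum(len(s) for s in self.slots)


# Two days of 15-minute counts, each equal to 1 (made-up data).
frame = TiltedTimeFrame([("quarter", 4), ("hour", 24), ("day", 31)])
for _ in range(4 * 24 * 2):
    frame.add(1)
```

In this sketch the 192 quarter-hour values are held in 30 slots: the last 4 quarters, the last 24 hours, and 2 days. With these assumed capacities the frame never grows beyond 4 + 24 + 31 = 59 slots, which illustrates the storage reduction the abstract describes.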

Author information

Corresponding author

Correspondence to Jiawei Han.

Additional information

Recommended by: Ahmed Elmagarmid

About this article

Cite this article

Han, J., Chen, Y., Dong, G. et al. Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams. Distrib Parallel Databases 18, 173–197 (2005). https://doi.org/10.1007/s10619-005-3296-1

  • DOI: https://doi.org/10.1007/s10619-005-3296-1
