Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams

Han, Jiawei; Chen, Yixin; Dong, Guozhu; Pei, Jian; Wah, Benjamin W.; Wang, Jianyong; Cai, Y. Dora

doi:10.1007/s10619-005-3296-1

Published: 20 September 2005

Distributed and Parallel Databases, Volume 18, pages 173–197 (2005)

Jiawei Han¹,
Yixin Chen²,
Guozhu Dong³,
Jian Pei⁴,
Benjamin W. Wah¹,
Jianyong Wang⁵ &
Y. Dora Cai¹

703 Accesses
108 Citations
3 Altmetric

Abstract

Real-time surveillance systems, telecommunication systems, and other dynamic environments often generate a tremendous (potentially infinite) volume of stream data, too large to be scanned multiple times. Much of this data resides at a rather low level of abstraction, whereas most analysts are interested in relatively high-level dynamic changes (such as trends and outliers). To discover such high-level characteristics, one may need to perform on-line, multi-level, multi-dimensional analytical processing of stream data. In this paper, we propose an architecture, called stream_cube, to facilitate on-line, multi-dimensional, multi-level analysis of stream data.

For fast on-line multi-dimensional analysis of stream data, three important techniques are proposed for efficient and effective computation of stream cubes. First, a tilted time frame model is proposed as a multi-resolution model to register time-related data: more recent data are registered at finer resolution, whereas more distant data are registered at coarser resolution. This design reduces the overall storage of time-related data and adapts nicely to the data analysis tasks commonly encountered in practice. Second, instead of materializing cuboids at all levels, we propose to maintain a small number of critical layers. Flexible analysis can be performed efficiently based on the concepts of an observation layer and a minimal interesting layer. Third, an efficient stream data cubing algorithm is developed that computes only the layers (cuboids) along a popular path and leaves the other cuboids for query-driven, on-line computation. Based on this design methodology, a stream data cube can be constructed and maintained incrementally with a reasonable amount of memory, computation cost, and query response time. This is verified by our substantial performance study.
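As an illustration of the tilted time frame idea, the following is a minimal Python sketch, not the implementation evaluated in the paper. The class name `TiltedTimeFrame`, the level names and capacities (4 quarters per hour, 24 hours per day, 31 retained days), and the use of a simple sum as the measure are all assumptions made for this example. Each level holds slots at one resolution; when a finer level fills up, its slots are merged into a single slot at the next coarser level.

```python
from typing import List, Sequence, Tuple


class TiltedTimeFrame:
    """Multi-resolution time register: recent data fine, older data coarse.

    Hypothetical sketch: level names, capacities, and the sum measure are
    assumptions chosen for illustration.
    """

    def __init__(
        self,
        levels: Sequence[Tuple[str, int]] = (("quarter", 4), ("hour", 24), ("day", 31)),
    ) -> None:
        # Each entry is (level name, number of slots the level can hold).
        self.levels = list(levels)
        # slots[i] holds the aggregated values at level i, oldest first.
        self.slots: List[List[float]] = [[] for _ in self.levels]

    def add(self, value: float, level: int = 0) -> None:
        """Register one completed time unit at the given level."""
        self.slots[level].append(value)
        capacity = self.levels[level][1]
        is_coarsest = level == len(self.levels) - 1
        if is_coarsest:
            # The coarsest level keeps only its most recent `capacity` slots.
            if len(self.slots[level]) > capacity:
                self.slots[level].pop(0)
        elif len(self.slots[level]) == capacity:
            # A full finer level is merged into one slot of the next level.
            merged = sum(self.slots[level])
            self.slots[level].clear()
            self.add(merged, level + 1)

    def stored_slots(self) -> int:
        """Number of slots currently held across all levels."""
        return sum(len(s) for s in self.slots)

    def total(self) -> float:
        """Sum of the measure over everything still registered."""
        return sum(sum(s) for s in self.slots)


if __name__ == "__main__":
    ttf = TiltedTimeFrame()
    for _ in range(197):  # 197 quarter-hour readings, each with measure 1.0
        ttf.add(1.0)
    print(ttf.slots, ttf.stored_slots(), ttf.total())
```

In this sketch, 197 quarter-hour readings end up in 4 slots (one quarter, one hour, two days) while the overall sum is preserved, which is the storage reduction the tilted time frame aims for. A real design would also need timestamps per slot and a measure suited to the analysis task.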

Author information

Authors and Affiliations

University of Illinois, Illinois
Jiawei Han, Benjamin W. Wah & Y. Dora Cai
Washington University, St. Louis
Yixin Chen
Wright State University, USA
Guozhu Dong
Simon Fraser University, B. C., Canada
Jian Pei
Tsinghua University, Beijing, China
Jianyong Wang

Corresponding author

Correspondence to Jiawei Han.

Additional information

Recommended by: Ahmed Elmagarmid

About this article

Cite this article

Han, J., Chen, Y., Dong, G. et al. Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams. Distrib Parallel Databases 18, 173–197 (2005). https://doi.org/10.1007/s10619-005-3296-1

Published: 20 September 2005
Issue Date: September 2005
DOI: https://doi.org/10.1007/s10619-005-3296-1
