Parallel Data Cube Construction: Algorithms, Theoretical Analysis, and Experimental Evaluation

Jin, Ruoming; Yang, Ge; Agrawal, Gagan

doi:10.1007/978-3-540-24596-4_9

Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2913))

Abstract

Data cube construction is a commonly used operation in data warehouses. Because of the volume of data stored and analyzed in a data warehouse, and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. This paper presents two new algorithms for parallel data cube construction, along with their theoretical analysis and experimental evaluation. Our work is based on a new data structure, called the aggregation tree, which results in minimally bounded memory requirements. An aggregation tree is parameterized by the ordering of dimensions. We prove that the same ordering of the dimensions minimizes both the computational and the communication requirements for both algorithms. We also describe a method for partitioning the initial array, which again minimizes the communication volume for both algorithms. Experimental results validate the theoretical results.
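To make the operation concrete, the following is a minimal sequential sketch of what a data cube contains: one sum-aggregate for every subset of the dimensions. It is not the aggregation-tree algorithm of the paper, and it is neither parallel nor memory-bounded. The function name `data_cube`, the sparse dictionary representation, and the sample values are assumptions made only for illustration.

```python
from itertools import combinations


def data_cube(cells, num_dims):
    """Naive data cube: sum the measure over every subset of dimensions.

    cells maps a full coordinate tuple to a numeric measure.
    Returns a dict that maps each tuple of retained dimension indices
    to a dict {projected coordinates: sum of measures}.
    """
    cube = {}
    # Walk the group-by lattice from the full array down to the grand total.
    for k in range(num_dims, -1, -1):
        for kept in combinations(range(num_dims), k):
            group_by = {}
            for coords, value in cells.items():
                # Project the cell onto the retained dimensions.
                key = tuple(coords[d] for d in kept)
                group_by[key] = group_by.get(key, 0) + value
            cube[kept] = group_by
    return cube


# Hypothetical 3-dimensional input with four non-empty cells.
cells = {
    (0, 0, 0): 1,
    (0, 1, 0): 2,
    (1, 0, 1): 3,
    (1, 1, 1): 4,
}
cube = data_cube(cells, 3)
print(len(cube))   # number of group-bys: 2**3
print(cube[(0,)])  # aggregate over dimensions 1 and 2
print(cube[()])    # grand total
```

This sketch recomputes every group-by from the original array, so it scans the input 2^n times for n dimensions. Reusing already computed group-bys and bounding the memory needed while doing so is the kind of cost that the paper's aggregation tree and dimension ordering are designed to address.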

This work was supported by NSF grant ACR-9982087, NSF CAREER award ACR-9733520, and NSF grant ACR-0130437.

Author information

Authors and Affiliations

Department of Computer and Information Sciences, Ohio State University, Columbus, OH, 43210, USA
Ruoming Jin, Ge Yang & Gagan Agrawal

Editor information

Editors and Affiliations

University of Southern California, CA 90089-2562, Los Angeles
Timothy Mark Pinkston
Department of Electrical Engineering, University of Southern California, CA 90089-2562, Los Angeles, USA
Viktor K. Prasanna

About this paper

Cite this paper

Jin, R., Yang, G., Agrawal, G. (2003). Parallel Data Cube Construction: Algorithms, Theoretical Analysis, and Experimental Evaluation. In: Pinkston, T.M., Prasanna, V.K. (eds) High Performance Computing - HiPC 2003. HiPC 2003. Lecture Notes in Computer Science, vol 2913. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24596-4_9

DOI: https://doi.org/10.1007/978-3-540-24596-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20626-2
Online ISBN: 978-3-540-24596-4
eBook Packages: Springer Book Archive
