
Parallel Data Cube Construction: Algorithms, Theoretical Analysis, and Experimental Evaluation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2913)

Abstract

Data cube construction is a commonly used operation in data warehouses. Because of the volume of data stored and analyzed in a data warehouse, and the amount of computation involved in data cube construction, it is natural to consider parallel machines for this operation. This paper presents two new algorithms for parallel data cube construction, along with their theoretical analysis and experimental evaluation. Our work is based on a new data structure, called the aggregation tree, which results in minimally bounded memory requirements. An aggregation tree is parameterized by the ordering of the dimensions. We prove that the same ordering of the dimensions minimizes both the computational and the communication requirements for both algorithms. We also describe a method for partitioning the initial array that again minimizes the communication volume for both algorithms. Experimental results further validate the theoretical results.
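To make "data cube construction" concrete, the sketch below computes every group-by (all 2^n subsets of n dimensions) of a small dense array, deriving each group-by with k dimensions from a parent with k+1 dimensions by aggregating out one dimension. The parents chosen this way form a spanning tree of the group-by lattice. This is a generic, sequential, in-memory illustration under stated assumptions, not the paper's aggregation tree or either of its parallel algorithms: the parent-selection rule (add back the lowest-indexed missing dimension), the sparse `{cell: value}` representation, and the helper names `aggregate_out` and `build_cube` are all hypothetical choices for illustration, and the sketch makes no attempt at the bounded memory or minimal communication the paper targets.

```python
from itertools import combinations
from typing import Dict, Tuple

Cell = Tuple[int, ...]     # coordinates of one cell in a group-by
GroupBy = Tuple[int, ...]  # sorted tuple of the dimension indices a group-by retains


def aggregate_out(parent_dims: GroupBy, parent_cells: Dict[Cell, int],
                  drop: int) -> Dict[Cell, int]:
    """Sum out dimension `drop` from a group-by stored as {cell: value}."""
    pos = parent_dims.index(drop)  # position of `drop` within each cell tuple
    child: Dict[Cell, int] = {}
    for cell, value in parent_cells.items():
        key = cell[:pos] + cell[pos + 1:]  # remove the aggregated coordinate
        child[key] = child.get(key, 0) + value
    return child


def build_cube(base: Dict[Cell, int], n_dims: int) -> Dict[GroupBy, Dict[Cell, int]]:
    """Compute all 2**n_dims group-bys from the base array `base`.

    Group-bys are produced level by level (n_dims-1 dimensions, then
    n_dims-2, ..., down to the grand total), so every parent has already
    been computed when its children need it.
    """
    full: GroupBy = tuple(range(n_dims))
    cube: Dict[GroupBy, Dict[Cell, int]] = {full: dict(base)}
    for k in range(n_dims - 1, -1, -1):
        for dims in combinations(full, k):  # combinations yields sorted tuples
            # Hypothetical parent choice: add back the lowest-indexed missing dimension.
            missing = next(d for d in full if d not in dims)
            parent = tuple(sorted(dims + (missing,)))
            cube[dims] = aggregate_out(parent, cube[parent], missing)
    return cube
```

For a 2 x 2 x 2 array holding the values 1 through 8, `build_cube(base, 3)` returns 8 group-bys, and `cube[()]` is `{(): 36}`, the grand total. Choosing a different parent for each group-by changes the amount of work and intermediate storage, which is the kind of trade-off the paper's dimension ordering is designed to optimize.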

This work was supported by NSF grant ACR-9982087, NSF CAREER award ACR-9733520, and NSF grant ACR-0130437.



Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jin, R., Yang, G., Agrawal, G. (2003). Parallel Data Cube Construction: Algorithms, Theoretical Analysis, and Experimental Evaluation. In: Pinkston, T.M., Prasanna, V.K. (eds) High Performance Computing - HiPC 2003. HiPC 2003. Lecture Notes in Computer Science, vol 2913. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24596-4_9


  • DOI: https://doi.org/10.1007/978-3-540-24596-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20626-2

  • Online ISBN: 978-3-540-24596-4

  • eBook Packages: Springer Book Archive
