Efficient OLAP Query Processing in Distributed Data Warehouses

  • Michael O. Akinde
  • Michael H. Böhlen
  • Theodore Johnson
  • Laks V.S. Lakshmanan
  • Divesh Srivastava
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2287)

Abstract

The success of Internet applications has led to an explosive growth in the demand for bandwidth from ISPs. Managing an IP network requires collecting and analyzing network data, such as flow-level traffic statistics. Such analyses can typically be expressed as OLAP queries, e.g., correlated aggregate queries and data cubes. Current-day OLAP tools for this task assume the availability of the data in a centralized data warehouse. However, the inherently distributed nature of data collection and the huge amount of data extracted at each collection point make it impractical to gather all data at a centralized site. One solution is to maintain a distributed data warehouse, consisting of local data warehouses at each collection point and a coordinator site, with most of the processing being performed at the local sites. In this paper, we consider the problem of efficient evaluation of OLAP queries over a distributed data warehouse. We have developed the Skalla system for this task. Skalla translates OLAP queries, specified as certain algebraic expressions, into distributed evaluation plans which are shipped to individual sites. A salient property of our approach is that only partial results are shipped, never parts of the detail data. We propose a variety of optimizations to minimize both the synchronization traffic and the local processing done at each site. Finally, we present an experimental study based on TPC(R) data. Our results demonstrate the scalability of our techniques and quantify the performance benefits of the optimization techniques that have gone into the Skalla system.
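To make the idea of shipping only partial results concrete, the following minimal Python sketch evaluates one correlated aggregate query over several local sites: for each source, count the flows whose byte count exceeds that source's average taken over all sites. It is an illustration under stated assumptions, not the evaluation plans Skalla generates: the flow record (source address, byte count) and all function names are hypothetical, and the "sites" are simulated as in-memory lists.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical flow record: (source_ip, bytes). Each simulated site holds its own list.
Flow = Tuple[str, int]


def local_sum_count(flows: List[Flow]) -> Dict[str, Tuple[int, int]]:
    """Round 1 at a local site: partial (sum, count) of bytes per source."""
    partial: Dict[str, Tuple[int, int]] = {}
    for src, nbytes in flows:
        s, c = partial.get(src, (0, 0))
        partial[src] = (s + nbytes, c + 1)
    return partial


def merge_averages(partials: List[Dict[str, Tuple[int, int]]]) -> Dict[str, float]:
    """Coordinator: combine partial sums and counts into a global average per source."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for partial in partials:
        for src, (s, c) in partial.items():
            totals[src][0] += s
            totals[src][1] += c
    return {src: s / c for src, (s, c) in totals.items()}


def local_count_above(flows: List[Flow], averages: Dict[str, float]) -> Dict[str, int]:
    """Round 2 at a local site: count flows larger than their source's global average."""
    counts: Dict[str, int] = defaultdict(int)
    for src, nbytes in flows:
        if nbytes > averages[src]:
            counts[src] += 1
    return dict(counts)


def merge_counts(partials: List[Dict[str, int]], groups: Iterable[str]) -> Dict[str, int]:
    """Coordinator: sum partial counts; groups with no qualifying flow get 0."""
    result = dict.fromkeys(groups, 0)
    for partial in partials:
        for src, n in partial.items():
            result[src] += n
    return result


def evaluate(sites: List[List[Flow]]) -> Dict[str, int]:
    # Round 1: each site returns only per-group partial aggregates, never detail rows.
    averages = merge_averages([local_sum_count(flows) for flows in sites])
    # Synchronization step: the small table of averages is sent back to every site.
    partial_counts = [local_count_above(flows, averages) for flows in sites]
    return merge_counts(partial_counts, averages.keys())


sites = [
    [("10.0.0.1", 100), ("10.0.0.2", 50)],
    [("10.0.0.1", 300), ("10.0.0.1", 200)],
]
print(evaluate(sites))  # {'10.0.0.1': 1, '10.0.0.2': 0}
```

Sites ship (sum, count) pairs rather than local averages because an average cannot be merged from other averages alone, whereas sums and counts combine by simple addition. Each round of communication carries one small row per group, independent of how many detail flows a site stores.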

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Michael O. Akinde (1)
  • Michael H. Böhlen (1)
  • Theodore Johnson (2)
  • Laks V.S. Lakshmanan (3)
  • Divesh Srivastava (2)

  1. Aalborg University, Aalborg
  2. AT&T Labs-Research, USA
  3. University of British Columbia, British Columbia
