Efficient Data Distribution for DWS

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5182)

Abstract

The DWS (Data Warehouse Striping) technique is a data partitioning approach designed specifically for distributed data warehousing environments. In DWS, the fact tables are distributed across an arbitrary number of low-cost computers, and queries are executed in parallel by all of them, guaranteeing nearly optimal speedup and scale-up. Data loading in data warehouses is typically a heavy process, and it becomes even more complex in distributed environments. Data partitioning calls for new loading algorithms that reconcile a balanced distribution of data among nodes with an efficient data allocation, which is vital to achieve low and uniform response times and, consequently, high performance during query execution. This paper evaluates several alternative algorithms and proposes a generic approach for evaluating data distribution algorithms in the context of DWS. The experimental results show that effective loading of the nodes in a DWS system must consider two complementary effects: minimizing the number of distinct keys of any large dimension in the fact tables in each node, and splitting correlated rows among the nodes.
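The tension the abstract describes, between balanced row counts and few distinct dimension keys per node, can be illustrated with a minimal sketch. This is not one of the algorithms evaluated in the paper: the two placement functions, the `customer_id` column, and the sample rows are all hypothetical, chosen only to show how such a comparison might be measured.

```python
def round_robin(rows, n_nodes):
    """Assign row i to node i mod n_nodes, which balances row counts."""
    nodes = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        nodes[i % n_nodes].append(row)
    return nodes


def by_key_modulo(rows, n_nodes, key):
    """Assign each row by its integer dimension key modulo n_nodes,
    so all rows sharing a key land on the same node."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row[key] % n_nodes].append(row)
    return nodes


def summarize(nodes, key):
    """Return (rows per node, distinct key values per node)."""
    sizes = [len(node) for node in nodes]
    distinct = [len({row[key] for row in node}) for node in nodes]
    return sizes, distinct


# Hypothetical, deliberately skewed fact rows: key 0 appears six times,
# keys 1 and 2 three times each.
keys = [0] * 6 + [1] * 3 + [2] * 3
rows = [{"customer_id": k, "amount": 10 * i} for i, k in enumerate(keys)]

rr_sizes, rr_distinct = summarize(round_robin(rows, 3), "customer_id")
km_sizes, km_distinct = summarize(by_key_modulo(rows, 3, "customer_id"), "customer_id")

print(rr_sizes, rr_distinct)  # [4, 4, 4] [3, 3, 3]
print(km_sizes, km_distinct)  # [6, 3, 3] [1, 1, 1]
```

In this toy data set, round-robin placement gives every node the same number of rows but exposes each node to all three keys, while key-based placement keeps one key per node at the cost of uneven row counts. A real loading algorithm would have to weigh both measures, as the abstract indicates.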

Editor information

Il-Yeol Song, Johann Eder, Tho Manh Nguyen

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Almeida, R., Vieira, J., Vieira, M., Madeira, H., Bernardino, J. (2008). Efficient Data Distribution for DWS. In: Song, IY., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2008. Lecture Notes in Computer Science, vol 5182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85836-2_8

  • DOI: https://doi.org/10.1007/978-3-540-85836-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85835-5

  • Online ISBN: 978-3-540-85836-2

  • eBook Packages: Computer Science, Computer Science (R0)
