Redundant Array of Inexpensive Nodes for DWS

  • Jorge Vieira
  • Marco Vieira
  • Marco Costa
  • Henrique Madeira
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4947)

Abstract

The DWS (Data Warehouse Striping) technique is a round-robin data partitioning approach designed specifically for distributed data warehousing environments. In DWS, the fact tables are distributed across an arbitrary number of low-cost computers and queries are executed in parallel by all of them, guaranteeing nearly optimal speedup and scale-up. However, using a large number of inexpensive nodes increases the risk of node failures that impair the computation of queries. This paper proposes an approach that gives Data Warehouse Striping the capability of answering queries even in the presence of node failures. The approach is based on the selective replication of data over the cluster nodes, which guarantees full availability when one or more nodes fail. The proposal was evaluated using the new TPC-DS benchmark, and the results show that the approach is quite effective.
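
To make the idea concrete, the following is a minimal sketch of round-robin striping with one replica per partition. It is not the replication scheme of the paper, which the abstract does not detail. The chained placement (each node also holds a copy of its predecessor's partition) and the names `stripe_round_robin`, `place_replicas` and `parallel_sum` are assumptions made purely for illustration.

```python
def stripe_round_robin(rows, num_nodes):
    """Assign fact row i to node i % num_nodes (round-robin striping)."""
    partitions = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        partitions[i % num_nodes].append(row)
    return partitions


def place_replicas(partitions):
    """Hypothetical chained placement: node k keeps its own partition
    plus a copy of the partition owned by node k - 1."""
    n = len(partitions)
    return [
        {"primary": partitions[k], "replica": list(partitions[(k - 1) % n])}
        for k in range(n)
    ]


def parallel_sum(nodes, failed=frozenset()):
    """Combine per-node partial sums; a failed node's partition is
    answered from the replica held by its successor."""
    n = len(nodes)
    total = 0
    for k in range(n):
        if k not in failed:
            total += sum(nodes[k]["primary"])
            continue
        holder = (k + 1) % n
        if holder in failed:
            raise RuntimeError(f"partition {k} is unavailable")
        total += sum(nodes[holder]["replica"])
    return total


# Illustrative data: the "fact table" is just a list of measure values.
rows = list(range(1, 101))
nodes = place_replicas(stripe_round_robin(rows, 4))
full_answer = parallel_sum(nodes)
degraded_answer = parallel_sum(nodes, failed={2})
```

With a single replica per partition, this sketch survives any one failure and any set of non-adjacent failures, but not the loss of a node together with its successor. Tolerating more failures would need additional copies, at the cost of extra storage on every node.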

Keywords

Data warehousing, redundancy, replication, recovery, availability

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jorge Vieira (1)
  • Marco Vieira (2)
  • Marco Costa (1)
  • Henrique Madeira (2)

  1. Critical Software SA, Coimbra, Portugal
  2. CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
