Resumption of Data Extraction Process in Parallel Data Warehouses

  • Marcin Gorawski
  • Pawel Marks
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3911)

Abstract

ETL processes are sometimes interrupted by failures. In such cases, an algorithm for resuming the interrupted extraction is usually applied. In this paper we present a modified Design-Resume (DR) algorithm that can handle ETL processes containing many loading nodes. We use the DR algorithm to resume a parallel data warehouse load process. Its key feature is that it imposes no additional overhead on the normal ETL process. Our modification lets the algorithm work with more than one loading node, which increases the efficiency of the resumption process. We discuss the benefits of our improvements based on the results of the tests we performed.
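The idea of resuming a load without saving any state during normal operation can be illustrated with a toy sketch. This is not the authors' modified DR algorithm: it assumes unique row keys, a repeatable extraction order, and a hypothetical key-based routing rule (`route`), and all names (`load`, `resume`, `fail_after`) are invented for illustration. After a simulated failure, the filter for each loading node is built only from the rows that node already holds.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[int, str]  # (key, payload); hypothetical row layout


def route(key: int, node_count: int) -> int:
    """Hypothetical partitioning rule: choose a loading node by key."""
    return key % node_count


def load(source: Iterable[Row], nodes: List[Dict[int, str]],
         skip: List[Set[int]], fail_after: int = -1) -> int:
    """Send rows to their loading nodes, skipping keys already stored.

    Returns the number of rows written in this run. `fail_after`
    simulates a crash after that many writes (-1 means no failure).
    """
    written = 0
    for key, payload in source:
        target = route(key, len(nodes))
        if key in skip[target]:
            continue  # resume filter: this row survived the failure
        if written == fail_after:
            raise RuntimeError("simulated failure")
        nodes[target][key] = payload
        written += 1
    return written


def resume(source: Iterable[Row], nodes: List[Dict[int, str]]) -> int:
    """Re-run the load, filtering by what each node holds after the failure.

    Nothing was recorded during the normal run; the per-node filters
    are derived only from the contents of the loading nodes.
    """
    skip = [set(node) for node in nodes]
    return load(source, nodes, skip)
```

In this sketch the normal run pays nothing extra, since the per-node key sets are computed only when `resume` is called. Each loading node contributes its own filter, which is the part that a single-node resumption scheme would not need.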

Keywords

Data Warehouse; Total Processing Time; Fact Table; Data Warehouse System; Spatial Data Warehouse
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Marcin Gorawski (1)
  • Pawel Marks (1)
  1. Institute of Computer Science, Silesian University of Technology, Gliwice, Poland
