Real-Time Snapshot Maintenance with Incremental ETL Pipelines in Data Warehouses

  • Conference paper
Big Data Analytics and Knowledge Discovery (DaWaK 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9263)

Abstract

Multi-version concurrency control is now widely used in data warehouses to give OLAP queries and ETL maintenance flows concurrent access. A snapshot is taken of the existing warehouse tables so that a query can be answered independently of concurrent updates. In this work, we extend this snapshot with the deltas that reside at the source side of ETL flows. Before a query is answered, the relevant tables are first refreshed with exactly those source deltas that have been captured by the time the query arrives (the so-called query-driven policy). Snapshot maintenance is performed by an incremental recomputation pipeline, which is flushed by a set of consecutive deltas belonging to a sequence of incoming queries. A workload scheduler is used to achieve a serializable schedule of concurrent maintenance tasks and OLAP queries. Performance is examined using read-heavy and update-heavy workloads.
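The query-driven policy can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Delta`, `Warehouse`, `refresh` and `query` are hypothetical, the warehouse table is reduced to a single sum aggregate held in memory, and queries are assumed to arrive one at a time in timestamp order, which stands in for the serializable schedule produced by the workload scheduler. Snapshots, pipelining and concurrency are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    """A source-side change (hypothetical shape, for illustration only)."""
    ts: int      # capture time at the source
    key: str     # grouping key of the aggregate
    amount: int  # change to add to the aggregate for this key


@dataclass
class Warehouse:
    """Toy warehouse holding one incrementally maintained sum per key."""
    totals: dict = field(default_factory=dict)
    applied_upto: int = 0  # every delta with ts <= applied_upto is already applied

    def refresh(self, source_deltas, query_ts):
        # Apply exactly the deltas captured up to the query's arrival time
        # that have not been applied by an earlier refresh.
        for d in sorted(source_deltas, key=lambda d: d.ts):
            if self.applied_upto < d.ts <= query_ts:
                self.totals[d.key] = self.totals.get(d.key, 0) + d.amount
        self.applied_upto = max(self.applied_upto, query_ts)

    def query(self, source_deltas, query_ts, key):
        # Query-driven policy: refresh first, then answer.
        self.refresh(source_deltas, query_ts)
        return self.totals.get(key, 0)


deltas = [Delta(1, "a", 10), Delta(3, "a", 5), Delta(5, "b", 7)]
warehouse = Warehouse()
first = warehouse.query(deltas, query_ts=2, key="a")   # sees only the delta at ts=1
second = warehouse.query(deltas, query_ts=4, key="a")  # additionally sees ts=3
```

Each query sees the deltas captured up to its own arrival time and nothing later, and a delta is applied at most once because `applied_upto` only moves forward.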

Notes

  1. ETL transformation operations are called steps in Kettle.

Author information

Corresponding author

Correspondence to Weiping Qu.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Qu, W., Basavaraj, V., Shankar, S., Dessloch, S. (2015). Real-Time Snapshot Maintenance with Incremental ETL Pipelines in Data Warehouses. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2015. Lecture Notes in Computer Science, vol 9263. Springer, Cham. https://doi.org/10.1007/978-3-319-22729-0_17

  • DOI: https://doi.org/10.1007/978-3-319-22729-0_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22728-3

  • Online ISBN: 978-3-319-22729-0

  • eBook Packages: Computer Science, Computer Science (R0)
