Evaluation of Load Scheduling Strategies for Real-Time Data Warehouse Environments

  • Maik Thiele
  • Wolfgang Lehner
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 41)

Abstract

The demand for so-called living or real-time data warehouses is increasing in many application areas, including manufacturing, event monitoring, and telecommunications. In such fields, users expect both short response times for their queries and high freshness of the requested data. Meeting both requirements at the same time is challenging because of the continuous flow of write-only updates and read-only queries, as well as the latency caused by arbitrarily complex ETL processes. To optimize the update flow in terms of data freshness maximization and load minimization, we propose two algorithms, local and global scheduling, that operate on the basis of different system information. We discuss the benefits and drawbacks of both approaches in detail and derive recommendations regarding the optimal scheduling strategy for any given system setup and workload.

Keywords

Real-Time Data Warehouse, ETL, Scheduling

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Maik Thiele (1)
  • Wolfgang Lehner (1)
  1. Faculty of Computer Science, Database Technology Group, Dresden University of Technology, Dresden
