Scheduling to Minimize Staleness and Stretch in Real-Time Data Warehouses

Theory of Computing Systems

Abstract

We study scheduling algorithms for loading data feeds into real-time data warehouses, which are used in applications such as IP network monitoring, online financial trading, and credit card fraud detection. In these applications, the warehouse collects a large number of streaming data feeds that are generated by external sources and arrive asynchronously. Data for each table in the warehouse are generated at a constant rate, different tables possibly at different rates. For each data feed, the arrival of new data triggers an update that seeks to append the new data to the corresponding table; if multiple updates are pending for the same table, they are batched together before being loaded. At time τ, if a table has been updated with information up to time r ≤ τ, its staleness is defined as τ − r.
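The staleness definition can be made concrete with a small sketch. The code below is illustrative and not taken from the paper: it assumes a single table whose freshness r changes only at the instants when a load completes, and it integrates τ − r over an interval to obtain that table's total staleness. The names `staleness`, `total_staleness`, and `loads` are hypothetical.

```python
def staleness(tau, r):
    """Staleness at time tau of a table holding data up to time r <= tau."""
    if r > tau:
        raise ValueError("freshness r cannot exceed the current time tau")
    return tau - r


def total_staleness(loads, start, end, initial_r=0.0):
    """Integrate the staleness of one table over [start, end].

    loads: list of (completion_time, r) pairs sorted by completion_time;
           once a load completes, the table holds data up to time r.
    Assumption (not from the paper): freshness is piecewise constant and
    jumps only at load completion times.
    """
    total = 0.0
    t, r = start, initial_r
    for done, new_r in loads:
        if done <= start:
            r = new_r          # load finished before the window opens
            continue
        if done >= end:
            break              # load finishes after the window closes
        # Staleness grows linearly on [t, done]; add the trapezoid area.
        total += (staleness(t, r) + staleness(done, r)) * (done - t) / 2.0
        t, r = done, new_r
    # Final segment up to the end of the window.
    total += (staleness(t, r) + staleness(end, r)) * (end - t) / 2.0
    return total
```

For example, with loads completing at times 4 and 8 that bring the table up to times 3 and 8 respectively, `total_staleness([(4, 3), (8, 8)], 0, 10)` sums three trapezoids with areas 8, 12, and 2.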

Our first objective is to schedule the updates on one or more processors in a way that minimizes the total staleness. In order to ensure fairness, our second objective is to limit the maximum “stretch”, which we define (roughly) as the ratio between the time an update waits until its processing finishes and the length of the update.
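As a rough illustration of the stretch objective, the sketch below computes the ratio described above for individual updates and takes the maximum over a set of them. It follows only the informal definition in the abstract; the precise definition in the paper may differ. The function names and the (arrival, completion, length) tuple layout are assumptions made for this example.

```python
def stretch(arrival, completion, length):
    """Ratio of the time from arrival to completion over the update length."""
    if length <= 0:
        raise ValueError("update length must be positive")
    if completion < arrival:
        raise ValueError("an update cannot complete before it arrives")
    return (completion - arrival) / length


def max_stretch(updates):
    """Maximum stretch over an iterable of (arrival, completion, length)."""
    return max(stretch(a, c, n) for a, c, n in updates)
```

An update of length 2 that arrives at time 0 and completes at time 5 has stretch 2.5, so `max_stretch([(0, 5, 2), (2, 8, 3), (5, 6, 1)])` is dominated by that update.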

In contrast to earlier work proving the nonexistence of constant-competitive algorithms for related scheduling problems, we prove that any online nonpreemptive algorithm, no processor of which is ever voluntarily idle, incurs a staleness at most a constant factor larger than an obvious lower bound on total staleness (provided that the processors are sufficiently fast). We give a constant-stretch algorithm, provided that the processors are sufficiently fast, for the quasiperiodic model, in which tables can be clustered into a few groups such that the update frequencies within each group vary by at most a constant factor. Finally, we show that our constant-stretch algorithm is also constant-competitive (subject to the same proviso on processor speed) in the quasiperiodic model with respect to total weighted staleness, where tables are assigned weights that reflect their priorities.
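To make the class of schedulers in the first result more tangible, here is a minimal single-processor simulation of an online, nonpreemptive scheduler that is never voluntarily idle and that batches pending updates for the same table. It is a sketch under stated assumptions, not the paper's algorithm: the oldest-pending-first selection rule and the cost model (processing time proportional to the span of data being loaded, via the hypothetical parameter `cost_per_unit`) are choices made only for this example.

```python
from collections import defaultdict


def simulate_non_idling(arrivals, cost_per_unit):
    """Simulate one processor that never idles while updates are pending.

    arrivals: list of (time, table) sorted by time; each update carries
              data up to its arrival time.
    cost_per_unit: assumed processing time per unit of data time loaded.
    Returns a list of (table, start, finish, fresh_until) tuples.
    """
    pending = {}                  # table -> (first pending, latest pending)
    fresh = defaultdict(float)    # table -> time up to which it is loaded
    schedule = []
    now, i, n = 0.0, 0, len(arrivals)
    while i < n or pending:
        if not pending:
            # Idle only because nothing is pending: jump to next arrival.
            now = max(now, arrivals[i][0])
        # Admit every update that has arrived by now, batching per table.
        while i < n and arrivals[i][0] <= now:
            t, table = arrivals[i]
            first, _ = pending.get(table, (t, t))
            pending[table] = (first, t)
            i += 1
        # Assumed policy: serve the table whose pending update is oldest.
        table = min(pending, key=lambda k: pending[k][0])
        _, new_r = pending.pop(table)
        # Nonpreemptive: the batch runs to completion once started.
        finish = now + cost_per_unit * (new_r - fresh[table])
        schedule.append((table, now, finish, new_r))
        fresh[table] = new_r
        now = finish
    return schedule
```

With `arrivals = [(4, "A"), (5, "B"), (6, "B"), (8, "A")]` and `cost_per_unit = 0.5`, the two updates for table B arrive while A is being loaded and are then loaded together as one batch.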

Author information

Corresponding author

Correspondence to Lukasz Golab.

Additional information

Work done while M. Bateni was visiting AT&T Labs–Research.

About this article

Cite this article

Bateni, M., Golab, L., Hajiaghayi, M. et al. Scheduling to Minimize Staleness and Stretch in Real-Time Data Warehouses. Theory Comput Syst 49, 757–780 (2011). https://doi.org/10.1007/s00224-011-9347-2

  • DOI: https://doi.org/10.1007/s00224-011-9347-2
