UpStream: storage-centric load management for streaming applications with update semantics

  • Regular Paper
The VLDB Journal

Abstract

This paper addresses the problem of minimizing the staleness of query results for streaming applications with update semantics under overload conditions. Staleness measures how out of date the results are compared with the latest data arriving on the input. Real-time streaming applications are subject to overload due to unpredictably increasing data rates, yet in many of them the data streams and queries exhibit “update semantics” (i.e., only the latest input data matter when producing a query result). Under such semantics, overload causes staleness to build up. The key to avoiding this is to exploit the update semantics of applications as early as possible in the processing pipeline. In this paper, we propose UpStream, a storage-centric framework for load management over streaming applications with update semantics. We first describe how we model streams and queries that possess update semantics, and we define correctness and staleness for the query results. Then, we show how staleness can be minimized using intelligent update key scheduling techniques applied at the queue level, while preserving the correctness of the results, even for complex queries that involve sliding windows. UpStream is based on the simple idea of applying updates in place, which yields substantial reductions in staleness and memory consumption, as we also verify experimentally on the Borealis system.
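The idea of applying updates in place at the queue level can be illustrated with a minimal Python sketch. This is not the authors' implementation or the Borealis API: the class name `UpdateQueue`, its methods, the stock-symbol keys, and the first-in-first-out service order over keys are all assumptions made for illustration, and the abstract does not say which key scheduling policies UpStream actually uses.

```python
from collections import OrderedDict


class UpdateQueue:
    """Hypothetical keyed queue holding only the newest pending tuple per update key."""

    def __init__(self):
        # Maps update key -> (arrival_time, value).
        # Insertion order doubles as the service order (an assumed FIFO policy).
        self._pending = OrderedDict()

    def enqueue(self, key, value, arrival_time):
        # Overwrite in place: an older pending tuple for the same key is
        # superseded, while the key keeps its position in the service order.
        self._pending[key] = (arrival_time, value)

    def dequeue(self):
        # Serve the key that entered the queue first.
        key, (arrival_time, value) = self._pending.popitem(last=False)
        return key, value, arrival_time

    def __len__(self):
        return len(self._pending)


q = UpdateQueue()
q.enqueue("IBM", 101.0, arrival_time=1)
q.enqueue("MSFT", 55.0, arrival_time=2)
q.enqueue("IBM", 102.5, arrival_time=3)  # supersedes the first IBM tuple

first = q.dequeue()   # IBM is served first, carrying its newest value
second = q.dequeue()  # MSFT follows
```

Under this assumed policy the queue never holds more than one tuple per key, so its memory use is bounded by the number of distinct keys rather than by the input rate, and every result is computed from the most recent value available when the key is served.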

Author information

Corresponding author

Correspondence to Alexandru Moga.

About this article

Cite this article

Moga, A., Botan, I. & Tatbul, N. UpStream: storage-centric load management for streaming applications with update semantics. The VLDB Journal 20, 867–892 (2011). https://doi.org/10.1007/s00778-011-0229-7

  • DOI: https://doi.org/10.1007/s00778-011-0229-7
