Efficient and Coordinated Checkpointing for Reliable Distributed Data Stream Management

  • Gert Brettlecker
  • Heiko Schuldt
  • Hans-Jörg Schek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4152)


Data Stream Management (DSM) addresses the continuous processing of sensor data. DSM requires the combination of stream operators, which may run on different distributed devices, into stream processes. Due to recent advances in sensor technologies and wireless communication, DSM is gaining importance in various application domains. Especially in healthcare, the continuous monitoring of patients at home (telemonitoring) can benefit significantly from DSM. A vital requirement in telemonitoring, however, is that DSM provides a high degree of reliability. In this paper, we present a novel approach to efficient and coordinated stream operator checkpointing that supports reliable DSM while maintaining the high result quality needed for healthcare applications. Furthermore, we present evaluation results of our checkpointing approach, implemented within our process and data stream management infrastructure OSIRIS-SE. OSIRIS-SE supports flexible failure handling and efficient and coordinated checkpointing by means of consistent operator migration. This ensures complete and consistent continuous data stream processing even in the case of failures.


Keywords: Data Stream, Network Overhead, Output Queue, Operator Migration, Stream Element





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Gert Brettlecker (1)
  • Heiko Schuldt (1)
  • Hans-Jörg Schek (2)
  1. Department of Computer Science, University of Basel, Basel, Switzerland
  2. Department of Computer & Information Science, University of Konstanz, Konstanz, Germany
