The Journal of Supercomputing, Volume 61, Issue 3, pp 619–641

Dispatching stream operators in parallel execution of continuous queries

  • Ali A. Safaei
  • Mostafa S. Haghjoo

Abstract

A data stream is a continuous, rapid, time-varying sequence of data elements that must be processed online. Such processing is the subject of research on Data Stream Management Systems (DSMSs). Single-processor DSMSs cannot properly satisfy the requirements of data stream applications; their main shortcomings concern tuple latency, tuple loss, and throughput. In our previous publications, we introduced parallel execution of continuous queries to overcome these problems by improving performance, especially tuple latency. There, operators were scheduled in an event-driven manner, which reduced system performance in the periods between consecutive scheduling instances.

In this paper, we present a continuous scheduling method (dispatching) that is more compatible with the continuous nature of data streams and queries, and that improves system adaptivity and performance. In a multiprocessing environment, the dispatching method makes each processing node (logical machine) send its partially processed tuples to the next machine with the minimum workload, which then executes the next operator on them. Operator scheduling is thus performed continuously and dynamically for each tuple processed by each operator. The dispatching method is described and formally presented, and its correctness is proved. It is also modeled with Petri nets and evaluated via simulation. The results show that the dispatching method significantly improves system performance in terms of tuple latency, throughput, and tuple loss. Furthermore, the fluctuation of system performance parameters (under variation of system and stream characteristics) diminishes considerably, which leads to high adaptivity to the underlying system.
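The per-tuple routing idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Machine` class, the use of a pending-tuple counter as the workload measure, the linear query plan, and the example operators `keep_positive` and `double` are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Machine:
    """Hypothetical logical machine; `load` counts tuples routed to it so far."""
    name: str
    load: int = 0  # assumed workload measure, not taken from the paper


# An operator maps a tuple value to a new value, or to None if it is dropped.
Operator = Callable[[int], Optional[int]]


def keep_positive(x: int) -> Optional[int]:
    """Example selection operator: drop non-positive values."""
    return x if x > 0 else None


def double(x: int) -> Optional[int]:
    """Example map operator."""
    return 2 * x


def dispatch(value: int, plan: List[Operator],
             machines: List[Machine]) -> Tuple[Optional[int], List[str]]:
    """Route one tuple through a linear plan, choosing a machine per operator.

    Before each operator runs, the tuple is sent to the machine with the
    smallest current load, so the scheduling decision is made per tuple
    and per operator rather than at fixed scheduling instants.
    """
    route: List[str] = []
    current: Optional[int] = value
    for op in plan:
        target = min(machines, key=lambda m: m.load)  # least-loaded machine
        target.load += 1                              # tuple is queued there
        route.append(target.name)
        current = op(current)
        if current is None:                           # tuple filtered out
            break
    return current, route


if __name__ == "__main__":
    nodes = [Machine("A"), Machine("B"), Machine("C")]
    for v in (5, 3, -1):
        print(v, dispatch(v, [keep_positive, double], nodes))
```

In this sketch the load counter only grows; a real system would also decrease it as tuples finish processing, and would likely use a richer workload estimate than queue length.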

Keywords

Operator scheduling, Continuous queries, Data stream, Query plan, Dispatching

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran