Advertisement

Distributed and Parallel Databases

, Volume 28, Issue 2–3, pp 93–118 | Cite as

Parallel processing of continuous queries over data streams

  • Ali A. Safaei
  • Mostafa S. Haghjoo
Article

Abstract

In this paper, we propose parallel processing of continuous queries over data streams to handle the bottleneck of single processor DSMSs. Queries are executed in parallel over the logical machines in a multiprocessing environment. Scheduling parallel execution of operators is performed via finding the shortest path in a weighted graph called Query Mega Graph (QMG), which is a logical view of K machines. By lapse of time, number of tuples waiting in queues of different operators may be very different. When a queue becomes full, re-scheduling is done by updating weight of edges of QMG. In the new computed path, machines with more workload will be used less. The proposed system is formally presented and its correctness is proved. It is also modeled in PetriNets and its performance is evaluated and compared with serial query processing as well as the Min-Latency scheduling algorithm. The presented system is shown to outperform them w.r.t. tuple latency (response time), memory usage, throughput and also tuple loss- critical parameters in any data stream management systems.

Keywords

Query plan Data stream Parallel execution Tuple latency 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Babcock, B., et al.: Models and issues in data stream systems. In: Proceeding of PODS, Invited paper (2002) Google Scholar
  2. 2.
    The STREAM Group.: STREAM: The Stanford stream data manager. IEEE Data Engineering Bulletin, March (2003) Google Scholar
  3. 3.
    Abadi, D., et al.: Aurora: A new model and architecture for data stream management. VLDB J. 2(12), 120–139 (2003) Google Scholar
  4. 4.
    Babcock, B.: Models and issues in data stream systems. In: Proceeding of PODS (2002) Google Scholar
  5. 5.
    Golab, L., Ozsu, M.T.: Issues in data stream management. SIGMOD Records (2003) Google Scholar
  6. 6.
    Tatbul, N., et al.: Load shedding in a data stream manager. In: Proceedings of VLDB’03, Germany, pp. 309–320 (2003) Google Scholar
  7. 7.
    Chekuri, C., et al.: Scheduling problems in parallel query optimization. In: Proceeding of PODS (1995) Google Scholar
  8. 8.
    Babcock, B., et al.: Operator scheduling in data stream systems. VLDB J. 13(4), 333–353 (2004) CrossRefGoogle Scholar
  9. 9.
    Hong, W.: Parallel query processing using shared memory multiprocessors and disk arrays. Ph.D. thesis (1992) Google Scholar
  10. 10.
    Hasan, W., et al.: Open issues in parallel query optimization. ACM SIGMOD Rec. 25(3), 28–33 (1996) CrossRefGoogle Scholar
  11. 11.
    Graefe, G., et al.: Extensible Query Optimization and Parallel Execution in Volcano. Query Processing for Advanced Database Systems. Morgan Kaufmann, San Mateo (1994) Google Scholar
  12. 12.
    DeWitt, D.J., Gray, J.: Parallel database systems: The future of high performance database processing. Commun. ACM 36(6), 377–387 (1992) Google Scholar
  13. 13.
    Babu, S., Widom, J.: Continuous queries over data streams. In: SIGMOD Record (2001) Google Scholar
  14. 14.
    DeWitt, C., Naughton: Design and evaluation of alternative selection placement strategies in optimizing continuous queries. In: Proceeding of ICDE (2002) Google Scholar
  15. 15.
    Movaghar, A.: Performability modeling with stochastic activity networks, Ph.D. dissertation, University of Michigan (1985) Google Scholar
  16. 16.
    Abdollahi Azgomi, M., Movaghar, A.: Coloured stochastic activity networks: Definitions and behavior. In: Proceeding of UKPEW’04, UK, pp. 297–308 (2004) Google Scholar
  17. 17.
    Khalili, A., et al.: PDETool: A multi-formalism modeling tool for discrete-event systems based on SDES description. In: Lecture Notes in Computer Science, vol. 5606, pp. 343–352. Springer, Berlin (2009) Google Scholar
  18. 18.
    Carney, D., et al.: Operator scheduling in a data stream manager. In: Proceedings of the VLDB, Germany, pp. 838–849 (2003) Google Scholar
  19. 19.
    Arasu, et al.: Characterizing memory requirements for queries over continuous data streams. In: Proceeding of the PODS (2002) Google Scholar
  20. 20.
    Babcock, B., et al.: Chain: Operator scheduling for memory minimization in data stream systems. In: Proceeding of the ACM SIGMOD (2003) Google Scholar
  21. 21.
    Babu, et al.: Exploiting k-constraints to reduce memory overhead in continuous queries over data streams. Technical Report, November (2002) Google Scholar
  22. 22.
    Ghalambor, M., Safaeei, A.A., Azgomi, M.A.: DSMS scheduling regarding complex QoS metrics. In: IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), 10–13 May 2009 Google Scholar
  23. 23.
    Abadi, D.J., et al.: The design of the Borealis stream processing engine. In: Proceeding of the CIDR (2005) Google Scholar
  24. 24.
    Xing, Y., et al.: Dynamic load distribution in the Borealis stream processor. In: Proceeding of ICDE (2005) Google Scholar
  25. 25.
    Babu, S.: Adaptive query processing in data stream management systems. Ph.D. thesis (2005) Google Scholar
  26. 26.
    Graefe, G.: Volcano—An extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng. 6(1), 120–135 (1994) CrossRefGoogle Scholar
  27. 27.
    Apers, P.M.G., et al.: PRISMA/DB: A parallel, main memory relational DBMS. IEEE Trans. Knowl. Data Eng. 4(6), 4–24 (1992) CrossRefGoogle Scholar
  28. 28.
    Beringer, J., Hullermeier, E.: Online clustering of parallel data streams. Int. J. Data Knowl. Eng. 58(2), 180–204 (2006) CrossRefGoogle Scholar
  29. 29.
    Kramer1, J.: Dynamic plan migration for snapshot-equivalent continuous queries in data stream systems. In: Proceeding of ICSWN (2006) Google Scholar
  30. 30.
    Zhu, Y., et al.: Dynamic plan migration for continuous queries over data streams. In: Proceeding of SIGMOD (2004) Google Scholar
  31. 31.
    Safaei, A.A., et al.: Using finite state machines in processing continuous queries. Int. Rev. Comput. Softw. 4(5), 551–556 (2009) Google Scholar
  32. 32.
    Tian, F., DeWitt, D.J.: Tuple routing strategies for distributed eddies. In: Proceeding of the VLDB (2003) Google Scholar
  33. 33.
    Chakravarthy, S., Pajjuri, V.: Scheduling Strategies and Their Evaluation in a Data Stream Management System. LNCS, vol. 4042. Springer, Berlin (2006) Google Scholar
  34. 34.
    Shah, M.A., et al.: Flux: An adaptive partitioning operator for continuous query systems. In: Proceeding of the ICDE (2003) Google Scholar
  35. 35.
    Tian, F., DeWitt, D.J.: Tuple routing strategies for distributed eddies. In: Proceeding of the VLDB (2003) Google Scholar
  36. 36.
    Nehme, R.V., et al.: Query mesh: Multiroute query processing technology. In: Proceeding of the VLDB (2009) Google Scholar
  37. 37.
    Nehme, R., et al.: Self-tuning query mesh for adaptive multi-route query processing. In: Proceeding of the ACM EDBT (2009) Google Scholar
  38. 38.
    Alqadi, Z.A.A., et al.: Performance analysis and evaluation of parallel matrix multiplication algorithms. World Appl. Sci. J. 5(2), 211–214 (2008) Google Scholar
  39. 39.
    Akl, S.G., et al.: Data-movement-intensive problems: Two folk theorems in parallel computation revisited. Theor. Comput. Sci. 95, 323–337 (1992) zbMATHCrossRefMathSciNetGoogle Scholar
  40. 40.
    Almasi, G.S., Gottlieb, A.: Highly Parallel Computing. Benjamin/Cummings, Redwood City (1989) zbMATHGoogle Scholar
  41. 41.
    Cosnard, M., Trystram, D.: Parallel Algorithms and Architectures. International Thompson Computer Press, Boston (1995) Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. 1.Department of Computer EngineeringIran University of Science and TechnologyTehranIran

Personalised recommendations