Adaptive Query Processing in Distributed Settings

  • Anastasios Gounaris
  • Efthymia Tsamoura
  • Yannis Manolopoulos
Part of the Intelligent Systems Reference Library book series (ISRL, volume 36)

Abstract

In this survey chapter, we discuss adaptive query processing (AdQP) techniques for distributed environments. We also investigate the issues involved in extending AdQP techniques originally proposed for single-node processing so that they become applicable to multi-node environments as well. To make the similarities among the various proposals easier to see, we adopt a common framework that decomposes the adaptivity loop into four phases: monitoring, analysis, planning, and actuation (or execution). The main distributed AdQP techniques developed so far tend to differ significantly from their centralized counterparts, both in their objectives and in their focus. Their objectives are tailored to distributed settings, and they pay more attention to the cost of adaptivity, which is significant, especially when operators and data are moved over the network.

Keywords

Load Balance, Query Processing, Query Optimization, Query Execution, Query Plan
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Anastasios Gounaris (1)
  • Efthymia Tsamoura (1)
  • Yannis Manolopoulos (1)
  1. Aristotle University of Thessaloniki, Thessaloniki, Greece
