Abstract
The increasing prevalence of networked storage and computational resources, along with middleware for managing resource access and sharing, raises the prospect that queries can be run over resources obtained on demand, rather than on dedicated infrastructures. However, the movement of query processing into non-dedicated environments means that it is necessary to take account of the partial information and unstable conditions that characterise autonomous, shared, distributed settings. Thus, query processing on grid platforms needs to be adaptive, revising evaluation strategies at query runtime in response to the evolving environment, such as changes to machine load and availability. To address this challenge, adaptive techniques are described that: (i) balance load across plan partitions supporting intra-operator parallelism; (ii) remove bottlenecks in pipelined plans supporting inter-operator parallelism; and (iii) combine the two aforementioned techniques. The approach has been empirically evaluated in a grid-enabled adaptive query processor.
Additional information
Communicated by Ahmed K. Elmagarmid.
Cite this article
Gounaris, A., Smith, J., Paton, N.W. et al. Adaptive workload allocation in query processing in autonomous heterogeneous environments. Distrib Parallel Databases 25, 125–164 (2009). https://doi.org/10.1007/s10619-008-7032-5
DOI: https://doi.org/10.1007/s10619-008-7032-5