Abstract
A multidatabase system (MDBS) integrates information from multiple autonomous local databases. Performing global query optimization to achieve efficient query processing in such a system is challenging due to the local autonomy of the data sources. Dynamic factors in the environment make the problem even more difficult. In this paper, we present two techniques, namely contention space partitioning and cost error controlling, to perform global query optimization in a dynamic MDBS. Both techniques generate an execution plan with multiple versions for a query in a dynamic MDBS, utilizing the multistate cost models built for the dynamic environment via our previous multistate query sampling method. The first technique partitions the contention space of a dynamic multidatabase environment into a given number of subspaces and chooses a good query execution plan version for each subspace, while the second technique selects a set of execution plan versions by using a given error tolerance to control query execution costs. Experiments demonstrate that the proposed techniques are quite promising for performing global query optimization in a dynamic MDBS. Compared with related work on dynamic query optimization, our approach has the advantage of avoiding the high overhead of modifying or re-generating an execution plan for a query based on dynamic runtime information.
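The two ideas can be illustrated with a toy sketch. This is not the paper's algorithm: the one-dimensional contention level in [0, 1], the three linear cost functions, the plan names, the midpoint rule, and the greedy sweep are all assumptions made here for illustration. The sketch only shows how a multi-version plan might map contention ranges to plan versions, either with a fixed number of subspaces or with a relative cost error tolerance.

```python
from typing import Callable, Dict, List, Tuple

CostFn = Callable[[float], float]
Version = Tuple[float, float, str]  # (range start, range end, plan name)

# Hypothetical plans: estimated cost as a function of a scalar
# contention level c in [0, 1]. Real multistate cost models differ.
PLANS: Dict[str, CostFn] = {
    "P1": lambda c: 10 + 40 * c,  # cheap when idle, degrades quickly
    "P2": lambda c: 20 + 10 * c,
    "P3": lambda c: 28 + 1 * c,   # expensive but nearly insensitive to load
}


def cheapest(plans: Dict[str, CostFn], c: float) -> str:
    """Name of the plan with the lowest estimated cost at level c."""
    return min(plans, key=lambda name: plans[name](c))


def partition_versions(plans: Dict[str, CostFn], k: int) -> List[Version]:
    """Split [0, 1] into k equal subranges and pick, for each one,
    the plan that is cheapest at the subrange midpoint (assumed rule)."""
    versions = []
    for i in range(k):
        lo, hi = i / k, (i + 1) / k
        versions.append((lo, hi, cheapest(plans, (lo + hi) / 2)))
    return versions


def error_controlled_versions(
    plans: Dict[str, CostFn], tol: float, steps: int = 100
) -> List[Version]:
    """Sweep sampled contention levels and keep the current plan until its
    cost exceeds (1 + tol) times the best cost; then start a new version."""
    versions = []
    current = cheapest(plans, 0.0)
    start = 0.0
    for i in range(1, steps + 1):
        c = i / steps
        best = cheapest(plans, c)
        if plans[current](c) > (1 + tol) * plans[best](c):
            versions.append((start, c, current))
            current, start = best, c
    versions.append((start, 1.0, current))
    return versions


if __name__ == "__main__":
    print(partition_versions(PLANS, 2))
    print(error_controlled_versions(PLANS, 0.10))
```

In this toy setting, a looser tolerance yields fewer versions, while a tolerance of zero switches versions whenever another plan becomes strictly cheaper at a sampled level. At run time, a system of this shape would only need to look up the version whose range contains the observed contention level, rather than re-optimize the query.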
Additional information
Communicated by Ahmed K. Elmagarmid.
Research was supported by the US National Science Foundation under Grant # IIS-9811980 and The University of Michigan.
Cite this article
Zhu, Q., Haridas, J. & Hou, WC. Query optimization via contention space partitioning and cost error controlling for dynamic multidatabase systems. Distrib Parallel Databases 23, 151–188 (2008). https://doi.org/10.1007/s10619-008-7025-4
DOI: https://doi.org/10.1007/s10619-008-7025-4