On Transformation of Query Scheduling Strategies in Distributed and Heterogeneous Database Systems
This work considers the problem of optimal query processing in heterogeneous and distributed database systems. A global query submitted at a local site is decomposed into a number of queries processed at remote sites, and the partial results returned by these queries are integrated at the local site. The paper addresses the problem of scheduling the queries so as to minimize the time spent integrating the partial results into the final answer. A global data model defined in this work provides a unified view of the heterogeneous data structures located at the remote sites, and a system of operations is defined to express complex data integration procedures. This work shows that transforming an entirely simultaneous query processing strategy into a hybrid (simultaneous/sequential) strategy may in some cases lead to significantly faster data integration. We show how to detect such cases, what conditions must be satisfied to transform a schedule, and how to transform schedules into more efficient ones.
Keywords: Distributed heterogeneous database systems; Data integration; Optimization of query processing
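The abstract does not state the paper's cost model or its transformation rules, so the following Python sketch uses a hypothetical toy model only to illustrate why a hybrid schedule can beat an entirely simultaneous one. It assumes (these are our assumptions, not the paper's) that integration time at the local site grows linearly with the number of rows received (`COST_PER_ROW`), and that running one query first lets its result restrict the second query in a semijoin-like way, so that only `surviving_rows` of the second query's rows come back. All names (`PartialQuery`, `simultaneous`, `hybrid`, `best_schedule`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical integration cost (seconds) per row received at the local site.
COST_PER_ROW = 0.01


@dataclass(frozen=True)
class PartialQuery:
    name: str
    remote_time: float  # hypothetical time the remote site needs to answer
    result_size: int    # hypothetical number of rows in the partial result


def simultaneous(queries):
    """All queries are submitted at once; integration starts when the slowest returns."""
    wait = max(q.remote_time for q in queries)
    integrate = COST_PER_ROW * sum(q.result_size for q in queries)
    return wait + integrate


def hybrid(first, second, surviving_rows):
    """Run `first`, use its result to restrict `second` (semijoin-style, assumed),
    then integrate. The remote times add up because the queries run in sequence."""
    if not 0 <= surviving_rows <= second.result_size:
        raise ValueError("surviving_rows must be between 0 and second.result_size")
    wait = first.remote_time + second.remote_time
    integrate = COST_PER_ROW * (first.result_size + surviving_rows)
    return wait + integrate


def best_schedule(first, second, surviving_rows):
    """Transform to the hybrid schedule only if the toy model predicts it is faster."""
    if hybrid(first, second, surviving_rows) < simultaneous([first, second]):
        return "hybrid"
    return "simultaneous"


if __name__ == "__main__":
    q1 = PartialQuery("q1", remote_time=1.0, result_size=100)
    q2 = PartialQuery("q2", remote_time=2.0, result_size=100_000)
    print(f"simultaneous={simultaneous([q1, q2]):.1f}")
    print(f"hybrid={hybrid(q1, q2, surviving_rows=1_000):.1f}")
    print(best_schedule(q1, q2, surviving_rows=1_000))
```

Under these assumptions the hybrid schedule pays extra waiting time (the remote times add up instead of overlapping), but it wins whenever the reduction in transferred rows saves more integration time than that. With no reduction (`surviving_rows` equal to the full result size of the second query), the simultaneous schedule remains faster, which mirrors the abstract's point that the transformation helps only in some cases.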