Optimizing join queries in distributed databases
A reduced cover set of the set of full reducer semijoin programs for an acyclic query graph in a distributed database system is given. An algorithm based on this reduced cover set is then presented that determines the minimum-cost full reducer program. We show that the computational complexity of finding the optimal full reducer for a single relation is of the same order as that of finding the optimal full reducer for all the relations. The optimization algorithm can handle query graphs in which relations share more than one attribute. We also present a method for determining the optimum profitable semijoin program. Because the computational complexity of finding the optimum-cost semijoin program is high, we present a low-cost algorithm that determines a near-optimal profitable semijoin program by converting a semijoin program into a partial order graph. This graph also allows us to maximize the concurrent processing of the semijoins. It is shown that the minimum response time is given by the largest-cost path of the partial order graph. We can use this reducibility as a post-optimizer for the SDD-1 query optimization algorithm. Finally, it is shown that the least upper bound on the length of any profitable semijoin program is N(N−1) for a query graph of N nodes.
Index terms: full reducer, semijoin program, distributed databases, profitable semijoin, partial order graph
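The abstract's claim that the minimum response time equals the largest-cost path of the partial order graph can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `min_response_time`, the semijoin labels `s1` to `s4`, and their costs are hypothetical, and the sketch assumes that every semijoin starts as soon as all of its predecessors have finished.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def min_response_time(cost, preds):
    """Return (response_time, critical_path) for a partial order graph.

    cost:  maps each semijoin label to its (assumed) execution cost
    preds: maps a semijoin label to the semijoins that must finish first
    """
    finish = {}  # earliest finish time of each semijoin
    via = {}     # predecessor on the costliest path into each semijoin
    graph = {node: preds.get(node, ()) for node in cost}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(graph).static_order():
        best = max(preds.get(node, ()), key=lambda p: finish[p], default=None)
        via[node] = best
        finish[node] = cost[node] + (finish[best] if best is not None else 0)
    # The semijoin that finishes last ends the largest-cost path.
    node = max(finish, key=finish.get)
    response_time = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = via[node]
    return response_time, path[::-1]


# Hypothetical program: s1 and s2 run concurrently, s3 waits for both,
# and s4 waits for s3.
cost = {"s1": 4, "s2": 2, "s3": 5, "s4": 1}
preds = {"s3": {"s1", "s2"}, "s4": {"s3"}}
response_time, critical_path = min_response_time(cost, preds)
print(response_time, critical_path)
```

In this made-up example the four semijoins cost 12 units when run one after another, while the largest-cost path s1, s3, s4 costs 10 units, which is the response time when s1 and s2 are processed concurrently.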