Skip to main content
Log in

Optimizing distributed join queries: A genetic algorithm approach

  • Published:
Annals of Operations Research Aims and scope Submit manuscript

Abstract

ptimizing join queries is a major problem in distributed database systems, particularly when files are replicated and copies stored at different nodes in the network. A distributed query optimization algorithm must select file copies and determine how and where those files will be processed. Process decisions include which files to reduce via semijoins, if any, the sites at which to perform join operations, and the order in which to perform those join operations. We extend the scope of distributed query optimization research by develop-ing a model that, for the first time, includes all of these design decisions and considers both communication and local processing costs. We develop a genetic algorithm-based solution procedure for this model which quickly determines efficient query processing plans. We demonstrate that ignoring local processing costs or restricting join processing to the result site, as commonly done in prior research, can result in inefficient query execution plans.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. P.M.G. Apers, A.R. Hevner and S.B. Yao, Optimization algorithms for distributed queries, IEEE Transactions on Software Engineering SE-9(1983)57–68.

    Google Scholar 

  2. K. Bennett, M.C. Ferris and Y.E. Ioannidis, A genetic algorithm for database query optimization, Proceedings of the 4th International Conference on Genetic Algorithms, San Diego, CA, July 1991, pp. 400–407.

  3. P.A. Bernstein and D.W. Chiu, Using semi-joins to solve relational queries, Journal of the ACM 28(1981)25–40.

    Google Scholar 

  4. P.A. Bernstein et al., Query processing in a system for distributed databases (SDD-1), ACM Transactions on Database Systems 6(1981)602–625.

    Google Scholar 

  5. M.-S. Chen and P.S. Yu, Interleaving a join sequence with semijoins in distributed query processing, IEEE Transactions on Parallel and Distributed Systems 3(1992)611–621.

    Google Scholar 

  6. M.-S. Chen and P.S. Yu, Combining join and semijoin operations for distributed query processing, IEEE Transactions on Knowledge and Data Engineering 5(1993)534–542.

    Google Scholar 

  7. M.-S. Chen and P.S. Yu, A graph theoretical approach to determine a join reducer sequence in distributed query processing, IEEE Transactions on Knowledge and Data Engineering 6(1994) 152–165.

    Google Scholar 

  8. J.S.J. Chen and V.O.K. Li, Optimizing joins in fragmented database systems on a broadcast local network, IEEE Transactions on Software Engineering 15(1989)26–38.

    Google Scholar 

  9. S. Christodoulakis, Implications of certain assumptions in database performance evaluation, ACM Transactions on Database Systems 9(1984)163–186.

    Google Scholar 

  10. D.W. Cornell and P.S. Yu, Site assignment for relations and join operations in the distributed transaction processing environment, Proceedings of the 4th International Conference on Data Engineering, 1988, pp. 100–108.

  11. D.W. Cornell, and P.S. Yu, On optimal site assignment for relations in the distributed database environment, IEEE Transactions on Software Engineering 15(1989)1004–1009.

    Google Scholar 

  12. L. Davis (ed.), Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.

    Google Scholar 

  13. K.A. De Jong, Genetic-algorithm-based learning, in: Machine Learning, Y. Kodratoff and R.S. Michalski, eds., Morgan Kaufmann, 1990, pp. 611–638.

  14. R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Benjamin/Cummings, 1989.

  15. D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.

  16. A.R. Hevner and S.B. Yao, Query processing in distributed database systems, IEEE Transactions on Software Engineering SE-5(1979)177–187.

    Google Scholar 

  17. A.R. Hevner and S.B. Yao, Querying distributed databases on local area networks, Proceedings of the IEEE 75(1987)563–572.

    Google Scholar 

  18. J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI, 1975.

    Google Scholar 

  19. E.S.H. Hou, N. Ansari and H. Ren, A genetic algorithm for multiprocessor scheduling, IEEE Transactions on Parallel and Distributed Systems 5(1994)113–120.

    Google Scholar 

  20. S. Lafortune and E. Wong, A state transition model for distributed query processing, ACM Transactions on Database Systems 11(1986)294–322.

    Google Scholar 

  21. E.-P. Lim and J. Srivastava, Query optimization/processing in federated database systems, Proceedings of Conference on Information and Knowledge Management, 1993.

  22. G.M. Lohman et al., Query processing in R*, in: Query Processing in Database Systems, W. Kim et al., eds., Springer, Berlin, 1985, pp. 31–47.

    Google Scholar 

  23. H. Lu and M.J. Carey, Some experimental results on distributed join algorithms in a local network, Proceedings of the 10th International Conference on VLDB, Stockholm, 1985, pp. 292–304.

  24. L.F. Mackert and G.M. Lohman, R* optimizer validation and performance evaluation for distributed queries, Proceedings of the 12th International Conference on VLDB, Kyoto, 1986, pp. 149–159.

  25. S.T. March and S. Rho, Allocating data and operations to nodes in distributed database design, IEEE Transactions on Knowledge and Data Engineering 7(1995)305–317.

    Google Scholar 

  26. T.P. Martin, K.H. Lam and J.L. Russell, Evaluation of site selection algorithms for distributed query processing, Computer Journal 33(1990)61–70.

    Google Scholar 

  27. P. Mishra and M.H. Eich, Join processing in relational databases, ACM Computing Surveys 24 (1992)63–113.

    Google Scholar 

  28. I. Oliver, D. Smith and J.H. Holland, A study of permutation crossover operators on the traveling salesman problem, Proceedings of the 2nd International Conference on Genetic Algorithms, Cambridge, MA, July 28–31, 1987, pp. 224–230.

  29. C.B. Petty, M.R. Leuze and J.J. Grefenstette, A parallel genetic algorithm, Proceedings of the 2nd International Conference on Genetic Algorithms, 1987, pp. 155–161.

  30. S. Pramanik and D. Vineyard, Optimizing join queries in distributed databases, IEEE Transactions on Software Engineering 14(1988)1319–1326.

    Google Scholar 

  31. S. Rho and S.T. March, A nested genetic algorithm for distributed database design, Proceeding of the 27th Hawaii International Conference on System Sciences, January 1994.

  32. S. Rho, Distributed database design: Allocation of data and operations to nodes in distributed database systems, unpublished Ph.D. Dissertation, University of Minnesota, May 1995.

  33. A. Segev, Optimization of join operations in horizontally partitioned database systems, ACM Transactions on Database Systems 11(1986)48–80.

    Google Scholar 

  34. A. Segev, Strategies for distributed query optimization, Information Sciences 54(1991)67–88.

    Google Scholar 

  35. D. Shasha and T. Wang, Optimizing equijoin queries in distributed databases where relations are hash partitioned, ACM Transactions on Database Systems 16(1991)279–308.

    Google Scholar 

  36. G. Syswerda, Uniform crossover in genetic algorithm, Proceedings of the 3rd International Conference on Genetic Algorithms, 1989, pp. 2–9.

  37. R. Tanese, Parallel genetic algorithms for a hypercube, Proceedings of the 2nd International Conference on Genetic Algorithms, 1987, pp. 177–183.

  38. D. Whitley, T. Starkweather and D. Fuquay, Problems and traveling salesmen: The genetic edge recombination operator, Proceedings of the 3rd International Conference on Genetic Algorithms, Morgan Kaufmann, 1989.

  39. H. Yoo and S. Lafortune, An intelligent search method for query optimization by semijoins, IEEE Transactions on Knowledge and Data Engineering 1(1989)226–237.

    Google Scholar 

  40. C.T. Yu and C.C. Chang, On the design of a distributed query processing algorithm, Proceedings of the ACM-SIGMOD International Conference on the Management of Data, San Jose, 1983, pp. 30–39.

  41. C.T. Yu and C.C. Chang, Distributed query processing, ACM Computing Surveys 16(1984)399–433.

    Google Scholar 

Download references

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Rho, S., March, S.T. Optimizing distributed join queries: A genetic algorithm approach. Annals of Operations Research 71, 199–228 (1997). https://doi.org/10.1023/A:1018967414664

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1023/A:1018967414664

Keywords

Navigation