Optimizing Large Join Queries in Mediation Systems

  • Ramana Yerneni
  • Chen Li
  • Jeffrey Ullman
  • Hector Garcia-Molina
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1540)

Abstract

In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, with sources of diverse and limited query capabilities, not all translations are feasible. In this paper, we study the problem of finding feasible and efficient query plans for mediator systems. We consider conjunctive queries on mediators and model the source capabilities through attribute-binding adornments. We use a simple cost model that focuses on the major costs in mediation systems: those involved in sending queries to sources and getting answers back. Under this metric, we develop two algorithms for source query sequencing, one based on a simple greedy strategy and another based on a partitioning scheme. The first algorithm produces optimal plans in some scenarios, and we show a linear bound on its worst-case performance when it misses optimal plans. The second algorithm generates optimal plans in more scenarios, but has no bound on the margin by which it misses the optimal plans. We also report the results of experiments that study the performance of the two algorithms.
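
To make the notions of attribute-binding adornments and feasible sequencing concrete, the following is a minimal sketch and not the paper's algorithm. It assumes each source lists its supported adornments as strings of 'b' (the attribute must be bound) and 'f' (it may be free), and it uses a fixed per-source cost as a placeholder for the paper's cost model. The names is_answerable, greedy_sequence, and the sources R, S, T are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical representation: a subgoal is (source name, variables per attribute).
Subgoal = Tuple[str, Tuple[str, ...]]


def is_answerable(variables: Tuple[str, ...], adornments: List[str], bound: Set[str]) -> bool:
    """True if some adornment has every 'b' position filled by a bound variable."""
    return any(
        all(flag == "f" or var in bound for var, flag in zip(variables, adornment))
        for adornment in adornments
    )


def greedy_sequence(
    subgoals: List[Subgoal],
    adornments: Dict[str, List[str]],
    cost: Dict[str, float],
    initially_bound: Iterable[str] = (),
) -> Optional[List[Subgoal]]:
    """Order subgoals so each one is answerable when it is sent to its source.

    At every step, pick the cheapest subgoal that is answerable given the
    variables bound so far. The static cost table is an assumption made for
    this sketch only. Returns None if the procedure gets stuck.
    """
    bound = set(initially_bound)
    remaining = list(subgoals)
    plan: List[Subgoal] = []
    while remaining:
        candidates = [
            goal for goal in remaining
            if is_answerable(goal[1], adornments[goal[0]], bound)
        ]
        if not candidates:
            return None  # no remaining subgoal can be answered
        best = min(candidates, key=lambda goal: cost[goal[0]])
        plan.append(best)
        remaining.remove(best)
        bound.update(best[1])  # answers bind all variables of the subgoal
    return plan


# Example: R accepts any query; S and T need their first attribute bound.
query = [("T", ("z", "w")), ("S", ("y", "z")), ("R", ("x", "y"))]
caps = {"R": ["ff"], "S": ["bf"], "T": ["bf"]}
costs = {"R": 10.0, "S": 1.0, "T": 1.0}
plan = greedy_sequence(query, caps, costs)
```

In this example only R is answerable at first, its answers bind y and make S answerable, and S in turn binds z for T. Because the set of bound variables only grows, a subgoal that is answerable stays answerable, so a None result in this sketch means that no feasible ordering exists under the stated assumptions.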

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Ramana Yerneni (1)
  • Chen Li (1)
  • Jeffrey Ullman (1)
  • Hector Garcia-Molina (1)

  1. Stanford University, USA
