Optimizing Large Join Queries in Mediation Systems

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1540)

Abstract

In data integration systems, queries posed to a mediator need to be translated into a sequence of queries to the underlying data sources. In a heterogeneous environment, where sources have diverse and limited query capabilities, not all translations are feasible. In this paper, we study the problem of finding feasible and efficient query plans for mediator systems. We consider conjunctive queries on mediators and model the source capabilities through attribute-binding adornments. We use a simple cost model that focuses on the major costs in mediation systems: those of sending queries to sources and getting answers back. Under this metric, we develop two algorithms for source query sequencing, one based on a simple greedy strategy and the other on a partitioning scheme. The first algorithm produces optimal plans in some scenarios, and we show a linear bound on its worst-case performance when it misses the optimal plan. The second algorithm generates optimal plans in more scenarios, but there is no bound on the margin by which it can miss the optimal plan. We also report experimental results on the performance of the two algorithms.
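To make the idea of sequencing source queries under attribute-binding adornments concrete, here is a minimal sketch of a greedy ordering. It is an illustration only, not the algorithm from the paper: the names `Subgoal`, `cheapest_cost`, and `greedy_sequence` are hypothetical, each adornment is modeled as the set of variables that must already be bound, and each adornment carries a made-up fixed cost. At every step the sketch picks the cheapest subgoal that is answerable with the variables bound so far.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Subgoal:
    """One source relation in a conjunctive query (hypothetical model)."""
    name: str
    variables: tuple[str, ...]
    # Each adornment is (variables that must be bound, assumed cost of the query).
    adornments: tuple[tuple[frozenset[str], float], ...]


def cheapest_cost(sg: Subgoal, bound: set[str]) -> float | None:
    """Lowest cost among adornments satisfied by `bound`, or None if infeasible."""
    costs = [cost for required, cost in sg.adornments if required <= bound]
    return min(costs) if costs else None


def greedy_sequence(subgoals: list[Subgoal]) -> tuple[list[str], float] | None:
    """Order subgoals greedily; return (plan, total cost) or None if no feasible plan is found."""
    bound: set[str] = set()
    remaining = list(subgoals)
    plan: list[str] = []
    total = 0.0
    while remaining:
        # Keep only the subgoals that some adornment allows us to query now.
        feasible = [(cheapest_cost(sg, bound), sg) for sg in remaining]
        feasible = [(c, sg) for c, sg in feasible if c is not None]
        if not feasible:
            return None
        cost, best = min(feasible, key=lambda pair: pair[0])
        plan.append(best.name)
        total += cost
        bound.update(best.variables)  # answers bind all variables of the subgoal
        remaining.remove(best)
    return plan, total


# Illustrative query R(x, y), S(y, z), T(z, w) with invented costs.
example = [
    Subgoal("T", ("z", "w"), ((frozenset({"z"}), 1.0), (frozenset(), 50.0))),
    Subgoal("S", ("y", "z"), ((frozenset({"y"}), 2.0),)),
    Subgoal("R", ("x", "y"), ((frozenset(), 10.0),)),
]
result = greedy_sequence(example)
```

In this example S cannot be queried first because its only adornment needs `y` bound, so the sketch starts with R, then S, then T using its cheap bound-`z` adornment. A subgoal set with no initially answerable member yields `None`, which corresponds to the infeasible translations mentioned in the abstract.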

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yerneni, R., Li, C., Ullman, J., Garcia-Molina, H. (1999). Optimizing Large Join Queries in Mediation Systems. In: Beeri, C., Buneman, P. (eds) Database Theory — ICDT’99. ICDT 1999. Lecture Notes in Computer Science, vol 1540. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49257-7_22

  • DOI: https://doi.org/10.1007/3-540-49257-7_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65452-0

  • Online ISBN: 978-3-540-49257-3

  • eBook Packages: Springer Book Archive
