Query Processing and Optimization on the Web

Distributed and Parallel Databases

Abstract

The advent of the Internet and the Web and their subsequent ubiquity have brought forth opportunities to connect information sources across all types of boundaries (local, regional, organizational, etc.). Examples of such information sources include databases, XML documents, and other unstructured sources. Uniformly querying those information sources has been extensively investigated. A major challenge relates to query optimization. Indeed, querying multiple information sources scattered on the Web raises several barriers to achieving efficiency. This is due to the characteristics of Web information sources, which include volatility, heterogeneity, and autonomy. Those characteristics impede a straightforward application of classical query optimization techniques. They add new dimensions to the optimization problem, such as the choice of objective function, selection of relevant information sources, limited query capabilities, and unpredictable events. In this paper, we survey the current research on fundamental problems in efficiently processing queries over Web data integration systems. We also outline a classification for optimization techniques and a framework for evaluating them.

Cite this article

Ouzzani, M., Bouguettaya, A. Query Processing and Optimization on the Web. Distributed and Parallel Databases 15, 187–218 (2004). https://doi.org/10.1023/B:DAPD.0000018574.71588.06

  • DOI: https://doi.org/10.1023/B:DAPD.0000018574.71588.06
