Optimizing Recursive Information Gathering Plans in EMERAC

Journal of Intelligent Information Systems

Abstract

In this paper we describe two optimization techniques that are specially tailored for information gathering. The first is a greedy minimization algorithm that minimizes an information gathering plan by removing redundant and overlapping information sources without loss of completeness. We then discuss a set of heuristics that guide the greedy minimization algorithm so as to remove costlier information sources first. In contrast to previous work, our approach can handle the recursive query plans that commonly arise in the presence of constrained sources. Second, we present a method for ordering the access to sources to reduce the execution cost. This problem differs significantly from the traditional database query optimization problem, as sources on the Internet have a variety of access limitations and the execution cost in information gathering is affected both by network traffic and by connection setup costs. Furthermore, because of the autonomous and decentralized nature of the Web, very few cost statistics about the sources may be available. In this paper, we propose a heuristic algorithm for ordering source calls that takes these constraints into account. Specifically, our algorithm takes both access costs and traffic costs into account, and is able to operate with very coarse statistics about sources (i.e., without depending on full source statistics). Finally, we discuss the implementation and empirical evaluation of these methods in Emerac, our prototype information gathering system.
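
The two ideas in the abstract can be illustrated with a toy sketch. It is not the algorithm implemented in Emerac: the paper works on recursive plans and reasons about redundancy symbolically, whereas this sketch assumes each source is described by an explicit set of answers it can contribute, a single coarse cost number, and the attributes it needs bound before it can be called. The names `Source`, `greedy_minimize`, and `order_calls`, and all fields, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    cost: float                      # assumed coarse per-call cost estimate
    answers: frozenset = frozenset() # toy stand-in for what the source contributes
    needs: frozenset = frozenset()   # attributes that must be bound before a call
    gives: frozenset = frozenset()   # attributes bound once the call returns


def greedy_minimize(sources):
    """Try to drop sources, costliest first, keeping the overall answer set intact."""
    target = frozenset().union(*(s.answers for s in sources))
    kept = list(sources)
    for s in sorted(sources, key=lambda s: s.cost, reverse=True):
        rest = [k for k in kept if k is not s]
        covered = frozenset().union(*(k.answers for k in rest))
        if covered == target:        # s is redundant given the remaining sources
            kept = rest
    return kept


def order_calls(sources, bound=()):
    """Greedily call the cheapest source whose required bindings are available."""
    bound = set(bound)
    pending = list(sources)
    order = []
    while pending:
        ready = [s for s in pending if s.needs <= bound]
        if not ready:
            raise ValueError("no remaining source can be called with current bindings")
        nxt = min(ready, key=lambda s: s.cost)
        order.append(nxt)
        bound |= nxt.gives           # results of this call enable later calls
        pending.remove(nxt)
    return order


if __name__ == "__main__":
    a = Source("A", 5.0, answers=frozenset({1, 2, 3}))
    b = Source("B", 1.0, answers=frozenset({1, 2}))
    c = Source("C", 2.0, answers=frozenset({3}))
    print([s.name for s in greedy_minimize([a, b, c])])

    lookup = Source("lookup", 3.0, gives=frozenset({"isbn"}))
    price = Source("price", 1.0, needs=frozenset({"isbn"}), gives=frozenset({"price"}))
    print([s.name for s in order_calls([price, lookup])])
```

In the first example the expensive source A is dropped because B and C together return the same answers. In the second, `price` is cheaper but cannot be called until `lookup` has bound `isbn`, which is the kind of access limitation that separates this setting from traditional query optimization.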

Author information

Corresponding author

Correspondence to Subbarao Kambhampati.

About this article

Cite this article

Kambhampati, S., Lambrecht, E., Nambiar, U. et al. Optimizing Recursive Information Gathering Plans in EMERAC. Journal of Intelligent Information Systems 22, 119–153 (2004). https://doi.org/10.1023/B:JIIS.0000012467.66268.9e

  • DOI: https://doi.org/10.1023/B:JIIS.0000012467.66268.9e
