Abstract
Data integration is a problem at the intersection of the fields of Artificial Intelligence and Database Systems. The goal of a data integration system is to provide a uniform interface to a multitude of data sources, whether they are within one enterprise or on the World-Wide Web. The key challenges in data integration arise because the data sources being integrated have been designed independently for autonomous applications, and their contents are related in subtle ways. As a result, a data integration system requires rich formalisms for describing the contents of data sources and for relating the contents of different sources. This paper discusses work aimed at applying techniques from Artificial Intelligence to the problem of data integration. In addition to employing Knowledge Representation techniques for describing the contents of information sources, projects have made use of Machine Learning techniques for extracting data from sources and of planning techniques for query optimization. The paper also outlines future opportunities for applying AI techniques in the context of data integration.
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Levy, A.Y. (1999). Combining Artificial Intelligence and Databases for Data Integration. In: Wooldridge, M.J., Veloso, M. (eds) Artificial Intelligence Today. Lecture Notes in Computer Science, vol 1600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48317-9_10
DOI: https://doi.org/10.1007/3-540-48317-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66428-4
Online ISBN: 978-3-540-48317-5
eBook Packages: Springer Book Archive