Combining Artificial Intelligence and Databases for Data Integration

  • Chapter
Artificial Intelligence Today

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1600)

Abstract

Data integration is a problem at the intersection of the fields of Artificial Intelligence and Database Systems. The goal of a data integration system is to provide a uniform interface to a multitude of data sources, whether they are within one enterprise or on the World-Wide Web. The key challenges in data integration arise because the data sources being integrated have been designed independently for autonomous applications, and their contents are related in subtle ways. As a result, a data integration system requires rich formalisms for describing the contents of data sources and for relating the contents of different sources. This paper discusses work aimed at applying techniques from Artificial Intelligence to the problem of data integration. In addition to employing Knowledge Representation techniques for describing the contents of information sources, projects have also made use of Machine Learning techniques for extracting data from sources and planning techniques for query optimization. The paper also outlines future opportunities for applying AI techniques in the context of data integration.
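
The idea of a uniform interface over independently designed sources can be illustrated with a minimal sketch. It is not taken from the chapter: the `SourceDescription` class, the `query` function, the two toy sources, and their field names are all hypothetical, and the "source description" here is only a field renaming, far simpler than the formalisms the abstract refers to.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class SourceDescription:
    """Hypothetical description of one source: how to fetch its rows and
    how its field names map onto a shared (mediated) schema."""
    name: str
    fetch: Callable[[], Iterable[Row]]
    field_map: Dict[str, str]  # source field name -> mediated attribute name

    def rows(self) -> List[Row]:
        # Rename known fields into the mediated schema; drop unmapped fields.
        return [
            {self.field_map[k]: v for k, v in raw.items() if k in self.field_map}
            for raw in self.fetch()
        ]


def query(sources: List[SourceDescription],
          predicate: Callable[[Row], bool]) -> List[Row]:
    """Answer one query over all sources through the mediated schema."""
    results = []
    for source in sources:
        for row in source.rows():
            if predicate(row):
                results.append({**row, "source": source.name})
    return results


# Two toy sources (assumptions for illustration) with different field names.
def enterprise_db() -> List[Row]:
    return [{"title": "Casablanca", "yr": 1942}, {"title": "Vertigo", "yr": 1958}]


def web_listing() -> List[Row]:
    return [
        {"film": "Vertigo", "released": 1958, "rating": 8.3},
        {"film": "Alien", "released": 1979, "rating": 8.5},
    ]


SOURCES = [
    SourceDescription("enterprise_db", enterprise_db, {"title": "title", "yr": "year"}),
    SourceDescription("web_listing", web_listing, {"film": "title", "released": "year"}),
]

if __name__ == "__main__":
    for row in query(SOURCES, lambda r: r["year"] > 1950):
        print(row)
```

The caller writes the query once against the mediated attributes `title` and `year` and never sees the per-source field names. Note that the same film is returned from both sources; deciding when two records describe the same entity is one of the subtle relationships between sources that this sketch leaves out.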

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Levy, A.Y. (1999). Combining Artificial Intelligence and Databases for Data Integration. In: Wooldridge, M.J., Veloso, M. (eds) Artificial Intelligence Today. Lecture Notes in Computer Science, vol 1600. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48317-9_10

  • DOI: https://doi.org/10.1007/3-540-48317-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66428-4

  • Online ISBN: 978-3-540-48317-5

  • eBook Packages: Springer Book Archive
