Document Interrogation: Architecture, Information Extraction and Approximate Answers

  • Soraya Abad-Mota
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4254)


We present an architecture for structuring and querying the contents of a set of documents that belong to an organization. The structure is a database that is semi-automatically populated using information extraction techniques. We provide an ontology-based language to interrogate the contents of the documents. Processing a query in this language can yield an approximate answer and trigger a mechanism that improves the answer by performing additional information extraction on the textual sources. Individual database items carry quality metadata that can be used to evaluate the quality of answers. The interaction between information extraction and query processing is a pivotal aspect of this research.
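The interaction between query processing and information extraction described above can be illustrated with a small sketch. It is not the authors' implementation: the class names (`Item`, `Answer`, `Store`), the regex-based extraction rule, the `confidence` field, and the 0.5 threshold are all assumptions made for illustration, and a plain attribute lookup stands in for the ontology-based query language.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Item:
    """A database item with quality metadata (hypothetical fields)."""
    entity: str
    attribute: str
    value: str
    source: str        # id of the document the value was extracted from
    confidence: float  # assumed quality score in [0, 1]


@dataclass
class Answer:
    items: list
    approximate: bool  # True when no stored item met the quality threshold


@dataclass
class Store:
    documents: dict  # document id -> raw text
    rules: dict      # attribute -> regex with one capture group (assumed rule format)
    items: list = field(default_factory=list)

    def extract(self, entity, attribute):
        """Apply the rule for `attribute` to every document that mentions `entity`."""
        pattern = self.rules.get(attribute)
        if pattern is None:
            return
        for doc_id, text in self.documents.items():
            if entity not in text:
                continue
            match = re.search(pattern, text)
            if match:
                # 0.6 is an arbitrary placeholder quality score.
                self.items.append(Item(entity, attribute, match.group(1), doc_id, 0.6))

    def query(self, entity, attribute, min_confidence=0.5):
        """Return stored items; fall back to an approximate answer and trigger extraction."""
        matches = [i for i in self.items
                   if i.entity == entity and i.attribute == attribute]
        good = [i for i in matches if i.confidence >= min_confidence]
        if good:
            return Answer(good, approximate=False)
        # Nothing of sufficient quality: report what is known as approximate,
        # then extract from the sources so that a later query can do better.
        self.extract(entity, attribute)
        return Answer(matches, approximate=True)
```

In this sketch the first query for a missing attribute returns an approximate (possibly empty) answer and populates the database as a side effect; repeating the query then returns the extracted item together with its source and quality score.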


Relational Database, Query Processing, Natural Language Processing, Information Extraction, Extraction Rule
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Soraya Abad-Mota (1, 2)
  1. University of New Mexico
  2. Universidad Simón Bolívar
