An Adaptable IE System to New Domains

Applied Intelligence

Abstract

The most widespread way of acquiring information for knowledge-based systems is to do it manually. However, the high cost of this approach and the availability of alternative knowledge sources have led to an increasing use of automatic acquisition approaches. In this paper we present M-TURBIO, a Text-Based Intelligent System (TBIS) that extracts information contained in restricted-domain documents. The system acquires part of its knowledge about the structure of the documents and the way the information is presented (i.e., syntactic-semantic rules) from a training set of such documents. A database is then created by applying these syntactic-semantic rules to extract the information contained in the whole document.

Cite this article

Turmo, J., Català, N. & Rodríguez, H. An Adaptable IE System to New Domains. Applied Intelligence 10, 225–246 (1999). https://doi.org/10.1023/A:1008332021052

  • DOI: https://doi.org/10.1023/A:1008332021052
