Abstract
The most widespread way of acquiring information for knowledge-based systems is to do it manually. However, the high cost of this approach and the availability of alternative knowledge sources have led to an increasing use of automatic acquisition approaches. In this paper we present M-TURBIO, a Text-Based Intelligent System (TBIS) that extracts information contained in restricted-domain documents. The system acquires part of its knowledge about the structure of the documents and the way the information is presented (i.e., syntactic-semantic rules) from a training set of such documents. A database is then created by applying these syntactic-semantic rules to extract the information contained in the whole document.
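The abstract describes a two-phase process: rules are first acquired from a training set, then applied to fill a database. The following is a minimal, hypothetical Python sketch of that idea only. It is not M-TURBIO's method. The real system learns syntactic-semantic rules over linguistically analysed text, whereas this toy learns plain left-context word patterns and assumes single-token field values. The training sentences, the field names, and the functions `learn_rules`, `apply_rules`, and `build_database` are all illustrative assumptions.

```python
import sqlite3
from collections import Counter

# Hypothetical annotated training set: each document is paired with the
# field values an annotator marked in it (single-token values only).
TRAINING = [
    ("Seminar given by Smith in room 101", {"speaker": "Smith", "room": "101"}),
    ("Talk given by Jones in room 202", {"speaker": "Jones", "room": "202"}),
]


def learn_rules(training, window=2):
    """Toy rule acquisition: for each field, keep the most frequent
    `window`-token left context that precedes the annotated value."""
    contexts = {}
    for text, fields in training:
        tokens = text.split()
        for field, value in fields.items():
            if value not in tokens:
                continue  # annotation not found as a token; skip it
            i = tokens.index(value)
            if i >= window:
                ctx = tuple(tokens[i - window:i])
                contexts.setdefault(field, Counter())[ctx] += 1
    return {field: c.most_common(1)[0][0] for field, c in contexts.items()}


def apply_rules(rules, text):
    """Extract a record: the token right after a field's learned context."""
    tokens = text.split()
    record = {}
    for field, ctx in rules.items():
        n = len(ctx)
        for i in range(n, len(tokens)):
            if tuple(tokens[i - n:i]) == ctx:
                record[field] = tokens[i]
                break
    return record


def build_database(rules, documents):
    """Populate an in-memory SQLite table with one row per document."""
    conn = sqlite3.connect(":memory:")
    fields = sorted(rules)
    cols = ", ".join(fields)
    conn.execute(f"CREATE TABLE records ({cols})")
    placeholders = ", ".join("?" for _ in fields)
    for text in documents:
        rec = apply_rules(rules, text)
        conn.execute(
            f"INSERT INTO records ({cols}) VALUES ({placeholders})",
            [rec.get(f) for f in fields],  # missing fields become NULL
        )
    return conn


rules = learn_rules(TRAINING)
db = build_database(rules, ["Lecture given by Garcia in room 305"])
print(db.execute("SELECT speaker, room FROM records").fetchall())
```

Here the learned rules map `speaker` to the context `("given", "by")` and `room` to `("in", "room")`, so the unseen sentence yields the row `('Garcia', '305')`. Keeping rule learning separate from rule application mirrors the abstract's split between a training phase and an extraction phase.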
Cite this article
Turmo, J., Català, N. & Rodríguez, H. An Adaptable IE System to New Domains. Applied Intelligence 10, 225–246 (1999). https://doi.org/10.1023/A:1008332021052