Can We Make Information Extraction More Adaptive?

Wilks, Yorick; Catizone, Roberta

doi:10.1007/3-540-48089-7_1

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1714)

Included in the following conference series:

International Summer School on Information Extraction

Abstract

It seems widely agreed that IE (Information Extraction) is now a tested language technology that has reached precision+recall values that put it in about the same position as Information Retrieval and Machine Translation, both of which are widely used commercially. There is also a clear range of practical applications that would be eased by the sort of template-style data that IE provides. The problem for wider deployment of the technology is adaptability: the ability to customize IE rapidly to new domains.

In this paper we discuss some methods that have been tried to ease this problem, and to create something more rapid than the benchmark one-month figure, which was roughly what ARPA teams in IE needed to adapt an existing system by hand to a new domain of corpora and templates. An important distinction in discussing the issue is the degree to which a user can be assumed to know what is wanted and to have pre-existing templates ready to hand, as opposed to a user who has only a vague idea of what is needed from a corpus.

We shall discuss attempts to derive templates directly from corpora, and to derive knowledge structures and lexicons directly from corpora, including the recent LE project ECRAN, which attempted to tune existing lexicons to new corpora. An important issue is how far established methods in Information Retrieval, which tune to a user’s needs with feedback at an interface, can be transferred to IE.

The authors are grateful for discussion and contributions from Hamish Cunningham, Robert Gaizauskas, Louise Guthrie and Evelyne Viegas. All errors are our own, of course.

Author information

Authors and Affiliations

The University of Sheffield, UK
Yorick Wilks & Roberta Catizone

Editor information

Editors and Affiliations

Department of Computer Science, Systems and Production, University of Roma, Tor Vergata, Via di Tor Vergata, I-00133, Roma, Italy
Maria Teresa Pazienza

About this paper

Cite this paper

Wilks, Y., Catizone, R. (1999). Can We Make Information Extraction More Adaptive?. In: Pazienza, M.T. (eds) Information Extraction. SCIE 1999. Lecture Notes in Computer Science, vol 1714. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48089-7_1

DOI: https://doi.org/10.1007/3-540-48089-7_1
Published: 28 March 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66625-7
Online ISBN: 978-3-540-48089-1
eBook Packages: Springer Book Archive
