Can We Make Information Extraction More Adaptive?

  • Conference paper
Information Extraction (SCIE 1999)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1714)

Abstract

It seems widely agreed that IE (Information Extraction) is now a tested language technology that has reached precision+recall values that put it in about the same position as Information Retrieval and Machine Translation, both of which are widely used commercially. There is also a clear range of practical applications that would be eased by the sort of template-style data that IE provides. The problem for wider deployment of the technology is adaptability: the ability to customize IE rapidly to new domains.

In this paper we discuss some methods that have been tried to ease this problem, and to create something more rapid than the benchmark one-month figure, which was roughly what ARPA teams in IE needed to adapt an existing system by hand to a new domain of corpora and templates. An important distinction in discussing the issue is the degree to which a user can be assumed to know what is wanted and to have pre-existing templates ready to hand, as opposed to a user who has only a vague idea of what is needed from a corpus.

We shall discuss attempts to derive templates directly from corpora, and to derive knowledge structures and lexicons directly from corpora, including the recent LE project ECRAN, which attempted to tune existing lexicons to new corpora. An important issue is how far established methods in Information Retrieval of tuning to a user’s needs with feedback at an interface can be transferred to IE.

The authors are grateful for discussion and contributions from Hamish Cunningham, Robert Gaizauskas, Louise Guthrie and Evelyne Viegas. All errors are our own, of course.

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wilks, Y., Catizone, R. (1999). Can We Make Information Extraction More Adaptive?. In: Pazienza, M.T. (eds) Information Extraction. SCIE 1999. Lecture Notes in Computer Science, vol 1714. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48089-7_1

  • DOI: https://doi.org/10.1007/3-540-48089-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66625-7

  • Online ISBN: 978-3-540-48089-1

  • eBook Packages: Springer Book Archive
