Issues in inductive learning of domain-specific text extraction rules

  • Stephen Soderland
  • David Fisher
  • Jonathan Aseltine
  • Wendy Lehnert
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1040)

Abstract

Domain-specific text analysis requires a dictionary of linguistic patterns that identify references to relevant information in a text. This paper describes CRYSTAL, a fully automated tool that induces such a dictionary of text extraction rules. We discuss some key issues in developing an automatic dictionary induction system, using CRYSTAL as a concrete example. CRYSTAL derives text extraction rules from training instances and generalizes each rule as far as possible, testing the accuracy of each proposed rule on the training corpus. An error tolerance parameter allows CRYSTAL to manipulate a trade-off between recall and precision. We discuss issues involved with creating training data, defining a domain ontology, and allowing a flexible and expressive representation while designing a search control mechanism that avoids intractability.
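The generalize-and-test loop summarized in the abstract can be illustrated with a small Python sketch. It is not CRYSTAL's actual algorithm: the abstract does not say how candidate generalizations are proposed, so greedy constraint dropping is used here as an assumed stand-in. The feature strings, the toy corpus, and the names `error_rate` and `generalize` are all hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: a training instance is a set of feature
# strings plus a label saying whether it contains the target concept.
# A rule is a set of required features; it covers an instance when every
# required feature is present.
Instance = Tuple[FrozenSet[str], bool]


def error_rate(rule: FrozenSet[str], corpus: List[Instance]) -> float:
    """Fraction of instances covered by the rule that are negative."""
    covered = [label for feats, label in corpus if rule <= feats]
    if not covered:
        return 0.0
    return sum(1 for label in covered if not label) / len(covered)


def generalize(seed: FrozenSet[str], corpus: List[Instance],
               tolerance: float) -> FrozenSet[str]:
    """Relax the seed rule one constraint at a time, keeping a relaxation
    only if its error rate on the training corpus stays within tolerance.
    (Assumed strategy for illustration, not the paper's search control.)"""
    rule = seed
    improved = True
    while improved:
        improved = False
        for constraint in sorted(rule):
            candidate = rule - {constraint}
            # Never relax to the empty rule, which would cover everything.
            if candidate and error_rate(candidate, corpus) <= tolerance:
                rule = candidate
                improved = True
                break
    return rule


corpus: List[Instance] = [
    (frozenset({"subj:patient", "verb:reports", "obj:symptom"}), True),
    (frozenset({"subj:nurse", "verb:reports", "obj:symptom"}), True),
    (frozenset({"subj:patient", "verb:reports", "obj:date"}), False),
    (frozenset({"subj:nurse", "verb:schedules", "obj:date"}), False),
    (frozenset({"subj:patient", "verb:reports", "obj:finding"}), True),
]

seed = corpus[0][0]
strict_rule = generalize(seed, corpus, tolerance=0.0)
loose_rule = generalize(seed, corpus, tolerance=0.3)
print(sorted(strict_rule), sorted(loose_rule))
```

On this toy corpus, a tolerance of 0.0 yields a rule that covers two of the three positive instances with no errors, while a tolerance of 0.3 yields a broader rule that covers all three positives along with one negative. This is the recall and precision trade-off that the error tolerance parameter is described as controlling.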

Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Stephen Soderland (1)
  • David Fisher (1)
  • Jonathan Aseltine (1)
  • Wendy Lehnert (1)

  1. Department of Computer Science, University of Massachusetts, Amherst
