Issues in inductive learning of domain-specific text extraction rules
Domain-specific text analysis requires a dictionary of linguistic patterns that identify references to relevant information in a text. This paper describes CRYSTAL, a fully automated tool that induces such a dictionary of text extraction rules. We discuss some key issues in developing an automatic dictionary induction system, using CRYSTAL as a concrete example. CRYSTAL derives text extraction rules from training instances and generalizes each rule as far as possible, testing the accuracy of each proposed rule on the training corpus. An error tolerance parameter allows CRYSTAL to manipulate a trade-off between recall and precision. We discuss issues involved with creating training data, defining a domain ontology, and allowing a flexible and expressive representation while designing a search control mechanism that avoids intractability.
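The induce-generalize-test loop and the role of the error tolerance parameter can be illustrated with a small sketch. It is not CRYSTAL's actual algorithm or representation. It assumes a toy setting in which each training instance is a flat set of feature-value pairs with a positive or negative label, and in which a rule is generalized simply by dropping one constraint at a time. The names `covers`, `error_rate`, `generalize`, `induce`, and the feature names `subj`, `verb`, `obj` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed toy representation: (features, is_positive_example_of_the_concept)
Instance = Tuple[Dict[str, str], bool]
Rule = Dict[str, str]  # constraints that must all match for the rule to apply


def covers(rule: Rule, features: Dict[str, str]) -> bool:
    """A rule covers an instance when every constraint matches."""
    return all(features.get(k) == v for k, v in rule.items())


def error_rate(rule: Rule, corpus: List[Instance]) -> float:
    """Fraction of covered training instances that are negative."""
    covered = [label for feats, label in corpus if covers(rule, feats)]
    if not covered:
        return 0.0
    return sum(1 for label in covered if not label) / len(covered)


def generalize(seed: Dict[str, str], corpus: List[Instance], tolerance: float) -> Rule:
    """Start from the most specific rule (the seed instance itself) and
    greedily drop constraints while the error on the training corpus
    stays within the tolerance."""
    rule = dict(seed)
    improved = True
    while improved:
        improved = False
        for key in sorted(rule):
            candidate = {k: v for k, v in rule.items() if k != key}
            # Never accept the empty rule, which would match everything.
            if candidate and error_rate(candidate, corpus) <= tolerance:
                rule = candidate
                improved = True
                break
    return rule


def induce(corpus: List[Instance], tolerance: float) -> List[Rule]:
    """Build a rule dictionary: one generalized rule per positive
    instance that is not already covered by an earlier rule."""
    rules: List[Rule] = []
    for feats, label in corpus:
        if not label or any(covers(r, feats) for r in rules):
            continue
        rules.append(generalize(feats, corpus, tolerance))
    return rules


# Invented toy corpus, for illustration only.
corpus: List[Instance] = [
    ({"subj": "patient", "verb": "denies", "obj": "symptom"}, True),
    ({"subj": "patient", "verb": "denies", "obj": "finding"}, True),
    ({"subj": "doctor", "verb": "denies", "obj": "request"}, False),
    ({"subj": "patient", "verb": "reports", "obj": "symptom"}, False),
]

strict_rules = induce(corpus, tolerance=0.0)  # [{'subj': 'patient', 'verb': 'denies'}]
loose_rules = induce(corpus, tolerance=0.4)   # [{'verb': 'denies'}]
```

With zero tolerance the induced rule keeps both the subject and verb constraints and covers only positive instances. Raising the tolerance to 0.4 lets the rule generalize to the verb alone, which also matches one negative instance: broader coverage at the cost of precision, which is the trade-off the abstract describes.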