Abstract
We consider the problem of learning to perform information extraction in domains where linguistic processing is problematic, such as Usenet posts, email, and finger plan files. In place of syntactic and semantic information, other sources of information can be used, such as term frequency, typography, formatting, and mark-up. We describe four learning approaches to this problem, each drawn from a different paradigm: a rote learner, a term-space learner based on Naive Bayes, an approach using grammatical induction, and a relational rule learner. Experiments on 14 information extraction problems defined over four diverse document collections demonstrate the effectiveness of these approaches. Finally, we describe a multistrategy approach which combines these learners and yields performance competitive with or better than the best of them. This technique is modular and flexible, and could find application in other machine learning problems.
About this article
Cite this article
Freitag, D. Machine Learning for Information Extraction in Informal Domains. Machine Learning 39, 169–202 (2000). https://doi.org/10.1023/A:1007601113994
DOI: https://doi.org/10.1023/A:1007601113994