Agents for Intelligent Information Extraction by Using Domain Knowledge and Token-Based Morphological Patterns
Knowledge-based information extraction is known for its flexibility in recognizing various kinds of target information, because it exploits domain knowledge to generate information-extraction rules automatically. However, most previous knowledge-based information-extraction systems are applicable only to labeled documents: ontology terms must appear in the document to guide the system in determining whether the target information is present.
To make knowledge-based information extraction general enough to handle both labeled and unlabeled documents, this paper proposes an enhanced scheme of knowledge-based wrapper generation that uses token-based morphological patterns. Each document is represented as a sequence of tokens, rather than a sequence of logical lines, in order to capture the meaning of data fragments more accurately and to recognize the target information contextually. The newly implemented system, XTROS+, is presented and its performance is demonstrated.
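The idea of representing a document as a token sequence and matching morphological patterns over it can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the token classes (NUM, UPPER, CAP, LOWER, SYM), the tokenizing regular expression, and the function names are hypothetical and are not the pattern set used by the system described above, which this abstract does not specify.

```python
import re

# Hypothetical tokenizer: runs of ASCII letters, numbers (optionally with
# "." or "," groups such as 250,000), or a single other non-space character.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+(?:[.,]\d+)*|[^\sA-Za-z\d]")


def tokenize(text):
    """Split text into a flat token sequence, ignoring line boundaries."""
    return TOKEN_RE.findall(text)


def morph_class(token):
    """Map a token to an illustrative morphological class."""
    if token[0].isdigit():
        return "NUM"
    if token.isalpha():
        if token.isupper():
            return "UPPER"
        if token[0].isupper():
            return "CAP"
        return "LOWER"
    return "SYM"


def find_pattern(text, pattern):
    """Return every token window whose class sequence equals `pattern`."""
    tokens = tokenize(text)
    classes = [morph_class(t) for t in tokens]
    n = len(pattern)
    return [
        tokens[i:i + n]
        for i in range(len(tokens) - n + 1)
        if classes[i:i + n] == pattern
    ]


# A made-up, unlabeled fragment: no ontology term such as "price" appears,
# yet the symbol-followed-by-number shape still locates a candidate value.
listing = "3 BR house $250,000 in Seattle"
price_candidates = find_pattern(listing, ["SYM", "NUM"])
```

In this sketch, `price_candidates` holds the token window `["$", "250,000"]`. Matching on token shape rather than on a label is what lets such a rule fire on an unlabeled fragment; a real system would additionally need domain knowledge to decide which target item a matched window denotes.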
Keywords: Domain Knowledge, Information Extraction, Target Item, Ontological Term, Morphological Pattern