Agents for Intelligent Information Extraction by Using Domain Knowledge and Token-Based Morphological Patterns

  • Jaeyoung Yang
  • Joongmin Choi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2891)


Knowledge-based information extraction is known to be flexible in recognizing various kinds of target information, because it exploits domain knowledge to generate information-extraction rules automatically. However, most previous knowledge-based information-extraction systems are applicable only to labeled documents: ontology terms must appear in the document to guide the system in determining whether the target information is present.
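The dependence on labels can be illustrated with a minimal Python sketch. It is not taken from the paper: the `ONTOLOGY` dictionary, the `extract_labeled` function, and the sample strings are all hypothetical, and the regular expression is only one simple way to model "an ontology term followed by a value".

```python
import re

# Hypothetical domain knowledge: each target item maps to the ontology
# terms (labels) that may introduce it in a document.
ONTOLOGY = {
    "price": ["price", "cost"],
    "bedrooms": ["bedrooms", "beds"],
}

def extract_labeled(text):
    """Return {item: value} for items whose label appears right before a value."""
    found = {}
    for item, terms in ONTOLOGY.items():
        for term in terms:
            # A label, an optional colon, then a value built from an optional
            # dollar sign and digits, commas, or dots, ending in a digit.
            m = re.search(
                rf"\b{re.escape(term)}\b\s*:?\s*(\$?[\d,.]*\d)",
                text,
                re.IGNORECASE,
            )
            if m:
                found[item] = m.group(1)
                break
    return found

labeled = "Price: $125,000  Beds: 3"   # ontology terms are present
unlabeled = "$125,000 / 3"             # same data, no ontology terms

labeled_result = extract_labeled(labeled)
unlabeled_result = extract_labeled(unlabeled)
```

With the labeled string both items are found, while the unlabeled string yields an empty result even though it carries the same values. This is the limitation the abstract attributes to earlier systems.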

To make a knowledge-based information-extraction system general enough to handle both labeled and unlabeled documents, this paper proposes an enhanced scheme of knowledge-based wrapper generation that uses token-based morphological patterns. Each document is represented as a sequence of tokens, rather than a sequence of logical lines, in order to capture the meaning of data fragments more accurately and to recognize the target information contextually. The newly implemented system, XTROS+, is presented and its performance is demonstrated.
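As a rough illustration of the token-based idea, the following Python sketch tags each token with a coarse morphological class and then looks for a class sequence that suggests a target item. The abstract does not specify the pattern vocabulary used by XTROS+, so the class names in `TOKEN_SPEC`, the helpers `tokenize` and `find_pattern`, and the sample text are assumptions made only for this example.

```python
import re

# Hypothetical token classes, tried in order; the paper's actual
# morphological pattern vocabulary is not given in the abstract.
TOKEN_SPEC = [
    ("CURRENCY", r"[$€£]"),
    ("NUMBER", r"\d+(?:,\d{3})*(?:\.\d+)?"),
    ("CAPWORD", r"[A-Z][a-z]+"),
    ("WORD", r"[A-Za-z]+"),
    ("PUNCT", r"[^\sA-Za-z0-9]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Represent a document fragment as a list of (class, text) tokens."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

def find_pattern(tokens, pattern):
    """Return the joined text of every run of tokens whose classes equal `pattern`."""
    n = len(pattern)
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if [cls for cls, _ in window] == pattern:
            hits.append("".join(txt for _, txt in window))
    return hits

tokens = tokenize("Colonial home $125,000 3 BR")
# A currency symbol followed by a number is treated here as a price candidate,
# even though no ontology term such as "Price" appears in the text.
prices = find_pattern(tokens, ["CURRENCY", "NUMBER"])
```

Because the match is made on the shape of the tokens and not on a label, the price candidate is found in an unlabeled fragment. A real system would also need context to disambiguate competing candidates, which this sketch does not attempt.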


Domain Knowledge, Information Extraction, Target Item, Ontological Term, Morphological Pattern
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Jaeyoung Yang (1)
  • Joongmin Choi (1)
  1. Dept. of Computer Science and Engineering, Hanyang University, Kyunggi-Do, Korea
