Agents for Intelligent Information Extraction by Using Domain Knowledge and Token-Based Morphological Patterns

  • Conference paper
Intelligent Agents and Multi-Agent Systems (PRIMA 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2891)

Abstract

Knowledge-based information extraction is known for its flexibility in recognizing various kinds of target information, because it exploits domain knowledge to generate information-extraction rules automatically. However, most previous knowledge-based information-extraction systems are applicable only to labeled documents: ontology terms must appear in the document to guide the system in determining whether the target information is present.

To make a knowledge-based information-extraction system general enough to handle both labeled and unlabeled documents, this paper proposes an enhanced scheme of knowledge-based wrapper generation that uses token-based morphological patterns. Each document is represented as a sequence of tokens, rather than a sequence of logical lines, in order to capture the meaning of data fragments more accurately and to recognize the target information contextually. The newly implemented system, XTROS+, is presented and its performance is demonstrated.
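
The abstract does not give the pattern vocabulary or the matching procedure used by XTROS+, so the following is only a minimal sketch of the general idea of token-based morphological patterns, not the authors' implementation. The token classes (`TAG`, `NUM`, `CUR`, and so on), the helper names `classify`, `tokenize`, and `find_pattern`, and the sample fragment are all assumptions made for illustration. The sketch shows how a value with no accompanying label (here, a price with no "Price:" term nearby) might still be located from the shape of its tokens.

```python
import re

# Hypothetical tokenizer: markup tags, numbers (with , or . separators),
# alphabetic words, or any other single non-space character.
TOKEN_RE = re.compile(r"<[^>]+>|\d+(?:[.,]\d+)*|[A-Za-z]+|\S")

# Assumed set of currency symbols; not taken from the paper.
CURRENCY = {"$", "€", "£"}


def classify(token):
    """Map a token to an illustrative morphological class."""
    if len(token) > 2 and token.startswith("<") and token.endswith(">"):
        return "TAG"
    if token[0].isdigit():
        return "NUM"
    if token.isalpha():
        if token.isupper():
            return "UPPER"
        if token[0].isupper():
            return "CAP"
        return "LOWER"
    if token in CURRENCY:
        return "CUR"
    return "PUNCT"


def tokenize(text):
    """Represent a document as a sequence of (token, class) pairs."""
    return [(tok, classify(tok)) for tok in TOKEN_RE.findall(text)]


def find_pattern(tokens, pattern):
    """Return the token texts of every window whose class sequence equals pattern."""
    classes = [cls for _, cls in tokens]
    n = len(pattern)
    hits = []
    for i in range(len(tokens) - n + 1):
        if classes[i:i + n] == pattern:
            hits.append([tok for tok, _ in tokens[i:i + n]])
    return hits


# An unlabeled fragment: no ontology term such as "Price" appears.
doc = "<td>3 bedrooms</td><td>$ 250,000</td>"
tokens = tokenize(doc)
prices = find_pattern(tokens, ["CUR", "NUM"])
print(prices)  # [['$', '250,000']]
```

Because matching operates on token classes rather than on whole lines, the same pattern would also apply when the price shares a line with other data, which is the kind of contextual recognition the abstract describes.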

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, J., Choi, J. (2003). Agents for Intelligent Information Extraction by Using Domain Knowledge and Token-Based Morphological Patterns. In: Lee, J., Barley, M. (eds) Intelligent Agents and Multi-Agent Systems. PRIMA 2003. Lecture Notes in Computer Science, vol 2891. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39896-7_7

  • DOI: https://doi.org/10.1007/978-3-540-39896-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20460-2

  • Online ISBN: 978-3-540-39896-7

  • eBook Packages: Springer Book Archive
