Adaptive Information Extraction: Core Technologies for Information Agents

  • Nicholas Kushmerick
  • Bernd Thomas
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2586)

Abstract

For the purposes of this chapter, an information agent can be described as a distributed system that receives a goal through its user interface, gathers information relevant to this goal from a variety of sources, processes this content as appropriate, and delivers the results to the users. We focus on the second stage in this generic architecture. We survey a variety of information extraction techniques that enable information agents to automatically gather information from heterogeneous sources.

Keywords

Hidden Markov Model; Information Extraction; Inductive Logic Programming; Extraction Rule; Document Representation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Nicholas Kushmerick (1)
  • Bernd Thomas (2)
  1. Computer Science Department, University College Dublin, Dublin
  2. Institut für Informatik, Universität Koblenz-Landau, Koblenz-Landau