Mechanisms of Knowledge Evolution for Web Information Extraction

  • Carsten Müller
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3847)


The knowledge needed for Web information extraction can, under certain assumptions, be characterized as the knowledge held by the wrappers that extract the semantics of documents. The evolution of this knowledge can be divided into two phases: the initial learning of the wrappers and their later maintenance. This paper focuses only on the initial learning phase. Using the LExIKON system as a basis, it explains the principal structure of learning algorithms for island wrappers.
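To make the idea of learning a wrapper from examples concrete, here is a minimal sketch in Python. It is not the LExIKON algorithm. It assumes, purely for illustration, that an island wrapper can be modelled as a pair of fixed delimiter strings surrounding the text to extract, and that the learner generalizes from (document, marked text) examples by taking the longest common suffix of the left contexts and the longest common prefix of the right contexts. The names `IslandWrapper` and `learn_island_wrapper` are hypothetical.

```python
from dataclasses import dataclass
from os.path import commonprefix  # character-wise common prefix of strings


@dataclass(frozen=True)
class IslandWrapper:
    """Hypothetical wrapper: extracts every text span enclosed by
    a fixed left delimiter and a fixed right delimiter."""
    left: str
    right: str

    def __post_init__(self) -> None:
        # Empty delimiters would match everywhere and never advance.
        if not self.left or not self.right:
            raise ValueError("both delimiters must be non-empty")

    def extract(self, document: str) -> list[str]:
        results = []
        pos = 0
        while True:
            start = document.find(self.left, pos)
            if start == -1:
                break
            start += len(self.left)
            end = document.find(self.right, start)
            if end == -1:
                break
            results.append(document[start:end])
            pos = end + len(self.right)
        return results


def learn_island_wrapper(examples: list[tuple[str, str]]) -> IslandWrapper:
    """Assumed learning step: generalize the delimiters from examples.

    Each example is (document, target), where target is the text the user
    marked. Only the first occurrence of target in document is considered.
    """
    lefts, rights = [], []
    for document, target in examples:
        i = document.find(target)
        if i == -1:
            raise ValueError("target does not occur in document")
        lefts.append(document[:i])
        rights.append(document[i + len(target):])
    # Longest common suffix of the left contexts, computed as the
    # common prefix of the reversed strings.
    left = commonprefix([s[::-1] for s in lefts])[::-1]
    # Longest common prefix of the right contexts.
    right = commonprefix(rights)
    return IslandWrapper(left, right)


examples = [
    ("<html><b>Price:</b> <i>12 EUR</i></html>", "12 EUR"),
    ("<html><p>Sale</p><b>Price:</b> <i>7 EUR</i><br></html>", "7 EUR"),
]
wrapper = learn_island_wrapper(examples)
unseen = "<html><h1>Shop</h1><b>Price:</b> <i>99 EUR</i></html>"
print(wrapper.extract(unseen))
```

Taking the longest common delimiters yields the most specific hypothesis consistent with the examples, so it can fail on documents whose surrounding markup differs from anything seen so far. In the abstract's terms, the first call to the learner corresponds to the initial learning phase, while repairing the wrapper after page layouts change belongs to wrapper maintenance.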


Keywords: Learning Algorithm, Minimal Length, Information Extraction, Data Preparation, Knowledge Evolution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Carsten Müller
    SAP AG, Walldorf, Germany
