Entity Extraction from the Web with WebKnox

Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 67)


This paper describes a system for entity extraction from the web. The system uses three different extraction techniques, which are tightly coupled with mechanisms for retrieving entity-rich web pages. The main contributions of this paper are a new entity retrieval approach, a comparison of different extraction techniques, and a more precise entity extraction algorithm. The presented approach makes it possible to extract domain-independent information from the web with only minimal human effort.


Information Extraction, Web Mining, Ontologies




Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. University of Technology Dresden
  2. RMIT
