Advances in Intelligent Web Mastering - 2, pp. 209–218
Entity Extraction from the Web with WebKnox
Conference paper
Abstract
This paper describes a system for entity extraction from the web. The system uses three different extraction techniques, which are tightly coupled with mechanisms for retrieving entity-rich web pages. The main contributions of this paper are a new entity retrieval approach, a comparison of different extraction techniques, and a more precise entity extraction algorithm. The presented approach makes it possible to extract domain-independent information from the web with only minimal human effort.
Keywords
Information Extraction, Web Mining, Ontologies
Copyright information
© Springer-Verlag Berlin Heidelberg 2010