Abstract
Recognizing the homepages of Chinese expert entities normally requires a large amount of manual labeling of web pages. To reduce this effort, this paper proposes a method for Chinese expert entity homepage recognition based on Co-EM. First, the names of Chinese expert entities and their corresponding web pages are collected, and a small number of the pages are labeled. Second, based on the characteristics of Chinese expert entities, hyperlink features and web page content features are extracted as two independent feature sets. Third, a hyperlink classifier is trained on the hyperlink feature set and used to label all the collected web pages; a content classifier is then trained on the content feature set together with the labels assigned by the hyperlink classifier. The labels assigned by the content classifier are in turn used to update the hyperlink classifier, and the procedure is repeated until the two classifiers converge. Finally, the method is evaluated with 10-fold cross-validation. The results show that the Co-EM semi-supervised algorithm makes effective use of the unlabeled web pages and improves recognition accuracy compared with using the labeled web pages only.
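The alternating training loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the base classifier, so a Bernoulli naive Bayes with soft labels is assumed here, and the function names (`train_nb`, `predict_nb`, `co_em`, `classify`), the binary toy features, the convergence tolerance, and the rule of averaging the two classifiers at prediction time are all hypothetical choices made for illustration.

```python
import math

def train_nb(X, W):
    """Fit a Bernoulli naive Bayes model (assumed base learner).
    X: binary feature rows; W[i]: probability that page i is a homepage."""
    n, pos = len(X), sum(W)
    neg = n - pos
    prior = (pos + 1.0) / (n + 2.0)  # Laplace-smoothed class prior
    theta_pos = [(sum(w * x[j] for x, w in zip(X, W)) + 1.0) / (pos + 2.0)
                 for j in range(len(X[0]))]
    theta_neg = [(sum((1.0 - w) * x[j] for x, w in zip(X, W)) + 1.0) / (neg + 2.0)
                 for j in range(len(X[0]))]
    return prior, theta_pos, theta_neg

def predict_nb(model, x):
    """Return P(homepage | x) under the fitted model."""
    prior, theta_pos, theta_neg = model
    log_pos, log_neg = math.log(prior), math.log(1.0 - prior)
    for xj, p, q in zip(x, theta_pos, theta_neg):
        log_pos += math.log(p if xj else 1.0 - p)
        log_neg += math.log(q if xj else 1.0 - q)
    diff = max(-50.0, min(50.0, log_neg - log_pos))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(diff))

def co_em(link_l, text_l, y_l, link_u, text_u, max_iter=20, tol=1e-4):
    """Alternate between the hyperlink view and the content view."""
    # Start from a hyperlink classifier trained on the labeled pages only.
    link_model = train_nb(link_l, y_l)
    probs = [predict_nb(link_model, x) for x in link_u]
    text_model = None
    for _ in range(max_iter):
        # Content classifier learns from the hyperlink classifier's labels.
        text_model = train_nb(text_l + text_u, y_l + probs)
        text_probs = [predict_nb(text_model, x) for x in text_u]
        # Hyperlink classifier is updated with the content classifier's labels.
        link_model = train_nb(link_l + link_u, y_l + text_probs)
        new_probs = [predict_nb(link_model, x) for x in link_u]
        change = max(abs(a - b) for a, b in zip(probs, new_probs))
        probs = new_probs
        if change < tol:  # labels have stopped moving: treat as converged
            break
    return link_model, text_model

def classify(link_model, text_model, link_x, text_x):
    """Assumed combination rule: average the two views' probabilities."""
    return 0.5 * (predict_nb(link_model, link_x) + predict_nb(text_model, text_x))

# Hypothetical toy data: two binary features per view, two labeled pages
# (one homepage, one non-homepage) and four unlabeled pages.
link_l, text_l, y_l = [[1, 1], [0, 0]], [[1, 1], [0, 0]], [1.0, 0.0]
link_u = [[1, 1], [1, 0], [0, 0], [0, 1]]
text_u = [[1, 1], [1, 1], [0, 0], [0, 0]]

link_model, text_model = co_em(link_l, text_l, y_l, link_u, text_u)
p_home = classify(link_model, text_model, [1, 1], [1, 1])
p_other = classify(link_model, text_model, [0, 0], [0, 0])
```

In this sketch the labeled pages keep their fixed labels in every round, while the unlabeled pages carry probabilistic labels that each classifier hands to the other; hard labels would also fit the description in the abstract.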
This work was supported by the National Nature Science Foundation (60863011), the Yunnan Nature Science Foundation (2008CC023), and the Yunnan Young and Middle-Aged Science and Technology Leaders Foundation (2007PY01-11).
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, L., Yu, Z., Li, L. (2011). Chinese Expert Entity Homepage Recognition Based on Co-EM. In: Gong, Z., Luo, X., Chen, J., Lei, J., Wang, F.L. (eds) Web Information Systems and Mining. WISM 2011. Lecture Notes in Computer Science, vol 6988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23982-3_22
DOI: https://doi.org/10.1007/978-3-642-23982-3_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23981-6
Online ISBN: 978-3-642-23982-3
eBook Packages: Computer Science, Computer Science (R0)