Chinese Expert Entity Homepage Recognition Based on Co-EM

  • Conference paper
Web Information Systems and Mining (WISM 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6988)

Abstract

To reduce the heavy manual labeling effort required for Chinese expert entity homepage recognition, this paper proposes a recognition method based on the Co-EM algorithm. First, the names of Chinese expert entities and the corresponding web pages are collected, and a small number of the web pages are labeled. Second, based on the characteristics of Chinese entities, the hyperlink features and the web page content features are extracted as two independent feature sets. Third, a hyperlink classifier is trained on the hyperlink feature set and used to label all the expert entity homepages; a content classifier is then trained on the web page content feature set together with the labels assigned by the hyperlink classifier. The labels assigned by the content classifier are in turn used to update the hyperlink classifier, and the procedure is repeated until the two classifiers converge. Finally, experiments were conducted using 10-fold cross-validation. The results show that the method based on the Co-EM semi-supervised algorithm uses the unlabeled web pages effectively and achieves higher recognition accuracy than using the labeled web pages only.
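
The alternating procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the base classifier, so a Bernoulli naive Bayes model with soft labels is assumed here, the labeled pages are assumed to keep their original labels in every round, and all function names, binary features, and toy data below are hypothetical.

```python
import math

def train_nb(X, weights, alpha=1.0):
    """Fit a Bernoulli naive Bayes model from soft labels.
    weights[i] is the probability that page i is an expert homepage."""
    pos = sum(weights)
    neg = len(X) - pos
    prior = (pos + alpha) / (len(X) + 2 * alpha)
    theta = []  # per feature: (P(x_j=1 | homepage), P(x_j=1 | other))
    for j in range(len(X[0])):
        on_pos = sum(w * x[j] for x, w in zip(X, weights))
        on_neg = sum((1 - w) * x[j] for x, w in zip(X, weights))
        theta.append(((on_pos + alpha) / (pos + 2 * alpha),
                      (on_neg + alpha) / (neg + 2 * alpha)))
    return prior, theta

def predict_proba(model, x):
    """Return P(homepage | x) under the naive Bayes model."""
    prior, theta = model
    log_pos, log_neg = math.log(prior), math.log(1 - prior)
    for xj, (p, q) in zip(x, theta):
        log_pos += math.log(p if xj else 1 - p)
        log_neg += math.log(q if xj else 1 - q)
    diff = max(-700.0, min(700.0, log_neg - log_pos))  # avoid overflow
    return 1.0 / (1.0 + math.exp(diff))

def co_em(link_l, text_l, y_l, link_u, text_u, max_iter=20, tol=1e-4):
    """Alternate between the hyperlink view and the content view."""
    # Start from a hyperlink classifier trained on the labeled pages only.
    link_model = train_nb(link_l, y_l)
    text_model, prev = None, None
    for _ in range(max_iter):
        # The hyperlink classifier assigns soft labels to the unlabeled pages.
        p_link = [predict_proba(link_model, x) for x in link_u]
        # The content classifier is trained on those labels (plus the seeds).
        text_model = train_nb(text_l + text_u, y_l + p_link)
        # The content classifier relabels; the hyperlink classifier is updated.
        p_text = [predict_proba(text_model, x) for x in text_u]
        link_model = train_nb(link_l + link_u, y_l + p_text)
        # Stop once the soft labels no longer change noticeably.
        if prev is not None and max(
                (abs(a - b) for a, b in zip(p_text, prev)), default=0.0) < tol:
            break
        prev = p_text
    return link_model, text_model

# Hypothetical toy data: two binary features per view.
link_l, text_l, y_l = [[1, 1], [0, 0]], [[1, 1], [0, 0]], [1.0, 0.0]
link_u = [[1, 1], [1, 0], [0, 0], [0, 1]]
text_u = [[1, 1], [1, 1], [0, 0], [0, 0]]
link_model, text_model = co_em(link_l, text_l, y_l, link_u, text_u)
print(predict_proba(text_model, [1, 1]), predict_proba(link_model, [0, 0]))
```

In this sketch the unlabeled pages influence both classifiers only through probabilistic labels, which is the property that separates Co-EM from co-training with hard labels; the convergence test on the soft labels is one possible reading of "until the two classifiers converge".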

This paper is supported by the National Nature Science Foundation (60863011), the Yunnan Nature Science Foundation (2008CC023), and the Yunnan Young and Middle-Aged Science and Technology Leaders Foundation (2007PY01-11).

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, L., Yu, Z., Li, L. (2011). Chinese Expert Entity Homepage Recognition Based on Co-EM. In: Gong, Z., Luo, X., Chen, J., Lei, J., Wang, F.L. (eds) Web Information Systems and Mining. WISM 2011. Lecture Notes in Computer Science, vol 6988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23982-3_22

  • DOI: https://doi.org/10.1007/978-3-642-23982-3_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23981-6

  • Online ISBN: 978-3-642-23982-3

  • eBook Packages: Computer Science, Computer Science (R0)
