Result Disambiguation in Web People Search

  • Richard Berendsen
  • Bogomil Kovachev
  • Evangelia-Paraskevi Nastou
  • Maarten de Rijke
  • Wouter Weerkamp
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7224)

Abstract

We study the problem of disambiguating the results of a web people search engine: given a query consisting of a person name plus the result pages for this query, find correct referents for all mentions by clustering the pages according to the different people sharing the name. While the problem has been studied extensively, we find that the increasing availability of results retrieved from social media platforms causes state-of-the-art methods to break down. We analyze the problem and propose a dual strategy where we distinguish between results obtained from social media platforms and those obtained from other sources. In our dual strategy, the two types of documents are disambiguated separately, each with its own method, and the resulting clusterings are then merged. We study several instantiations for the different stages in our proposed strategy and manage to achieve state-of-the-art performance.
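
The split, cluster, and merge stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say how pages are classified, clustered, or merged, so everything here is an assumption made for illustration. That covers the host list in SOCIAL_HOSTS, the greedy token-overlap clustering, the 0.5 threshold, the choice to treat each social profile as its own person, and the merge by concatenation.

```python
from urllib.parse import urlparse

# Hypothetical host list; the source does not name specific platforms.
SOCIAL_HOSTS = {"facebook.com", "twitter.com", "linkedin.com"}


def is_social(url):
    """Return True if the URL's host is in the (assumed) social media list."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host in SOCIAL_HOSTS


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_text(pages, threshold=0.5):
    """Greedy threshold clustering on token overlap (illustrative only).

    Each page joins the first cluster containing a page whose token set
    is at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of (url, tokens) pairs
    for url, text in pages:
        tokens = set(text.lower().split())
        home = None
        for cluster in clusters:
            if any(jaccard(tokens, other) >= threshold for _, other in cluster):
                home = cluster
                break
        if home is None:
            clusters.append([(url, tokens)])
        else:
            home.append((url, tokens))
    return [[url for url, _ in cluster] for cluster in clusters]


def cluster_social(pages):
    """Assumption: every social media profile refers to a distinct person."""
    return [[url] for url, _ in pages]


def disambiguate(pages):
    """Split pages by source, cluster each group separately, then merge."""
    social = [p for p in pages if is_social(p[0])]
    other = [p for p in pages if not is_social(p[0])]
    # Simplest possible merge: concatenate the two clusterings.
    return cluster_by_text(other) + cluster_social(social)
```

Calling disambiguate on a list of (url, text) pairs returns a list of clusters, each a list of URLs. A less naive merge stage could also link a social profile to an existing cluster; the concatenation above is only the simplest placeholder.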

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Richard Berendsen (1)
  • Bogomil Kovachev (1)
  • Evangelia-Paraskevi Nastou (1)
  • Maarten de Rijke (1)
  • Wouter Weerkamp (1)

  1. ISLA, University of Amsterdam, Amsterdam, The Netherlands