Result Disambiguation in Web People Search

Berendsen, Richard; Kovachev, Bogomil; Nastou, Evangelia-Paraskevi; de Rijke, Maarten; Weerkamp, Wouter

doi:10.1007/978-3-642-28997-2_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7224)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

We study the problem of disambiguating the results of a web people search engine: given a query consisting of a person name plus the result pages for this query, find correct referents for all mentions by clustering the pages according to the different people sharing the name. While the problem has been studied extensively, we discover that the increasing availability of results retrieved from social media platforms causes state-of-the-art methods to break down. We analyze the problem and propose a dual strategy where we distinguish between results obtained from social media platforms and those obtained from other sources. In our dual strategy, the two types of documents are disambiguated separately, using different strategies, and their results are then merged. We study several instantiations for the different stages in our proposed strategy and manage to achieve state-of-the-art performance.
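The dual strategy described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' method: the `SOCIAL_HOSTS` list, the token-overlap (Jaccard) similarity, the greedy threshold clustering, the two threshold values, and the merge step, which simply concatenates the two sets of clusters. The abstract states only that the two document types are disambiguated separately with different strategies and then merged.

```python
from urllib.parse import urlparse

# Hypothetical list of social media hosts; the abstract does not name any.
SOCIAL_HOSTS = {"facebook.com", "twitter.com", "linkedin.com"}


def is_social(url):
    """Return True if the URL's host is in the (assumed) social media list."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host in SOCIAL_HOSTS


def jaccard(a, b):
    """Jaccard overlap between two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_threshold(docs, threshold):
    """Greedy clustering: a document joins the first cluster that already
    contains a member at least `threshold` similar to it; otherwise it
    starts a new cluster. A stand-in for a real clustering method."""
    clusters = []
    for doc in docs:
        tokens = doc["text"].lower().split()
        for cluster in clusters:
            if any(jaccard(tokens, m["text"].lower().split()) >= threshold
                   for m in cluster):
                cluster.append(doc)
                break
        else:
            clusters.append([doc])
    return clusters


def disambiguate(docs, web_threshold=0.3, social_threshold=0.6):
    """Split results by source type, cluster each part separately with its
    own (assumed) threshold, then merge by concatenating the clusters."""
    social = [d for d in docs if is_social(d["url"])]
    web = [d for d in docs if not is_social(d["url"])]
    return (cluster_by_threshold(web, web_threshold)
            + cluster_by_threshold(social, social_threshold))
```

Each document is assumed to be a dict with `url` and `text` keys. A stricter threshold for social media pages reflects one plausible reading of the abstract (profile pages share little distinguishing text), but the paper itself studies several instantiations of each stage.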

Author information

Authors and Affiliations

ISLA, University of Amsterdam, Science Park 904, 1098 XH, Amsterdam, The Netherlands
Richard Berendsen, Bogomil Kovachev, Evangelia-Paraskevi Nastou, Maarten de Rijke & Wouter Weerkamp

Editor information

Editors and Affiliations

Yahoo! Research, Diagonal 177, 08018, Barcelona, Spain
Ricardo Baeza-Yates & B. Barla Cambazoglu
Centrum Wiskunde & Informatica, Science Park 123, Amsterdam, The Netherlands
Arjen P. de Vries
Websays, Nàpols 294 7-4, 08025, Barcelona, Spain
Hugo Zaragoza
Yahoo! Research, Diagonal 177, 08018, Barcelona, Spain
Vanessa Murdock
Yahoo! Labs, Tower 3, Matam Park, 31905, Haifa, Israel
Ronny Lempel
ISTI-CNR, via G. Moruzzi, 1, 56124, Pisa, Italy
Fabrizio Silvestri

About this paper

Cite this paper

Berendsen, R., Kovachev, B., Nastou, EP., de Rijke, M., Weerkamp, W. (2012). Result Disambiguation in Web People Search. In: Baeza-Yates, R., et al. Advances in Information Retrieval. ECIR 2012. Lecture Notes in Computer Science, vol 7224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28997-2_13

DOI: https://doi.org/10.1007/978-3-642-28997-2_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28996-5
Online ISBN: 978-3-642-28997-2
eBook Packages: Computer Science, Computer Science (R0)
