Abstract
Aggregated search is the task of incorporating results from different specialized search services, or verticals, into Web search results. While most prior work focuses on deciding which verticals to present, the task of deciding where in the Web results to embed the vertical results has received less attention. We propose a methodology for evaluating an aggregated set of results. Our method elicits a relatively small number of human judgements for a given query and then uses these to facilitate a metric-based evaluation of any possible presentation for the query. An extensive user study with 13 verticals confirms that, when users prefer one presentation of results over another, our metric agrees with the stated preference. By using Amazon’s Mechanical Turk, we show that reliable assessments can be obtained quickly and inexpensively.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Arguello, J., Diaz, F., Callan, J., Carterette, B. (2011). A Methodology for Evaluating Aggregated Search Results. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20160-8
Online ISBN: 978-3-642-20161-5