Abstract
It has recently been proposed to consider relevance assessment as a stochastic process, where relevance judgments are modeled as binomial random variables and, consequently, evaluation measures become random evaluation measures, removing the distinction between binary and multi-graded evaluation measures.
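To make this stochastic view concrete, the following is a minimal sketch (an illustration under assumed settings, not the exact formulation referred to above): each graded judgment is mapped to a hypothetical success probability, a binary judgment is drawn from a binomial distribution, and a measure such as precision@k becomes a random variable whose expectation can be estimated by sampling.

```python
import numpy as np

# Illustrative sketch: each judged document receives a Binomial(1, p)
# relevance draw, with p derived from its relevance grade; precision@k
# then becomes a random variable explored by Monte Carlo sampling.
rng = np.random.default_rng(42)

# Hypothetical mapping from relevance grades to success probabilities.
GRADE_TO_P = {0: 0.05, 1: 0.35, 2: 0.65, 3: 0.95}

def sample_judgments(grades, rng=rng):
    """Draw one stochastic binary judgment per document."""
    p = np.array([GRADE_TO_P[g] for g in grades])
    return rng.binomial(1, p)

def precision_at_k(binary_judgments, k):
    """Fraction of sampled-relevant documents in the top k."""
    return binary_judgments[:k].mean()

# Ranked list of graded judgments for one topic (hypothetical data).
grades = [3, 2, 0, 1, 0, 2, 0, 0, 1, 0]

samples = [precision_at_k(sample_judgments(grades), k=5) for _ in range(10_000)]
print("E[P@5] ≈", np.mean(samples), "  sd ≈", np.std(samples))
```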
In this paper, we adopt this stochastic view of relevance judgments and investigate how it can be applied in the crowdsourcing context. In particular, we show that injecting some randomness into the judgments produced by crowd assessors improves their correlation with the gold standard, and we introduce a new merging approach, based on binomial random variables, which is competitive with the state of the art when few assessors are merged.
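As a rough illustration of how a binomial view could be used to merge crowd labels, the sketch below contrasts a deterministic majority vote with a stochastic merge that samples each merged judgment from the empirical fraction of positive assessor votes. The merging rule and the data are assumptions for illustration only, not necessarily the paper's actual method.

```python
import numpy as np

# Illustrative merging sketch (an assumed rule): the fraction of crowd
# assessors labelling a document as relevant is used as the success
# probability of a Bernoulli draw, injecting randomness instead of
# committing to a hard majority vote.
rng = np.random.default_rng(0)

def merge_stochastic(labels, rng=rng):
    """labels: 2-D array, rows = assessors, cols = documents, values in {0, 1}.
    Returns one sampled merged binary judgment per document."""
    p_relevant = labels.mean(axis=0)       # empirical probability of relevance
    return rng.binomial(1, p_relevant)     # stochastic merged judgment

def merge_majority(labels):
    """Deterministic baseline: simple majority vote per document."""
    return (labels.mean(axis=0) >= 0.5).astype(int)

# Three hypothetical crowd assessors judging five documents.
labels = np.array([
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1],
])

print("majority  :", merge_majority(labels))
print("stochastic:", merge_stochastic(labels))
```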
Cite this paper
Ferrante, M., Ferro, N., Losiouk, E. (2019). Stochastic Relevance for Crowdsourcing. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds.) Advances in Information Retrieval. ECIR 2019. Lecture Notes in Computer Science, vol. 11437. Springer, Cham. https://doi.org/10.1007/978-3-030-15712-8_50