
Stochastic Relevance for Crowdsourcing

  • Marco Ferrante
  • Nicola Ferro
  • Eleonora Losiouk
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11437)

Abstract

It has recently been proposed to consider relevance assessment as a stochastic process in which relevance judgments are modeled as binomial random variables; evaluation measures consequently become random evaluation measures, removing the distinction between binary and multi-graded evaluation measures.

In this paper, we adopt this stochastic view of relevance judgments and investigate how it can be applied in the crowdsourcing context. In particular, we show that injecting some randomness into the judgments of crowd assessors improves their correlation with the gold standard, and we introduce a new merging approach, based on binomial random variables, that is competitive with the state of the art when few assessors are merged.
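
To make this stochastic view concrete, the following minimal Python sketch (an illustration under our own assumptions, not the authors' code) shows how a relevance grade can be drawn from a binomial random variable, how binary labels from several crowd assessors could be merged into such a variable, and how a random evaluation measure such as an expected DCG could be estimated by Monte Carlo sampling. All function names and the merging rule are hypothetical.

    import numpy as np

    rng = np.random.default_rng(42)

    def sample_relevance(p, max_grade=3, n_samples=1000):
        # Relevance of one document as a binomial random variable: the grade
        # is Binomial(max_grade, p), so p = 0 or p = 1 recovers a binary
        # judgment, while intermediate p gives multi-graded, stochastic ones.
        return rng.binomial(max_grade, p, size=n_samples)

    def merge_crowd_labels(labels, max_grade=3, n_samples=1000):
        # Hypothetical merging rule: use the fraction of "relevant" votes
        # from the crowd assessors as the success probability of the
        # binomial (an illustration, not the paper's exact formulation).
        return sample_relevance(np.mean(labels), max_grade, n_samples)

    def expected_dcg(run, judgments, k=10):
        # Monte Carlo estimate of the expected DCG@k of a ranked run when
        # every document grade is a random variable; `judgments` maps a
        # document id to its array of sampled grades.
        discounts = 1.0 / np.log2(np.arange(2, k + 2))
        samples = np.stack([judgments[d] for d in run[:k]])  # (k, n_samples)
        return float(np.mean(discounts @ samples))

    # Toy usage: three crowd assessors voted on each of two documents.
    judged = {
        "d1": merge_crowd_labels([1, 1, 0]),  # 2/3 voted relevant
        "d2": merge_crowd_labels([0, 0, 1]),  # 1/3 voted relevant
    }
    print(expected_dcg(["d1", "d2"], judged, k=2))

With p fixed at 0 or 1 the sampled grades collapse to deterministic binary judgments, which is what removes the distinction between binary and multi-graded evaluation measures mentioned above.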


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Marco Ferrante (1)
  • Nicola Ferro (2)
  • Eleonora Losiouk (1)

  1. Department of Mathematics “Tullio Levi-Civita”, University of Padua, Padua, Italy
  2. Department of Information Engineering, University of Padua, Padua, Italy
