Information Retrieval, Volume 16, Issue 2, pp 179–209

Identifying top news using crowdsourcing

Abstract

The influential Text REtrieval Conference (TREC) has always relied upon specialist assessors or, occasionally, participating groups to create relevance judgements for the tracks that it runs. Recently, however, crowdsourcing has been championed as a cheap, fast and effective alternative to traditional TREC-like assessments. In 2010, TREC tracks experimented with crowdsourcing for the very first time. In this paper, we report our successful experience in creating relevance assessments for the TREC Blog track 2010 top news stories task using crowdsourcing. In particular, we crowdsourced both real-time newsworthiness assessments for news stories and traditional relevance assessments for blog posts. We conclude that crowdsourcing appears to be not only a feasible, but also a cheap and fast means to generate relevance assessments. Furthermore, we detail our experiences running the crowdsourced evaluation of the TREC Blog track, discuss the lessons learned, and provide best practices.

Keywords

Crowdsourcing · TREC · Blog · Top news · Relevance assessment

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Richard McCreadie (1)
  • Craig Macdonald (1)
  • Iadh Ounis (1)

  1. University of Glasgow, Glasgow, UK