Aggregation of Multiple Judgments for Evaluating Ordered Lists

  • Conference paper
Advances in Information Retrieval (ECIR 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5993)

Abstract

Many tasks (e.g., search and summarization) result in an ordered list of items. To evaluate such a list, we need to compare it with an ideal ordered list created by a human expert for the same set of items. To reduce bias, multiple human experts are often used to create multiple ideal ordered lists. An interesting challenge in such an evaluation method is thus how to aggregate these different ideal lists to compute a single score for the ordered list being evaluated. In this paper, we propose three new methods for aggregating multiple order judgments to evaluate ordered lists: weighted correlation aggregation, rank-based aggregation, and frequent sequential pattern-based aggregation. Experimental results on ordering sentences for text summarization show that all three new methods outperform the state-of-the-art average correlation methods in terms of discriminativeness and robustness against noise. Among the three proposed methods, the frequent sequential pattern-based method performs best, owing to its flexible modeling of agreements and disagreements among human experts at various levels of granularity.
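
As a rough illustration of the average correlation baseline mentioned in the abstract, the sketch below scores a candidate ordering against several expert orderings and averages the results. It assumes Kendall's tau as the correlation measure and assumes that every list is a permutation of the same items with no ties; the abstract does not specify either point, and the function names `kendall_tau` and `average_correlation` are hypothetical. It does not implement any of the three proposed methods.

```python
from itertools import combinations


def kendall_tau(candidate, reference):
    """Kendall's tau between two orderings of the same items (no ties).

    Returns 1.0 for identical orderings and -1.0 for fully reversed ones.
    """
    if len(candidate) != len(reference) or set(candidate) != set(reference):
        raise ValueError("both lists must order the same set of items")
    n = len(candidate)
    if n < 2:
        return 1.0  # a single item cannot be misordered
    # Position of each item in the reference ordering.
    pos = {item: i for i, item in enumerate(reference)}
    # combinations() yields pairs (a, b) with a before b in the candidate;
    # a pair is discordant when the reference places a after b.
    discordant = sum(1 for a, b in combinations(candidate, 2) if pos[a] > pos[b])
    total_pairs = n * (n - 1) // 2
    return 1.0 - 2.0 * discordant / total_pairs


def average_correlation(candidate, references):
    """Unweighted mean of Kendall's tau over all expert orderings (assumed baseline)."""
    if not references:
        raise ValueError("at least one reference ordering is required")
    return sum(kendall_tau(candidate, ref) for ref in references) / len(references)


if __name__ == "__main__":
    # Illustrative data only: three hypothetical expert orderings of four sentences.
    candidate = ["s1", "s2", "s3", "s4"]
    experts = [
        ["s1", "s2", "s3", "s4"],
        ["s2", "s1", "s3", "s4"],
        ["s4", "s3", "s2", "s1"],
    ]
    print(round(average_correlation(candidate, experts), 4))
```

Because every expert list counts equally in this plain average, a single outlying judgment shifts the score as much as any other, which is the kind of sensitivity that motivates more careful aggregation.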

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kim, H.D., Zhai, C., Han, J. (2010). Aggregation of Multiple Judgments for Evaluating Ordered Lists. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_17

  • DOI: https://doi.org/10.1007/978-3-642-12275-0_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12274-3

  • Online ISBN: 978-3-642-12275-0

  • eBook Packages: Computer Science, Computer Science (R0)
