Evaluation of IR Applications with Constrained Real Estate

  • Yuanhua Lv
  • Ariel Fuxman
  • Ashok K. Chandra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8416)


Traditional IR applications assume that there is always enough space (“real estate”) available to display as many results as the system returns. Consequently, traditional evaluation metrics were typically designed to take a length cutoff k of the result list as a parameter. For example, one computes DCG@k, Prec@k, etc., based on the top-k results in the ranked list. However, there are important modern ranking applications where the result real estate is constrained to a small fixed space, such as the search verticals aggregated into Web search results, and recommendation systems. For such applications, the following tradeoff arises: given a fixed amount of real estate, should we show a small number of results with rich captions and details, or a larger number of results with less informative captions? In other words, there is a tradeoff between the length of the result list (i.e., quantity) and the informativeness of the results (i.e., quality). This tradeoff has important implications for evaluation metrics, since it makes the length cutoff k hard to determine a priori. To tackle this problem, we propose two desirable formal constraints that capture the heuristics for regulating the quantity-quality tradeoff, inspired by the axiomatic approach to IR. We then present a general method to normalize the well-known Discounted Cumulative Gain (DCG) metric for balancing the quantity-quality tradeoff, yielding a new metric that we call Length-adjusted Discounted Cumulative Gain (LDCG). LDCG is shown to automatically balance the length and the informativeness of a ranking list without requiring an explicit parameter k, while still preserving the good properties of DCG.
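To make the role of the cutoff k concrete, the following is a minimal Python sketch of the two cutoff-based metrics named in the abstract, DCG@k and Prec@k. It is not the authors' implementation and does not implement LDCG, whose definition is not given in this abstract. The sketch assumes one common formulation of DCG (gain 2^rel - 1 with a log2(rank + 1) discount); the function names, the `threshold` parameter, and the example relevance grades are illustrative assumptions.

```python
import math
from typing import Sequence


def dcg_at_k(grades: Sequence[int], k: int) -> float:
    """DCG over the top-k results.

    Assumes the exponential-gain variant: (2^grade - 1) / log2(rank + 1),
    with ranks starting at 1. Other DCG variants exist.
    """
    return sum(
        (2 ** grade - 1) / math.log2(rank + 1)
        for rank, grade in enumerate(grades[:k], start=1)
    )


def precision_at_k(grades: Sequence[int], k: int, threshold: int = 1) -> float:
    """Fraction of the top-k slots holding a result graded >= threshold.

    Divides by k (not by the list length), so a list shorter than k
    is penalized for its empty slots.
    """
    relevant = sum(1 for grade in grades[:k] if grade >= threshold)
    return relevant / k


# Hypothetical graded relevance labels for one ranked list (0 = not relevant).
ranking = [3, 2, 0, 1, 0, 1]

# The score depends on the cutoff, which must be chosen in advance.
for k in (2, 4, 6):
    print(k, round(dcg_at_k(ranking, k), 4), precision_at_k(ranking, k))
```

Because both functions take k as an explicit argument, the same ranking receives different scores at different cutoffs, which is the dependency the abstract argues is hard to fix a priori when display space is constrained.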


Keywords: Evaluation; Aggregated Search; Constrained Real Estate; Quantity-Quality Tradeoff; LDCG; LNDCG





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Yuanhua Lv
    • 1
  • Ariel Fuxman
    • 1
  • Ashok K. Chandra
    • 1
  1. Microsoft Research, Mountain View, USA
