Examining the Robustness of Evaluation Metrics for Patent Retrieval with Incomplete Relevance Judgements

Magdy, Walid; Jones, Gareth J. F.

doi:10.1007/978-3-642-15998-5_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6360)

Included in the following conference series:

International Conference of the Cross-Language Evaluation Forum for European Languages

Abstract

Recent years have seen a growing interest in research into patent retrieval. One of the key issues in conducting information retrieval (IR) research is meaningful evaluation of the effectiveness of the retrieval techniques applied to the task under investigation. Unlike many existing well-explored IR tasks where the focus is on achieving high retrieval precision, patent retrieval is to a significant degree a recall-focused task. The standard evaluation metric used for patent retrieval evaluation tasks is currently mean average precision (MAP). However, this does not reflect system recall well. Meanwhile, the alternative of using the standard recall measure does not reflect user search effort, which is a significant factor in practical patent search environments. In recent work we introduced a novel evaluation metric for patent retrieval evaluation (PRES) [13]. This is designed to reflect both system recall and user effort. Analysis of PRES demonstrated its greater effectiveness in evaluating recall-oriented applications than standard MAP and Recall. One dimension of the evaluation of patent retrieval which has not previously been studied is the effect on the reliability of the evaluation metrics when relevance judgements are incomplete. We provide a study comparing the behaviour of PRES against the standard MAP and Recall metrics for varying degrees of judgement incompleteness in patent retrieval. Experiments carried out using runs from the CLEF-IP 2009 datasets show that PRES and Recall are more robust than MAP for incomplete relevance sets for this task, with a small preference for PRES as the most robust evaluation metric for patent retrieval with respect to the completeness of the relevance set.
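To make the three metrics concrete, the following is a minimal Python sketch that scores a single topic with Recall, average precision (the per-topic component of MAP), and PRES, and then discards part of the relevance set to mimic incomplete judgements. The abstract does not state the PRES formula; the form used here, including the worst-case placement of unretrieved relevant documents after the cut-off, is an assumption based on the definition in [13]. The document identifiers, the cut-off `n_max`, and the `subsample` helper are hypothetical and are not taken from the CLEF-IP 2009 experiments.

```python
import random


def recall(ranking, relevant, n_max):
    """Fraction of the relevant documents found in the top n_max results."""
    found = sum(1 for doc in ranking[:n_max] if doc in relevant)
    return found / len(relevant)


def average_precision(ranking, relevant, n_max):
    """Average precision for one topic, normalised by the number of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking[:n_max], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def pres(ranking, relevant, n_max):
    """PRES for one topic (assumed form, see the lead-in)."""
    n = len(relevant)
    ranks = [r for r, doc in enumerate(ranking[:n_max], start=1) if doc in relevant]
    found = len(ranks)
    # Assumption: the i-th relevant document, if missing from the top n_max,
    # is placed at rank n_max + i (the worst case after the cut-off).
    ranks += [n_max + i for i in range(found + 1, n + 1)]
    return 1 - (sum(ranks) / n - (n + 1) / 2) / n_max


def subsample(relevant, fraction, rng):
    """Hypothetical helper: keep a random fraction of the judgements."""
    keep = max(1, round(len(relevant) * fraction))
    return set(rng.sample(sorted(relevant), keep))


# Illustrative data only.
ranking = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant = {"d1", "d2", "d5"}
n_max = 5

scores_full = {
    "recall": recall(ranking, relevant, n_max),
    "ap": average_precision(ranking, relevant, n_max),
    "pres": pres(ranking, relevant, n_max),
}

# Re-score with roughly two thirds of the judgements retained.
reduced = subsample(relevant, 0.67, random.Random(0))
scores_reduced = {
    "recall": recall(ranking, reduced, n_max),
    "ap": average_precision(ranking, reduced, n_max),
    "pres": pres(ranking, reduced, n_max),
}
```

With the full relevance set, only `d1` falls inside the cut-off (at rank 3), so Recall is 1/3, average precision is 1/9, and PRES is approximately 0.2. A robustness study along the lines described above would repeat the reduced-set scoring over many topics and systems and check how far the ranking of systems changes under each metric, for example with Kendall's rank correlation [11].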

References

1. Aslam, J.A., Yilmaz, E.: Estimating average precision with incomplete and imperfect judgments. In: Proceedings of the 15th ACM international conference on Information and knowledge management CIKM, Arlington, Virginia, USA, pp. 102–111 (2006)
2. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Reading (1999)
3. Bompad, T., Chang, C.-C., Chen, J., Kumar, R., Shenoy, R.: On the robustness of relevance measures with incomplete judgements. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, The Netherlands. ACM, New York (2007)
4. Buckley, C., Voorhees, E.: Evaluating evaluation measure stability. In: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 33–40. ACM, New York (2000)
5. Buckley, C., Voorhees, E.M.: Retrieval evaluation with incomplete information. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, South Yorkshire, UK, pp. 25–32 (2004)
6. Buckley, C., Dimmick, D., Soboroff, I., Voorhees, E.: Bias and the limits of pooling. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA, USA, pp. 619–620. ACM, New York (2006)
7. Fujii, A., Iwayama, M., Kando, N.: Overview of patent retrieval task at NTCIR-4. In: Proceedings of the Fourth NTCIR Workshop on Evaluation of Information Retrieval, Automatic Text Summarization and Question Answering, Tokyo, Japan (2004)
8. Fujii, A., Iwayama, M., Kando, N.: Overview of the patent retrieval task at the NTCIR-6 workshop. In: Proceedings of the 6th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-lingual Information Access, Tokyo, Japan, pp. 359–365 (2007)
9. Graf, E., Azzopardi, L.: A methodology for building a patent test collection for prior art search. In: Proceedings of The Second International Workshop on Evaluating Information Access (EVIA 2008), Tokyo, Japan (2008)
10. Iwayama, M., Fujii, A., Kando, N., Takano, A.: Overview of patent retrieval task at NTCIR-3. In: Proceedings of the 3rd NTCIR Workshop on Evaluation of Information Retrieval, Automatic Text Summarization and Question Answering, Tokyo, Japan (2003)
11. Kendall, M.: A new measure of rank correlation. Biometrika 30(1/2), 81–93 (1938)
12. Leong, M.K.: Patent Data for IR Research and Evaluation. In: Proceedings of the 2nd NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-lingual Information Access, Tokyo, Japan, pp. 359–365 (2001)
13. Magdy, W., Jones, G.J.F.: PRES: a score metric for evaluating recall-oriented information retrieval applications. In: Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Geneva, Switzerland. ACM, New York (2010)
14. Van Rijsbergen, C.J.: Information Retrieval, 2nd edn. Butterworths (1979)
15. Robertson, S.E.: The parametric description of the retrieval tests. Part 2: Overall measures. Journal of Documentation 25(2), 93–107 (1969)
16. Roda, G., Tait, J., Piroi, F., Zenz, V.: CLEF-IP 2009: Retrieval experiments in the Intellectual Property domain. In: CLEF 2009 Working Notes, Corfu, Greece (2009)
17. Voorhees, E.M.: Special Issue: The Sixth Text REtrieval Conference (TREC-6). Information Processing and Management, 36(1) (2000)
18. Voorhees, E.M.: Evaluation by highly relevant documents. In: Proceedings of the 24th ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, U.S.A., pp. 74–82 (2001)

Author information

Authors and Affiliations

Centre for Next Generation Localization, School of Computing, Dublin City University, Dublin 9, Ireland
Walid Magdy & Gareth J. F. Jones

Editor information

Editors and Affiliations

Department of Information Engineering, University of Padua, Via Gradenigo 6/a, 35131, Padova, Italy
Maristella Agosti
University of Padua, Padua, Italy
Nicola Ferro
ISTI-CNR, Area Ricerca CNR, Via Moruzzi, 1, 56124, Pisa, Italy
Carol Peters
ISLA, University of Amsterdam, Amsterdam, The Netherlands
Maarten de Rijke
Dublin City University, Dublin, Ireland
Alan Smeaton

About this paper

Cite this paper

Magdy, W., Jones, G.J.F. (2010). Examining the Robustness of Evaluation Metrics for Patent Retrieval with Incomplete Relevance Judgements. In: Agosti, M., Ferro, N., Peters, C., de Rijke, M., Smeaton, A. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2010. Lecture Notes in Computer Science, vol 6360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15998-5_10

DOI: https://doi.org/10.1007/978-3-642-15998-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15997-8
Online ISBN: 978-3-642-15998-5
eBook Packages: Computer Science (R0)