Evaluating Information Retrieval in the Intellectual Property Domain: The CLEF-IP Campaign

Chapter in: Current Challenges in Patent Information Retrieval

Part of the book series: The Information Retrieval Series (INRE, volume 29)

Abstract

The CLEF-IP track ran for the first time within the CLEF 2009 campaign. The purpose of the track was twofold: (a) to encourage and facilitate research in patent retrieval by providing a large, clean data set for experimentation; and (b) to create a large test collection of patents in the three main European languages for evaluating cross-lingual information access. The track focused on the task of prior art search; in 2010 a second task, patent classification, was added. The participating teams deployed a variety of information retrieval techniques, either adapted or custom-made, to tackle this specific domain and its tasks. This chapter reports on the activities undertaken to provide a set of topics for the two tasks and to extract the relevance assessments for these topics, and on the evaluation of the effectiveness of the retrieval methods employed.
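
Note 9 points to trec_eval, a tool commonly used to compute effectiveness figures of this kind. As a rough illustration only, the Python sketch below reads relevance assessments and a run in the whitespace-separated formats trec_eval accepts (qrels: `topic iteration docno relevance`; run: `topic Q0 docno rank score tag`) and computes recall at a cutoff and non-interpolated (mean) average precision. The function names, the topic ID `T1` and all document identifiers are hypothetical, and this is not the track's own evaluation code; trec_eval's conventions, for example for tied scores or for topics missing from a run, may differ.

```python
from collections import defaultdict


def read_qrels(text):
    """Parse qrels lines 'topic iteration docno relevance'.

    Returns a dict mapping each topic to the set of documents judged relevant
    (relevance > 0); topics without any relevant document are left out.
    """
    relevant = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue
        topic, _iteration, docno, rel = line.split()
        if int(rel) > 0:
            relevant[topic].add(docno)
    return dict(relevant)


def read_run(text):
    """Parse run lines 'topic Q0 docno rank score tag' into per-topic rankings.

    Documents are ordered by descending score; the rank column is ignored.
    """
    scored = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        topic, _q0, docno, _rank, score, _tag = line.split()
        scored[topic].append((float(score), docno))
    return {
        topic: [docno for _, docno in sorted(pairs, key=lambda p: p[0], reverse=True)]
        for topic, pairs in scored.items()
    }


def average_precision(ranking, relevant):
    """Non-interpolated AP; relevant documents never retrieved contribute 0."""
    if not relevant:
        return 0.0
    seen, hits, precision_sum = set(), 0, 0.0
    for docno in ranking:
        if docno in seen:  # ignore duplicate entries in the ranking
            continue
        seen.add(docno)
        if docno in relevant:
            hits += 1
            precision_sum += hits / len(seen)  # len(seen) is the current rank
    return precision_sum / len(relevant)


def recall_at_k(ranking, relevant, k):
    """Fraction of the relevant documents found among the top k retrieved."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


def mean_average_precision(run, qrels):
    """Average AP over all topics that have relevance assessments."""
    if not qrels:
        return 0.0
    return sum(average_precision(run.get(t, []), rel) for t, rel in qrels.items()) / len(qrels)


# Hypothetical toy data: one topic "T1" with two relevant documents.
QRELS = """T1 0 EP-1000001 1
T1 0 EP-1000002 1
T1 0 EP-1000003 0
"""
RUN = """T1 Q0 EP-1000004 2 8.1 demo
T1 Q0 EP-1000002 1 9.5 demo
T1 Q0 EP-1000001 3 7.0 demo
"""

qrels = read_qrels(QRELS)
run = read_run(RUN)
print(average_precision(run["T1"], qrels["T1"]))  # (1/1 + 2/3) / 2, about 0.833
print(recall_at_k(run["T1"], qrels["T1"], 1))      # one of two relevant docs in the top 1: 0.5
```

The sketch orders documents by score rather than by the rank column, which to our understanding mirrors how trec_eval itself orders a run; recall-oriented cutoffs can be explored simply by varying `k`.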

Notes

  1. It is our direct experience that these explanations helped IR researchers most in understanding the relationships between the different kinds of patent documents that constitute a patent.

  2. For EP patents, documents published at different stages share the same numeric identifier. For other patent offices this is not always the case. For example, the patent document US-6689545-B2 represents a US granted patent whose application document has the publication number US-2003011722-A1.

  3. For a complete list of the kind codes used by various patent offices, see http://tinyurl.com/EPO-kindcodes.

  4. See http://www.wipo.int/classifications/ipc/en/.

  5. Although the MAREC collection was created after the first CLEF-IP campaign was set up in 2009, the documents in the CLEF-IP’09 corpus are included in the MAREC collection and use the same DTD.

  6. http://www.wipo.int/pct/en/.

  7. http://www.alfresco.com/.

  8. http://docasu.sourceforge.net/.

  9. trec_eval version 8.0: http://trec.nist.gov/trec_eval.
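
Note 2 implies that, for EP documents, all documents belonging to one patent can be found through their shared numeric identifier, whereas for other offices they cannot. The Python sketch below illustrates this under stated assumptions: it assumes identifiers always have the hyphenated office-number-kind shape seen in note 2 (for example `US-6689545-B2`), and the function names and the EP identifiers in the example are hypothetical. It is not a complete parser for the identifier formats used by patent offices.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# Assumed identifier shape, inferred from the examples in note 2:
# two-letter office code, numeric part, kind code (e.g. "B2", "A1"), hyphen-separated.
DOC_ID = re.compile(r"^([A-Z]{2})-(\d+)-([A-Z][0-9]?)$")


class PatentDocId(NamedTuple):
    office: str
    number: str
    kind: str


def parse_doc_id(doc_id):
    """Split an identifier such as 'US-6689545-B2' into office, number and kind code."""
    match = DOC_ID.match(doc_id)
    if match is None:
        raise ValueError(f"unrecognised document identifier: {doc_id!r}")
    return PatentDocId(*match.groups())


def group_ep_documents(doc_ids):
    """Group EP documents that share a numeric identifier.

    Per note 2, this shortcut only holds for EP documents, so identifiers from
    other offices are returned separately instead of being grouped.
    """
    groups = defaultdict(list)
    ungrouped = []
    for doc_id in doc_ids:
        parsed = parse_doc_id(doc_id)
        if parsed.office == "EP":
            groups[parsed.number].append(parsed.kind)
        else:
            ungrouped.append(doc_id)
    return dict(groups), ungrouped


# The EP identifiers below are hypothetical; the US ones are taken from note 2.
print(parse_doc_id("US-6689545-B2"))
# PatentDocId(office='US', number='6689545', kind='B2')
groups, ungrouped = group_ep_documents(
    ["EP-1000001-A1", "EP-1000001-B1", "EP-1000002-A2", "US-6689545-B2", "US-2003011722-A1"]
)
print(groups)     # {'1000001': ['A1', 'B1'], '1000002': ['A2']}
print(ungrouped)  # ['US-6689545-B2', 'US-2003011722-A1']
```

Non-EP identifiers are deliberately left ungrouped: as note 2 shows, a US grant and its application publication carry different numbers, so linking them needs information beyond the identifier itself.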

Acknowledgements

We thank Matrixware Information Systems GmbH for making available the patent corpus for this track, and for co-organizing the first evaluation campaign. We also thank Judy Hickey and Henk Tomas for sharing their know-how on prior art searches and patent life-cycles with us.

Author information

Correspondence to Florina Piroi.

Appendix

Table 4.2 Indexing the data. In the table, x indicates that a field is used, – that it is not used, x! that it receives special treatment, and ? that no information on its usage is available.
Table 4.3 Query generation, retrieval systems, and ranking. In the table, x indicates that a field is used, – that it is not used, x! that it receives special treatment, and ? that no information on its usage is available.
Table 4.4 Systems, methods and document fields used in the Classification task

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Piroi, F., Zenz, V. (2011). Evaluating Information Retrieval in the Intellectual Property Domain: The CLEF-IP Campaign. In: Lupu, M., Mayer, K., Tait, J., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 29. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19231-9_4

  • DOI: https://doi.org/10.1007/978-3-642-19231-9_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19230-2

  • Online ISBN: 978-3-642-19231-9

  • eBook Packages: Computer Science, Computer Science (R0)
