Evaluation of Chemical Information Retrieval Tools

  • Chapter
Current Challenges in Patent Information Retrieval

Part of the book series: The Information Retrieval Series (INRE, volume 29)

Abstract

It has been noted before in this book that patent retrieval is different from, and more complicated than, “standard” information retrieval. Evaluation of patent retrieval engines has also been shown to require specific attention. In this chapter, we continue making this point, but emphasize the efforts undertaken in a specific domain, namely chemistry. We approach this issue from two different perspectives. First, there is the issue of scalability: much as in the CLEF-IP efforts, the problem is that of handling a large number of documents and, potentially, a large number of queries. Second, there are the issues generated by the specific characteristics of chemistry documents. We describe here how we manually created a set of topics to reflect the kind of requests for information that a patent searcher, or a general researcher, might have. The results of the first year’s track are presented as well, together with directions and desiderata for the following years.

Notes

  1. http://journals.iucr.org/

  2. http://www.hindawi.com/

  3. http://ukcatalogue.oup.com/

  4. http://www.mdpi.org/

  5. The MAREC DTDs are publicly available together with the MAREC data, conditional on signing a license agreement.

  6. Note that we use ‘topic’ for what we give the participants, and ‘query’ for what they actually enter into their systems to obtain results.

Acknowledgements

The authors would like to thank the NIST TREC organizers for supporting this evaluation campaign, Matrixware Information Services GmbH for the patent corpus, Richard Kidd from the Royal Society of Chemistry for providing the initial collection of scientific articles, and all the other editors of the journals that provided articles in the second-year campaign. Last, but certainly not least, the authors express their gratitude to the domain experts who volunteered to provide the manual topics and to evaluate the results of the participants: Teresa Loughbrough, Henk Tomas, Monika Hanelt, Anthony Trippe, Madeleine Marley and her team, and Carlos Faerman.

Author information

Corresponding author

Correspondence to Mihai Lupu.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lupu, M., Huang, J., Zhu, J. (2011). Evaluation of Chemical Information Retrieval Tools. In: Lupu, M., Mayer, K., Tait, J., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 29. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19231-9_5

  • DOI: https://doi.org/10.1007/978-3-642-19231-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19230-2

  • Online ISBN: 978-3-642-19231-9

  • eBook Packages: Computer Science, Computer Science (R0)