Learning-Based Pseudo-Relevance Feedback for Patent Retrieval

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 7356)

Abstract

Pseudo-relevance feedback (PRF) is an effective approach in Information Retrieval, but many experiments have shown that it is ineffective in patent retrieval. The reason is that the initial results in patent retrieval are of poor quality, so estimating a relevance model via PRF often hurts retrieval performance by introducing off-topic terms. We propose a learning-to-rank framework for estimating the effectiveness of a patent document in terms of its performance in PRF. Specifically, knowledge of which feedback documents were effective on past queries is used to estimate effective feedback documents for new queries. This is achieved by introducing features correlated with feedback document effectiveness, which we define using patent-specific content. We then apply regression to predict document effectiveness given the proposed features. We evaluated the proposed method on the CLEF-IP 2010 patent prior-art search collection. Our experimental results show significantly improved retrieval accuracy over a PRF baseline that expands the query using all top-ranked documents.
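
The pipeline described in the abstract (learn from past queries which feedback documents helped, then predict effectiveness for the top-ranked documents of a new query) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two feature names, the toy training values, and the use of a plain least-squares linear model as a stand-in regressor. None of it is taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float],
               lr: float = 0.5, epochs: int = 2000) -> Model:
    """Fit a least-squares linear model by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = b + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model: Model, x: Sequence[float]) -> float:
    """Predicted effectiveness of one candidate feedback document."""
    w, b = model
    return b + sum(wj * xj for wj, xj in zip(w, x))


def select_feedback_docs(model: Model, candidates: Dict[str, Sequence[float]],
                         k: int) -> List[str]:
    """Return the k candidate ids with the highest predicted effectiveness."""
    ranked = sorted(candidates, key=lambda doc: predict(model, candidates[doc]),
                    reverse=True)
    return ranked[:k]


# Toy training set from "past queries" (invented values).
# Hypothetical features: [normalised retrieval score, classification-code overlap].
# Target: change in retrieval quality when the document alone is used for feedback.
X_train = [[0.9, 0.8], [0.8, 0.1], [0.2, 0.9], [0.1, 0.1], [0.5, 0.5]]
y_train = [0.49, 0.23, 0.17, -0.12, 0.20]
model = fit_linear(X_train, y_train)

# Top-ranked documents for a new query, described by the same features.
candidates = {"A": [0.95, 0.9], "B": [0.9, 0.05], "C": [0.1, 0.2], "D": [0.6, 0.7]}
selected = select_feedback_docs(model, candidates, k=2)
print(selected)
```

Only the selected documents would then feed the relevance-model estimate, rather than all top-ranked documents as in the PRF baseline the abstract compares against.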

Keywords

  • Patent Retrieval
  • Pseudo-Relevance Feedback
  • Query Modeling
  • Prior-art Search

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mahdabi, P., Crestani, F. (2012). Learning-Based Pseudo-Relevance Feedback for Patent Retrieval. In: Salampasis, M., Larsen, B. (eds) Multidisciplinary Information Retrieval. IRFC 2012. Lecture Notes in Computer Science, vol 7356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31274-8_1

  • DOI: https://doi.org/10.1007/978-3-642-31274-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31273-1

  • Online ISBN: 978-3-642-31274-8

  • eBook Packages: Computer Science, Computer Science (R0)