Abstract
Pseudo-relevance feedback (PRF) is an effective approach in Information Retrieval, but many experiments have shown that it is ineffective in patent retrieval. The initial results in patent retrieval are of poor quality, so a relevance model estimated via PRF often picks up off-topic terms and hurts retrieval performance. We propose a learning-to-rank framework for estimating how effective a patent document is when used for PRF. Specifically, knowledge of which feedback documents were effective for past queries is used to estimate which feedback documents will be effective for new queries. To this end, we introduce features correlated with feedback document effectiveness, defined using patent-specific content, and apply regression to predict document effectiveness from these features. We evaluated the proposed method on the CLEF-IP 2010 patent prior art search collection. Our experimental results show significantly improved retrieval accuracy over a PRF baseline that expands the query using all top-ranked documents.
Keywords
- Patent Retrieval
- Pseudo-Relevance Feedback
- Query Modeling
- Prior-art Search
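The pipeline described in the abstract (learn from past queries which feedback documents helped, then predict effectiveness for the candidates of a new query) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the two features, the toy training values, the document ids, and the use of plain linear regression fitted by gradient descent. The abstract states only that regression over patent-specific features is used, so this is not the authors' model or feature set.

```python
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float],
               lr: float = 0.1, epochs: int = 5000) -> Model:
    """Fit y ~ w.x + b by batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model: Model, x: Sequence[float]) -> float:
    """Predicted effectiveness of one candidate feedback document."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def select_feedback_docs(model: Model,
                         candidates: Sequence[Tuple[str, Sequence[float]]],
                         k: int) -> List[str]:
    """Return ids of the k candidates with the highest predicted effectiveness."""
    ranked = sorted(candidates, key=lambda c: predict(model, c[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical training data from past queries. Each row holds two made-up
# features of a feedback document (say, a classification-overlap score and an
# off-topic-term ratio); the target is the assumed change in retrieval quality
# observed when that document alone was used for feedback.
X_train = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8], [0.5, 0.5]]
y_train = [0.30, 0.22, -0.20, -0.18, 0.02]
model = fit_linear(X_train, y_train)

# Top-ranked documents for a new query, with the same hypothetical features.
candidates = [("A", [0.85, 0.15]), ("B", [0.15, 0.85]), ("C", [0.6, 0.4])]
chosen = select_feedback_docs(model, candidates, k=2)
print(chosen)
```

Only the documents in `chosen` would then feed query expansion, in contrast to the baseline named in the abstract, which expands the query with all top-ranked documents.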
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mahdabi, P., Crestani, F. (2012). Learning-Based Pseudo-Relevance Feedback for Patent Retrieval. In: Salampasis, M., Larsen, B. (eds) Multidisciplinary Information Retrieval. IRFC 2012. Lecture Notes in Computer Science, vol 7356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31274-8_1
DOI: https://doi.org/10.1007/978-3-642-31274-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31273-1
Online ISBN: 978-3-642-31274-8
eBook Packages: Computer Science (R0)
