Abstract
Prior-art search is an important task in patent retrieval. The success of this task relies upon the selection of relevant search queries. Typically, terms for prior-art queries are extracted from the claim fields of query patents. However, due to the complex technical structure of patents and the presence of term mismatch and vague terms, selecting relevant terms for queries is a difficult task. When evaluating the retrievability coverage of prior-art queries generated from query patents, we observe a large bias toward a subset of the collection. A large number of patents either have a very low retrievability score or cannot be discovered via any query. To increase the retrievability of patents, in this paper we expand prior-art queries generated from query patents using query expansion with pseudo-relevance feedback. Terms missing from query patents are discovered from feedback patents, and better patents for relevance feedback are identified using a novel approach for checking their similarity with query patents. We specifically focus on how to automatically select better terms from query patents based on their proximity distribution with prior-art queries; these terms are used as features for computing similarity. Our results show that the coverage of prior-art queries can be increased significantly by incorporating relevant query terms using query expansion.
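The abstract refers to a retrievability score without defining it. As a rough illustration only, the sketch below assumes the cumulative measure of Azzopardi and Vinay (CIKM 2008), in which a document's score is the number of queries that rank it within a fixed cutoff. The function name `retrievability`, the toy rankings, and the cutoff value are hypothetical and are not taken from the paper.

```python
from collections import Counter


def retrievability(rankings, cutoff):
    """Count, for each document, how many queries rank it within the top `cutoff`.

    `rankings` maps a query id to its ranked list of document ids.
    This is an assumed cumulative measure, not necessarily the paper's exact one.
    """
    scores = Counter()
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


# Toy ranked lists for three hypothetical prior-art queries.
rankings = {
    "q1": ["p1", "p2", "p3"],
    "q2": ["p1", "p3", "p4"],
    "q3": ["p2", "p1", "p5"],
}

scores = retrievability(rankings, cutoff=2)

# Patents that never reach the top ranks get a score of zero,
# which is the kind of bias toward a subset of the collection described above.
for patent in ["p1", "p2", "p3", "p4", "p5"]:
    print(patent, scores[patent])
```

Under this assumed measure, query expansion would count as improving coverage if it reduces the number of patents whose score stays at or near zero.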
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bashir, S., Rauber, A. (2010). Improving Retrievability of Patents in Prior-Art Search. In: Gurrin, C., et al. Advances in Information Retrieval. ECIR 2010. Lecture Notes in Computer Science, vol 5993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12275-0_40
DOI: https://doi.org/10.1007/978-3-642-12275-0_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12274-3
Online ISBN: 978-3-642-12275-0
eBook Packages: Computer Science, Computer Science (R0)