Abstract
Patent retrieval is a branch of Information Retrieval (IR) that addresses the challenging task of retrieving highly technical and often complicated patents. Patent granting bodies typically translate patents into several major foreign languages so that language boundaries do not hinder their accessibility. Given such multilingual patent collections, we posit that these translations can be exploited to facilitate patent retrieval.
Specifically, we focus on the translation of patent queries from German and French, the morphology of which poses an extra challenge to retrieval. We compare two translation approaches that expand the query with (i) translated terms and (ii) translated phrases. Experimental evaluation on a standard CLEF-IP European Patent Office dataset reveals a novel finding: phrase translation may be more suited to French, and term translation may be more suited to German. We trace this finding to language morphology, and we conclude that tailoring the query translation per language can lead to improved results in patent retrieval.
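The difference between the two expansion strategies can be illustrated with a minimal sketch. It assumes small hand-made translation tables (`TERM_TABLE`, `PHRASE_TABLE`) and hypothetical helper functions; none of these names, entries, or weights come from the paper, whose translations are produced by its own system. In a real setting such tables would be learned from parallel text. Approach (i) translates each query word independently, while approach (ii) greedily segments the query into the longest known phrases and translates each phrase as a unit.

```python
from collections import Counter

# Hypothetical toy translation tables (illustrative only, not from the paper).
# Each source unit maps to candidate English translations with probabilities.
TERM_TABLE = {
    "verfahren": {"method": 0.7, "process": 0.3},
    "zur": {"for": 1.0},
    "herstellung": {"production": 0.6, "manufacture": 0.4},
    "von": {"of": 1.0},
    "halbleitern": {"semiconductors": 1.0},
}
PHRASE_TABLE = {
    ("verfahren", "zur", "herstellung"): {
        "method for producing": 0.8,
        "process for manufacturing": 0.2,
    },
    ("von", "halbleitern"): {"of semiconductors": 1.0},
}


def top_k(candidates, k):
    """Return the k most probable (translation, probability) pairs."""
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:k]


def term_translations(tokens, table, k=2):
    """Approach (i): translate every query token independently."""
    weights = Counter()
    for tok in tokens:
        for translation, prob in top_k(table.get(tok, {}), k):
            weights[translation] += prob
    return weights


def phrase_translations(tokens, table, max_len=3, k=2):
    """Approach (ii): greedy longest-match segmentation, then translate
    each matched phrase as a unit and split the output into words."""
    weights = Counter()
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in table:
                for translation, prob in top_k(table[span], k):
                    for word in translation.split():
                        weights[word] += prob
                i += n
                break
        else:
            i += 1  # no known phrase starts at this token; skip it
    return weights


def expand(tokens, translations):
    """Keep the original query terms (weight 1.0) and add the translations."""
    expanded = Counter({tok: 1.0 for tok in tokens})
    expanded.update(translations)  # Counter.update adds weights
    return expanded


query = ["verfahren", "zur", "herstellung", "von", "halbleitern"]
term_query = expand(query, term_translations(query, TERM_TABLE))
phrase_query = expand(query, phrase_translations(query, PHRASE_TABLE))
```

With these toy tables, the term-based expansion adds "production" and "manufacture", whereas the phrase-based expansion adds "producing" and "manufacturing", because the phrase table translates "verfahren zur herstellung" in context rather than word by word. Which variant retrieves better is an empirical question; the abstract reports that the answer differs between French and German.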
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jochim, C., Lioma, C., Schütze, H. (2011). Expanding Queries with Term and Phrase Translations in Patent Retrieval. In: Hanbury, A., Rauber, A., de Vries, A.P. (eds) Multidisciplinary Information Retrieval. IRFC 2011. Lecture Notes in Computer Science, vol 6653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21353-3_3
DOI: https://doi.org/10.1007/978-3-642-21353-3_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21352-6
Online ISBN: 978-3-642-21353-3
eBook Packages: Computer Science (R0)