Information Retrieval, Volume 17, Issue 5–6, pp 452–470

Using query logs of USPTO patent examiners for automatic query expansion in patent searching

  • Wolfgang Tannebaum
  • Andreas Rauber
Information Retrieval in the Intellectual Property Domain


In the patent domain, significant effort is invested in helping researchers formulate better queries, preferably via automated query expansion. Currently, automatic query expansion in patent search is mostly limited to computing co-occurring terms for the searchable features of the invention. Additional query terms are extracted automatically from patent documents based on entropy measures. Learning synonyms in the patent domain for automatic query expansion has been a difficult task. No dedicated sources of synonyms for the patent domain, such as patent-domain-specific lexica or thesauri, are available. In this paper we focus on the highly professional search setting of patent examiners. In particular, we use query logs to learn synonyms for the patent domain. For automatic query expansion, we create class-specific term networks from the query logs for several USPTO patent classes. Experiments show good performance in automatic query expansion using these automatically generated term networks. Specifically, the performance of the learned term networks increases as more query logs become available for a specific US patent class.
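The idea of a term network learned from query logs can be pictured with the minimal sketch below. It assumes, purely for illustration, that the logs are Boolean query strings and that terms joined by OR inside one parenthesised group are synonym candidates; the abstract does not state the extraction rule, so this is an assumption and not the authors' method. The function names, the sample queries, and the `min_count` threshold are all hypothetical. In the class-specific setting described above, one such network would be built from the logs of each patent class.

```python
import re
from collections import defaultdict


def build_term_network(queries):
    """Count how often two terms appear together in the same OR group.

    Assumption (not from the source): terms OR-ed inside one pair of
    parentheses are treated as synonym candidates.
    """
    network = defaultdict(lambda: defaultdict(int))
    for query in queries:
        # Innermost parenthesised groups, e.g. "(car OR vehicle)".
        for group in re.findall(r"\(([^()]*)\)", query):
            terms = [t.strip().lower() for t in re.split(r"\s+OR\s+", group)]
            terms = [t for t in terms if t]
            if len(terms) < 2:
                continue  # no OR relation in this group
            for a in terms:
                for b in terms:
                    if a != b:
                        network[a][b] += 1
    return network


def expand_query(terms, network, min_count=1):
    """Append related terms seen at least min_count times after each query term."""
    expanded = []
    for term in terms:
        if term not in expanded:
            expanded.append(term)
        related = network.get(term.lower(), {})
        # Most frequent candidates first; ties broken alphabetically.
        ranked = sorted(related.items(), key=lambda kv: (-kv[1], kv[0]))
        for candidate, count in ranked:
            if count >= min_count and candidate not in expanded:
                expanded.append(candidate)
    return expanded


# Hypothetical query log entries for a single patent class.
logs = [
    "(car OR vehicle OR automobile) AND (brake OR braking)",
    "(car OR vehicle) AND wheel",
]
network = build_term_network(logs)
expanded_all = expand_query(["car", "brake"], network)
expanded_strict = expand_query(["car"], network, min_count=2)
```

Raising `min_count` keeps only term pairs that recur across several logged queries, which is one simple way to trade recall for precision as more logs become available.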


Keywords: Patent searching, Query expansion, Query log analysis



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria
