Completing keyword patent search with semantic patent search: introducing a semiautomatic iterative method for patent near search based on semantic similarities

Scientometrics

Abstract

Patent search is an essential basis for many operational questions and scientometric evaluations. We consider it a sequence of distinct stages. The “patent wide search” defines the system boundaries by means of classifications and a keyword search, producing a patent set with a high recall level (see Schmitz 2010, Patentinformetrie: Analyse und Verdichtung von technischen Schutzrechtsinformationen, DGI, Frankfurt (Main), for an overview of searchable patent metadata). Within this patent set a “patent near search” takes place, producing a patent set with high(er) precision. Hence, the question arises how researchers should operate within this patent set to efficiently identify patents that contain paraphrased descriptions of the sought inventive elements within contextual information, and whether this produces results different from those of a conventional search. We present a semiautomatic iterative method, based on semantic similarity, for identifying such patents. To test our method, we generate an initial dataset in the course of a patent wide search. This dataset is then analyzed both with the semiautomatic iterative method and with an alternative method that emulates the conventional process of keyword refinement. The comparison shows that both methods have their particular “raison d’être” and that the semiautomatic iterative method appears able to support a conventional patent search very effectively.
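The abstract does not specify how the similarity ranking is computed or when the iteration stops. The sketch below is therefore a minimal illustration of the general idea only, not the authors' implementation, and it rests on several assumptions. Similarity is plain bag-of-words cosine similarity, a swapped-in measure that may differ from the paper's own. The search starts from a few patents already known to be relevant. A hypothetical `is_relevant` callback stands in for the human reviewer, and review proceeds in fixed-size batches. All function and parameter names are illustrative.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, Set


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts; a deliberately simple stand-in for real preprocessing."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def iterative_near_search(
    patents: Dict[str, str],             # patent ID -> text: the wide-search result set
    seeds: Set[str],                     # IDs of patents already known to be relevant (assumption)
    is_relevant: Callable[[str], bool],  # hypothetical stand-in for the human reviewer
    batch_size: int = 5,                 # hypothetical: documents reviewed per round
    max_rounds: int = 10,                # hypothetical safety limit on iterations
) -> Set[str]:
    if not seeds or not seeds.issubset(patents):
        raise ValueError("seeds must be a non-empty subset of the patent IDs")
    vectors = {pid: vectorize(text) for pid, text in patents.items()}
    relevant, reviewed = set(seeds), set(seeds)
    for _ in range(max_rounds):
        candidates = [pid for pid in patents if pid not in reviewed]
        if not candidates:
            break
        # Rank unreviewed patents by their highest similarity to any patent marked so far.
        score = {pid: max(cosine(vectors[pid], vectors[r]) for r in relevant) for pid in candidates}
        batch = sorted(candidates, key=lambda pid: score[pid], reverse=True)[:batch_size]
        reviewed.update(batch)
        marked = {pid for pid in batch if is_relevant(pid)}
        if not marked:
            break  # stop once a whole batch contributes no new relevant patents
        relevant |= marked  # newly marked patents become additional similarity anchors
    return relevant
```

Scoring each candidate by its maximum similarity to any marked patent is only one possible design choice. Averaging over the marked set, or weighting terms (for example with TF-IDF), would be equally plausible variants of the same iterative loop.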

Notes

  1. See Appendix “Identified reference documents” for a list of the ten reference documents.

  2. To check the robustness of our method, we compared its results with those of a relaxed approach (see Appendix “Robustness check”).

References

  • Aamodt, A., & Plaza, E. (1994). Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1), 39–59.

  • Abercrombie, R. K., Udoeyop, A. W., & Schlicher, B. G. (2012). A study of scientometric methods to identify emerging technologies via modeling of milestones. Scientometrics, 91(2), 327–342.

  • Alberts, D., Yang, C. B., Fobare-DePonio, D., Koubek, K., Robins, S., Rodgers, M., et al. (2011). Introduction to patent searching. In M. Lupu, K. Mayer, J. Tait, & A. J. Trippe (Eds.), Current challenges in patent information retrieval (pp. 3–44). Heidelberg: Springer-Verlag.

  • Benson, C. L., & Magee, C. L. (2013). A hybrid keyword and patent class methodology for selecting relevant sets of patents for a technological field. Scientometrics, 96(1), 69–82.

  • Breitzmann, A., & Thomas, P. (2002). Using patent citation analysis to target/value M&A candidates. Research Technology Management, 45(5), 28–36.

  • Carpineto, C., & Romano, G. (2012). A survey of automatic query expansion in information retrieval. ACM Computing Surveys, 44(1), 1–50.

  • Carterette, B., & Voorhees, E. M. (2013). Overview of information retrieval evaluation. In M. Lupu, K. Mayer, J. Tait, & A. J. Trippe (Eds.), Current challenges in patent information retrieval (pp. 69–86). Berlin: Springer.

  • Cascini, G., Fantechi, A., & Spinicci, E. (2004). Natural language processing of patents and technical documentation. Lecture Notes in Computer Science, 3163, 508–520.

  • Cascini, G., & Zini, M. (2008). Measuring patent similarity by comparing inventions functional trees. Computer-Aided Innovation (CAI), 277, 31–42.

  • Choi, S., Yoon, J., Kim, K., Lee, J. Y., & Kim, C. (2011). SAO network analysis of patents for technology trends identification: A case study of polymer electrolyte membrane technology in proton exchange membrane fuel cells. Scientometrics, 88(3), 863–883.

  • DIN Deutsches Institut für Normung e.V. (2010). DIN SPEC 1060: Dienstleistungsqualität im intellectual property management. Berlin: Beuth Verlag.

  • Dirnberger, D. (2011). A guide to efficient keyword, sequence and classification search strategies for biopharmaceutical drug-centric patent landscape searches—A human recombinant insulin patent landscape case study. World Patent Information, 33, 128–143.

  • Dixon, R. M. W. (1992). A new approach to English grammar, on semantic principles. Oxford: Oxford University Press.

  • Dong, H., Hussain, F. K., & Chang, H. (2011). A context-aware semantic similarity model for ontology environments. Concurrency and Computation: Practice and Experience, 23(5), 505–524.

  • Ernst, H. (2001). Patent applications and subsequent changes of performance: Evidence from time-series cross-section analysis on the firm level. Research Policy, 30, 143–157.

  • Ervilia, F. T., & Herstatt, C. (2009). Exploring the relation of patent ownership and market success—Cases from the LCD flat panel display industry. International Journal of Technology Intelligence and Planning, 5(1), 90–109.

  • Field, A. (2009). Discovering statistics using SPSS. London et al.: Sage Publications.

  • Gambardella, A., & McGahan, A. M. (2010). Business-model innovation: General purpose technologies and their implications for industry structure. Long Range Planning, 43, 262–271.

  • Große, D., Fey, G., & Drechsler, R. (2007). SATRIX: Algorithmen für Boolsche Erfüllbarkeit. Herzogenrath: Shaker Verlag.

  • Harhoff, D., et al. (2003). Citations, family size, opposition and the value of patent rights. Research Policy, 32, 1343–1363.

  • Jang, S.-L., Yu, Y.-C., & Wang, T.-Y. (2011). Emerging firms in an emerging field: An analysis of patent citations in electronic-paper display technology. Scientometrics, 89(1), 259–272.

  • Kim, Y., Suh, J., & Park, S. (2008). Visualization of patent analysis for emerging technology. Expert Systems with Applications, 34(3), 1804–1812.

  • Krause, J. (Ed.). (1987). Inhaltserschließung von Massendaten: Zur Wirksamkeit informationslinguistischer Verfahren am Beispiel des Deutschen Patentinformationssystems. Hildesheim: Georg Olms.

  • Lee, S. (2013). Linking technology roadmapping to patent analysis. In M. G. Moehrle, R. Isenmann, & R. Phaal (Eds.), Technology roadmapping for strategy and innovation (pp. 267–284). Berlin: Springer.

  • Mayring, P. (2003). Qualitative Inhaltsanalyse. Grundlagen und Techniken. Weinheim: Beltz.

  • Mitchell, M., & Jolley, J. M. (2012). Research design explained (8th ed.). Wadsworth: Cengage Learning Emea.

  • Moehrle, M. G. (2010). Measures for textual patent similarities: A guided way to select appropriate approaches. Scientometrics, 85(1), 95–109.

  • Moehrle, M. G., & Gerken, J. (2012). Measuring textual patent similarity on basis of combined concepts: Design decisions and their consequences. Scientometrics, 91, 805–826.

  • Moehrle, M. G., & Walter, L. (2009). Patentierung von Geschäftsprozessen. Monitoring—Strategien—Schutz. Berlin: Springer.

  • Moehrle, M. G., et al. (2010). Patinformatics as a business process: A guideline through patent research tasks and tools. World Patent Information, 32, 291–299.

  • Mogee, M., & Breitzmann, A. (2002). The many applications of patent analysis. Journal of Information Science, 28, 187–205.

  • Moskovkin, V. M., Shigorina, N. A., & Popov, D. (2012). The possibility of using Google Patents search tool in patentometric analysis. Scientific and Technical Information Processing, 39, 107–112.

  • Niemann, H., Moehrle, M. G., & Walter, L. (2013). The development of business method patenting in the logistics industry—Insights from the case of intelligent sensor networks. International Journal of Technology Management, 61(2), 177–197.

  • Nijhof, E. (2007). Subject analysis and search strategies—Has the searcher become the bottleneck in the search process? World Patent Information, 29(1), 20–25.

  • Salton, G. (1988). A simple blueprint for automatic Boolean query processing. Information Processing and Management, 24(3), 269–280.

  • Salton, G., & McGill, M. J. (1988). Information retrieval. Grundlegendes für Informationswissenschaftler. Hamburg: McGraw-Hill Book Company GmbH.

  • Sánchez, D., Batet, M., Isern, D., & Valls, A. (2012). Ontology-based semantic similarity: A new feature-based approach. Expert Systems with Applications, 39(9), 7718–7728.

  • Schmitz, J. (2010). Patentinformetrie: Analyse und Verdichtung von technischen Schutzrechtsinformationen. Frankfurt (Main): DGI.

  • Stefanov, V., & Tait, J. I. (2011). An introduction to contemporary search technology. In M. Lupu, K. Mayer, J. Tait, & A. J. Trippe (Eds.), Current challenges in patent information retrieval (pp. 45–68). Berlin: Springer-Verlag.

  • Stock, W. G. (2007). Information retrieval: Informationen suchen und finden. München: Oldenbourg.

  • Teece, D. J. (2010). Business models, business strategy and innovation. Long Range Planning, 43, 172–194.

  • Tinsley, H. E. A., & Weiss, D. J. (2000). Interrater reliability and agreement. In H. E. A. Tinsley & S. D. Brown (Eds.), Handbook of applied multivariate statistics and mathematical modeling (pp. 95–124). San Diego: Academic Press.

  • Trajtenberg, M., et al. (1997). University versus corporate patents: A window on the basicness of invention. Economics of Innovation and New Technology, 5, 19–50.

  • Trippe, A. J. (2003). Patinformatics: Tasks and tools. World Patent Information, 25(3), 211–221.

  • Van der Drift, J. (1991). Effective strategies for searching existing patent rights. World Patent Information, 13, 67–71.

  • Verhaegen, P. A., et al. (2011). Searching for similar products through patent analysis. Procedia Engineering, 9, 431–441.

  • von Proff, S., & Dettmann, A. (2012). Inventor collaboration over distance: A comparison of academic and corporate patents. Scientometrics, 1–22. doi:10.1007/s11192-012-0812-8.

  • von Wartburg, I., et al. (2005). Inventive progress measured by multi-stage patent citation analysis. Research Policy, 34, 1591–1607.

  • Yin, R. K. (2009). Case study research (4th ed.). Thousand Oaks et al.: Sage.

  • Yoon, J., & Kim, K. (2011). Identifying rapidly evolving technological trends for R&D planning using SAO-based semantic patent networks. Scientometrics, 88(1), 213–228.

  • Yoon, B., & Park, Y. (2004). A text-mining-based patent network: Analytical tool for high-technology trend. The Journal of High Technology Management Research, 15(1), 37–50.

Acknowledgments

The authors wish to thank Dr. Lothar Walter and Dr. Jan M. Gerken for their critical comments and constructive ideas, particularly regarding the keyword search emulating method. We would also like to acknowledge the contributions of two anonymous reviewers, whose suggestions helped enhance this paper’s readability and overall quality.

Author information

Correspondence to Ansgar Moeller.

Appendix

Identified reference documents

Table 9 Reference documents identified by the semiautomatic iterative method and/or the keyword search emulating method

Robustness check

Analogous to an experimental design, we define an additional procedure. We interpret the ranking of patents in our semiautomatic iterative method as the treatment of an “experimental group” and define a second procedure without this treatment for a “control group” (see Mitchell and Jolley 2012 on the design of experiments). In other words, the second procedure is a relaxation of our method that ignores the ranking of the dataset produced by similarity measurement. For this purpose, random selection (pure chance) serves as the baseline for the qualitative evaluation of the presented method. The underlying assumption is that as many patents are reviewed at random as were marked when the dataset was examined with the presented method. In the semiautomatic iterative method, reviewing the semantically most similar documents yielded on average 63 marked documents. If the same number of documents were reviewed at random, the resulting average recall would be 63 % (see Table 10). The precision of this purely random approach can be calculated with the precision formula, but it would merely equal the share of relevant documents in the total number of documents (in this case 36 %). Thus, the semiautomatic iterative method proves robust, as it outperforms random patent selection across the board in terms of both recall and precision.
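The baseline values follow from the mean of the hypergeometric distribution. Suppose n of N documents are drawn at random and R of the N are relevant. The expected number of relevant hits is then nR/N. Expected recall (hits / R) is therefore n/N, and expected precision (hits / n) is R/N, which is the share of relevant documents noted above. The minimal Python sketch below checks this closed form against a simulation. The function names are illustrative, and the example inputs are hypothetical rather than the paper's data.

```python
import random
from fractions import Fraction
from typing import Tuple


def expected_random_baseline(n_total: int, n_relevant: int, n_reviewed: int) -> Tuple[Fraction, Fraction]:
    """Exact expected (recall, precision) when n_reviewed of n_total documents are drawn at random."""
    if not (0 < n_relevant <= n_total and 0 < n_reviewed <= n_total):
        raise ValueError("need 0 < n_relevant <= n_total and 0 < n_reviewed <= n_total")
    # Hypergeometric mean: expected number of relevant documents among those reviewed.
    expected_hits = Fraction(n_reviewed * n_relevant, n_total)
    recall = expected_hits / n_relevant     # simplifies to n_reviewed / n_total
    precision = expected_hits / n_reviewed  # simplifies to n_relevant / n_total
    return recall, precision


def simulate_random_baseline(n_total: int, n_relevant: int, n_reviewed: int,
                             trials: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimate of the same two quantities, as a sanity check."""
    rng = random.Random(seed)
    relevant = set(range(n_relevant))  # label the first n_relevant document IDs as relevant
    recall_sum = precision_sum = 0.0
    for _ in range(trials):
        sample = rng.sample(range(n_total), n_reviewed)  # draw without replacement
        hits = sum(1 for doc in sample if doc in relevant)
        recall_sum += hits / n_relevant
        precision_sum += hits / n_reviewed
    return recall_sum / trials, precision_sum / trials
```

As a hypothetical example, take 50 documents of which 20 are relevant and review 10 of them at random. `expected_random_baseline(50, 20, 10)` then returns `(Fraction(1, 5), Fraction(2, 5))`, and the simulation lands close to 0.2 and 0.4. The baseline's recall depends only on how much of the dataset is reviewed. Its precision depends only on how many relevant documents the dataset contains. Any ranking method must therefore beat both numbers to add value.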

Table 10 Comparison of the semiautomatic iterative method with the random-selection baseline in terms of precision and recall

Cite this article

Moeller, A., Moehrle, M.G. Completing keyword patent search with semantic patent search: introducing a semiautomatic iterative method for patent near search based on semantic similarities. Scientometrics 102, 77–96 (2015). https://doi.org/10.1007/s11192-014-1446-9

  • DOI: https://doi.org/10.1007/s11192-014-1446-9
