Measures for textual patent similarities: a guided way to select appropriate approaches

Scientometrics

Abstract

The measurement of textual patent similarities is crucial for important tasks in patent management, be it prior art analysis, infringement analysis, or patent mapping. In this paper, the common theory of similarity measurement is applied to the field of patents, using solitary concepts as the basic textual elements of patents. After the term ‘similarity’ is unfolded on a content-oriented and a formally oriented level and a basic model of understanding is presented, a segmented approach to the measurement of underlying variables, similarity coefficients, and the criteria-related profiles of their combinations is outlined. This leads to a guided way of applying textual patent similarities, which is of interest for both theory and practice.

Notes

  1. In contrast to scientific papers, patents follow a different citation style (see von Wartburg et al. 2005), which makes it harder to use the classical instruments of scientometrics mentioned by Small (1999). In practical studies, backward citation information is often unavailable, so other forms of similarity have to be used.

  2. In line with the different characterizations of similarity, similarity coefficients go by various names; for example, they are also called resemblance measures (Batagelj and Bren 1995) or association coefficients (Sneath and Sokal 1973).

  3. As concept counts are non-negative numbers, they cannot follow a Gaussian distribution. For this reason the Bravais–Pearson coefficient of correlation is not suitable for measuring the similarity of two patents. Nor would a binomial distribution be adequate: although it satisfies the condition of non-negative values, the range of concept counts is open to the right, whereas a binomial distribution has a closed, bounded set of values. A possible alternative is to view the concept counts as realizations of Poisson distributions, as they meet all of its criteria. Further calculations can then be made on the basis of logit models. Another way is to use Spearman’s coefficient of correlation, since the counts can of course be put into an ordered sequence. The advantage is that the exact distance between two entries does not need to be considered. Spearman’s coefficient of correlation is the ordinal counterpart of the metric Bravais–Pearson coefficient of correlation and is therefore a suitable measure of similarity. Kendall’s coefficient of correlation should at least be mentioned as a second potential measure.

  4. The above-mentioned coefficients have already been adopted in different fields of application. For example, Qin (2000) shows the adaptation of the cosine coefficient and the Jaccard coefficient for the comparison of documents. Early on, Braam et al. (1988) used these coefficients for co-citation cluster analysis, and Rip and Courtal (1984) described their application to the construction of co-word maps.

  5. Sometimes the size may be homogeneous within a set of patents (indicated by a low ratio between standard deviation and average), sometimes it may follow a normal distribution (indicated by a medium ratio between standard deviation and average; a goodness-of-fit test should be applied in addition), and sometimes there may be outliers (indicated by a medium or high ratio between standard deviation and average).

  6. An interesting aspect of this step is the selection of concepts, as they are crucial in determining similarities between documents. Normally, general concepts such as “distribution”, “input member”, “power”, “speed”, or “structure” are not appropriate for characterizing specific common topics if used alone. In such a case, “power”, for example, should be used together with “transmitting” or “transmittance”. However, when related patents are compared, a whole set of concepts is commonly used, which allows such general concepts to be included as well.
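
As an illustration of notes 3 and 4, the following Python sketch computes the Jaccard coefficient, the cosine coefficient, and Spearman’s coefficient of correlation for two patents represented as concept counts. It is a minimal sketch and not the procedure used in the paper: the function names, the sample concepts, and their counts are hypothetical, and tied counts receive average ranks, which is a convention assumed here.

```python
from collections import Counter
from math import sqrt


def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard coefficient on concept presence (assumes all stored counts are > 0)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine(a: Counter, b: Counter) -> float:
    """Cosine coefficient on raw concept counts."""
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """Rank values in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a: Counter, b: Counter) -> float:
    """Spearman's rank correlation over the union of both patents' concepts."""
    concepts = sorted(a.keys() | b.keys())
    n = len(concepts)
    if n == 0:
        return 0.0
    # A Counter returns 0 for concepts that do not occur in a patent.
    ranks_a = average_ranks([a[c] for c in concepts])
    ranks_b = average_ranks([b[c] for c in concepts])
    mean_a, mean_b = sum(ranks_a) / n, sum(ranks_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ranks_a, ranks_b))
    var_a = sum((x - mean_a) ** 2 for x in ranks_a)
    var_b = sum((y - mean_b) ** 2 for y in ranks_b)
    denom = sqrt(var_a * var_b)
    return cov / denom if denom else 0.0


# Hypothetical concept counts for two patents (illustrative data only).
patent_a = Counter({"torque": 3, "transmitting power": 2, "shaft": 1})
patent_b = Counter({"torque": 1, "transmitting power": 2, "gear": 4})

print(jaccard(patent_a, patent_b))
print(cosine(patent_a, patent_b))
print(spearman(patent_a, patent_b))
```

For these two hypothetical patents the Jaccard coefficient is 0.5 (two shared concepts out of four), the cosine coefficient is about 0.41, and Spearman’s coefficient is -0.4. The three values differ because Jaccard looks only at the presence of concepts, cosine weights them by their counts, and Spearman compares only the rank order of the counts.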

References

  • Batagelj, V., & Bren, M. (1995). Comparing resemblance measures. Journal of Classification, 12(1), 73–90.

  • Bergmann, I., Butzke, D., Walter, L., Fuerste, J. P., Moehrle, M. G., & Erdmann, V. A. (2008). Evaluating the risk of patent infringement by means of semantic patent analysis: The case of DNA chips. R&D Management, 38(5), 550–562.

  • Bonino, D., Ciaramella, A., & Corno, F. (2009). Review of the state-of-the-art in patent information and forthcoming evolutions in intelligent patent informatics. World Patent Information. doi:10.1016/j.wpi.2009.05.008 (in press).

  • Braam, R. R., Moed, H. F., & van Raan, A. F. J. (1988). Mapping of science: Critical elaboration and new approaches, a case study in agricultural biochemistry. In L. Egghe & R. Rousseau (Eds.), Informetrics 87/88. Amsterdam: Elsevier Science.

  • Brosius, F. (2006). SPSS 14. Heidelberg: Redline.

  • Burke, P. F., & Reitzig, M. (2007). Measuring patent assessment quality—analyzing the degree and kind of (in)consistency in patent offices’ decision making. Research Policy, 36(9), 1404–1430.

  • Daga, R., & Pandey, G. (2008). US-Patent application 2008/0162455 A1. Determination of document similarity.

  • Dehmer, M. (2005). Strukturelle Analyse Web-basierter Dokumente. Wiesbaden: Gabler.

  • Dressler, A. (2006). Patente in technologieorientierten mergers und acquisitions. Wiesbaden: Deutscher Universitäts-Verlag.

  • Gerken, J. M., & Moehrle, M. G. (2010). The evolution of torque transfer technology by use of computerized patent analysis. In Proceedings of the 20th CIRP design conference 2010, Nantes, France (accepted).

  • Gower, J. C., & Legendre, P. (1986). Metric and euclidean properties of dissimilarity coefficients. Journal of Classification, 3(1), 5–48.

  • Hamers, L., Hemeryck, Y., Herweyers, G., & Janssen, M. (1989). Similarity measures in scientometric research: The Jaccard index versus Salton’s Cosine formula. Information Processing & Management, 25(3), 315–318.

  • Hippel, E.v. (1994). “Sticky information” and the locus of problem solving: Implications for innovation. Management Science, 40(4), 429–439.

  • Jeong, B., Lee, D., Cho, H., & Lee, J. (2008). A novel method for measuring semantic similarity for XML schema matching. Expert Systems with Applications, 34(3), 1651–1658.

  • Kanagasabai, R., & Pan, H. (2008). US-Patent 7,346,491 B2. Method of text similarity measurement.

  • Moehrle, M. G., & Geritz, A. (2007). Developing acquisition strategies based on patent maps. In M. H. Sherif & T. M. Khalil (Eds.), Management of technology: New directions in technology management (pp. 19–29). Amsterdam: Elsevier.

  • Moens, M.-F. (2006). Information extraction: Algorithms and prospects in a retrieval context. Berlin: Springer.

  • Park, J. (2005). Evolution of industry knowledge in the public domain: Prior art searching for software patents. SCRIPT-ed, 2(1), 47–70.

  • Peters, H. P. F., & van Raan, A. F. J. (1993a). Co-word-based science maps of chemical engineering. Part I: Representations by direct multidimensional scaling. Research Policy, 22(1), 23–46.

  • Peters, H. P. F., & van Raan, A. F. J. (1993b). Co-word-based science maps of chemical engineering. Part II: Representations by combined clustering and multidimensional scaling. Research Policy, 22(1), 47–71.

  • Philipp, M. (2006). Patent filing and searching: Is deflation in quality the inevitable consequence of hyperinflation in quantity? World Patent Information, 28(2), 117–121.

  • Qin, J. (2000). Semantic similarities between a keyword database and a controlled vocabulary database: An investigation in the antibiotic resistance literature. Journal of the American Society for Information Science, 51(2), 166–180.

  • Ranganathan, A., & Ronen, R. (2008). US-Patent application 2008/0243809 A1. Information-theory based measure of similarity between instances in ontology.

  • Rip, A., & Courtal, P. (1984). Co-word maps of biotechnology: An example of cognitive scientometrics. Scientometrics, 6(6), 381–400.

  • Sepkoski, J. J. (1974). Quantified coefficients of association and measurement of similarity. Mathematical Geology, 6(2), 135–152.

  • Small, H. (1999). Visualizing science by citation mapping. Journal of the American Society for Information Science and Technology, 50(9), 799–813.

  • Small, H. (2003). Paradigms, citations, and maps of science: A personal history. Journal of the American Society for Information Science and Technology, 54(5), 394–399.

  • Sneath, P. H. A., & Sokal, R. R. (1973). Numerical taxonomy. San Francisco: W. H. Freeman and Company.

  • Sternitzke, C., & Bergmann, I. (2009). Similarity measures for document mapping: A comparative study on the level of an individual scientist. Scientometrics, 78(1), 113–130.

  • Trippe, A. J. (2003). Patinformatics: Tasks and tools. World Patent Information, 25(3), 211–221.

  • Tseng, Y. H., Lin, C. J., & Lin, Y. I. (2007). Text mining techniques for patent analysis. Information Processing and Management, 43(5), 1216–1247.

  • Vinkler, P. (1999). Short term and long term impact factors and similarities of chemistry journals represented by references. Scientometrics, 46(3), 621–633.

  • von Wartburg, I., Teichert, T., & Rost, K. (2005). Inventive progress measured by multi-stage patent citation analysis. Research Policy, 34(10), 1591–1607.

  • Wen, G., Jiang, L., & Shadbolt, R. (2006). Ontology-based similarity between text documents on manifold. In R. Mizoguchi, Z. Shi, & F. Giunchiglia (Eds.), ASWC 2006, LNCS 4185 (pp. 113–125). Berlin: Springer.

  • Yang, Y. Y., Akers, L., Klose, T., & Barcelon, Y. C. (2008). Text mining and visualization tools—impressions of emerging capabilities. World Patent Information, 30(4), 280–293.

  • Yanhong, L., & Runhua, T. T. (2007). A text-mining-based patent analysis in product innovative process. In N. León-Rovira (Ed.), Trends in computer aided innovation (pp. 89–96). New York: Springer.

Acknowledgement

The author wishes to thank Dipl.-Wirt.-Ing. Jan Michael Gerken, research associate at IPMI, University of Bremen, for his constructive input to several drafts of this paper.

Author information

Corresponding author

Correspondence to Martin G. Moehrle.

About this article

Cite this article

Moehrle, M.G. Measures for textual patent similarities: a guided way to select appropriate approaches. Scientometrics 85, 95–109 (2010). https://doi.org/10.1007/s11192-010-0243-3

  • DOI: https://doi.org/10.1007/s11192-010-0243-3
