Measures for textual patent similarities: a guided way to select appropriate approaches

Moehrle, Martin G.

doi:10.1007/s11192-010-0243-3

Published: 29 May 2010

Scientometrics, volume 85, pages 95–109 (2010)

Martin G. Moehrle¹


Abstract

The measurement of textual patent similarities is crucial for important tasks in patent management, be it prior art analysis, infringement analysis, or patent mapping. In this paper, the common theory of similarity measurement is applied to the field of patents, using solitary concepts as the basic textual elements of patents. After the term ‘similarity’ is unfolded on a content-oriented and a formal-oriented level and a basic model of understanding is presented, a segmented approach to the measurement of underlying variables, similarity coefficients, and the criteria-related profiles of their combinations is outlined. This leads to a guided way of applying textual patent similarities that is of interest for both theory and practice.

Notes

Citation practice in patents differs from that in scientific papers (see von Wartburg et al. 2005), which makes it harder to apply the classical instruments of scientometrics mentioned by Small (1999). In practical studies, backward citation information is often unavailable, so other forms of similarity have to be used.
Corresponding to the different characterizations of similarity, similarity coefficients also go by various names, for example resemblance measures (Batagelj and Bren 1995) or association coefficients (Sneath and Sokal 1973).
Because concept counts are non-negative numbers, a Gaussian distribution cannot be assumed, so the Bravais–Pearson coefficient of correlation is not suitable for measuring the similarity of two patents. A binomial distribution is not adequate either: although it satisfies the condition of non-negative values, the range of the counts is open to the right, whereas a binomial variable takes values in a closed set. One possible alternative is to view the concept counts as realizations of Poisson distributions, whose criteria they meet; further calculations can then be based on logit models. Another is Spearman’s coefficient of correlation, since the counts can always be put into an ordered sequence; its advantage is that the exact distance between two entries need not be considered. Spearman’s coefficient is the ordinal counterpart of the metric Bravais–Pearson coefficient and is therefore a suitable measure of similarity. Kendall’s coefficient of correlation should at least be mentioned as a second potential measure.
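
The rank-based alternative described in this note can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's procedure: the two count vectors are hypothetical, the helper names (`average_ranks`, `spearman`, `kendall_tau_b`) are introduced here for the example, ties are handled with average ranks, and the tau-b variant of Kendall's coefficient is chosen because concept counts often contain ties.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Bravais-Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties in either vector."""
    concordant = discordant = tied_x = tied_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both vectors: counted nowhere
            if dx == 0:
                tied_x += 1
            elif dy == 0:
                tied_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + tied_x) *
                 (concordant + discordant + tied_y))
    return (concordant - discordant) / denom

# Hypothetical counts of five shared concepts in two patents.
counts_a = [5, 3, 0, 1, 2]
counts_b = [4, 2, 1, 0, 3]
rho = spearman(counts_a, counts_b)       # 0.8 for these counts
tau = kendall_tau_b(counts_a, counts_b)  # 0.6 for these counts
```

Both coefficients use only the ordering of the counts, which is the property the note relies on: the exact distance between two entries plays no role.
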
The coefficients mentioned above have already been adopted in different fields of application. For example, Qin (2000) shows the adaptation of the cosine coefficient and the Jaccard coefficient to the comparison of documents. Early on, Braam et al. (1988) used these coefficients for co-citation cluster analysis, and Rip and Courtal (1984) described their application to the construction of co-word maps.
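
For orientation, the two coefficients named in this note can be written down in their standard textbook form. The sketch below is not taken from Qin (2000) or the other cited works; the concept names and counts are hypothetical, and the function names `cosine` and `jaccard` are chosen for this example. The cosine coefficient is computed on concept counts, the Jaccard coefficient on the mere presence of concepts.

```python
from math import sqrt

def cosine(counts_a, counts_b):
    """Cosine coefficient of two concept-count dictionaries."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[c] * counts_b[c] for c in shared)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)

def jaccard(concepts_a, concepts_b):
    """Jaccard coefficient: shared concepts divided by all distinct concepts."""
    a, b = set(concepts_a), set(concepts_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical concept counts for two patents.
patent_a = {"torque transfer": 3, "input member": 1, "clutch": 2}
patent_b = {"torque transfer": 1, "clutch": 2, "housing": 2}

cos_ab = cosine(patent_a, patent_b)   # 7 / (3 * sqrt(14)), about 0.62
jac_ab = jaccard(patent_a, patent_b)  # 2 shared of 4 distinct concepts = 0.5
```

The two values differ because the cosine coefficient weights a concept by how often it occurs, whereas the Jaccard coefficient only registers whether it occurs at all.
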
The size may be homogeneous within a set of patents (indicated by a low ratio of standard deviation to average), it may follow a normal distribution (indicated by a medium ratio of standard deviation to average; a goodness-of-fit test should additionally be applied), or there may be some outliers (indicated by a medium or high ratio of standard deviation to average).
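
The ratio of standard deviation to average used in this note is the coefficient of variation. A minimal sketch follows, assuming that "size" means the number of concepts per patent and using made-up size values. The note gives no numeric thresholds for "low", "medium" or "high", so none are set here, and the function name `variation_ratio` is introduced only for this example.

```python
from statistics import mean, stdev

def variation_ratio(sizes):
    """Ratio of the sample standard deviation to the average of the sizes."""
    avg = mean(sizes)
    if avg == 0:
        raise ValueError("ratio is undefined when the average size is zero")
    return stdev(sizes) / avg

# Hypothetical patent sizes (assumed: number of concepts per patent).
homogeneous = [100, 110, 90, 105, 95]
with_outlier = [100, 110, 90, 105, 600]

ratio_homogeneous = variation_ratio(homogeneous)  # small: sizes are close together
ratio_outlier = variation_ratio(with_outlier)     # much larger: one patent dominates
```

The ratio alone does not distinguish a normal distribution from a set with outliers, which is why the note additionally asks for a goodness-of-fit test.
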
An interesting aspect of this step is the selection of concepts, as they are crucial in determining similarities between documents. Normally, general concepts such as “distribution”, “input member”, “power”, “speed” or “structure” are not appropriate for characterizing specific common topics if used alone. In such a case, “power”, for example, should be used together with “transmitting” or “transmittance”. However, when related patents are compared, a whole set of concepts is commonly used, which also allows such general concepts to be included.

References

Batagelj, V., & Bren, M. (1995). Comparing resemblance measures. Journal of Classification, 12(1), 73–90.
Bergmann, I., Butzke, D., Walter, L., Fuerste, J. P., Moehrle, M. G., & Erdmann, V. A. (2008). Evaluating the risk of patent infringement by means of semantic patent analysis: The case of DNA chips. R&D Management, 38(5), 550–562.
Bonino, D., Ciaramella, A., & Corno, F. (2009). Review of the state-of-the-art in patent information and forthcoming evolutions in intelligent patent informatics. World Patent Information doi:10.1016/j.wpi.2009.05.008 (in press).
Braam, R. R., Moed, H. F., & van Raan, A. F. J. (1988). Mapping of science: Critical elaboration and new approaches, a case study in agricultural biochemistry. In L. Egghe & R. Rousseau (Eds.), Informetrics 87/88. Amsterdam: Elsevier Science.
Brosius, F. (2006). SPSS 14. Heidelberg: Redline.
Burke, P. F., & Reitzig, M. (2007). Measuring patent assessment quality—analyzing the degree and kind of (in)consistency in patent offices’ decision making. Research Policy, 36(9), 1404–1430.
Daga, R., & Pandey, G. (2008). US-Patent application 2008/0162455 A1. Determination of document similarity.
Dehmer, M. (2005). Strukturelle Analyse Web-basierter Dokumente. Wiesbaden: Gabler.
Dressler, A. (2006). Patente in technologieorientierten mergers und acquisitions. Wiesbaden: Deutscher Universitäts-Verlag.
Gerken, J. M., & Moehrle, M. G. (2010). The evolution of torque transfer technology by use of computerized patent analysis. In Proceedings of the 20th CIRP design conference 2010, Nantes, France (accepted).
Gower, J. C., & Legendre, P. (1986). Metric and euclidean properties of dissimilarity coefficients. Journal of Classification, 3(1), 5–48.
Hamers, L., Hemeryck, Y., Herweyers, G., & Janssen, M. (1989). Similarity measures in scientometric research: The Jaccard index versus Salton’s Cosine formula. Information Processing & Management, 25(3), 315–318.
Hippel, E.v. (1994). “Sticky information” and the locus of problem solving: Implications for innovation. Management Science, 40(4), 429–439.
Jeong, B., Lee, D., Cho, H., & Lee, J. (2008). A novel method for measuring semantic similarity for XML schema matching. Expert Systems with Applications, 34(3), 1651–1658.
Kanagasabai, R., & Pan, H. (2008). US-Patent 7,346,491 B2. Method of text similarity measurement.
Moehrle, M. G., & Geritz, A. (2007). Developing acquisition strategies based on patent maps. In M. H. Sherif & T. M. Khalil (Eds.), Management of technology: New directions in technology management (pp. 19–29). Amsterdam: Elsevier.
Moens, M.-F. (2006). Information extraction: Algorithms and prospects in a retrieval context. Berlin: Springer.
Park, J. (2005). Evolution of industry knowledge in the public domain: Prior art searching for software patents. SCRIPT-ed, 2(1), 47–70.
Peters, H. P. F., & van Raan, A. F. J. (1993a). Co-word-based science maps of chemical engineering. Part I: Representations by direct multidimensional scaling. Research Policy, 22(1), 23–46.
Peters, H. P. F., & van Raan, A. F. J. (1993b). Co-word-based science maps of chemical engineering. Part II: Representations by combined clustering and multidimensional scaling. Research Policy, 22(1), 47–71.
Philipp, M. (2006). Patent filing and searching: Is deflation in quality the inevitable consequence of hyperinflation in quantity? World Patent Information, 28(2), 117–121.
Qin, J. (2000). Semantic similarities between a keyword database and a controlled vocabulary database: An investigation in the antibiotic resistance literature. Journal of the American Society for Information Science, 51(2), 166–180.
Ranganathan, A., & Ronen, R. (2008). US-Patent application 2008/0243809 A1. Information-theory based measure of similarity between instances in ontology.
Rip, A., & Courtal, P. (1984). Co-word maps of biotechnology: An example of cognitive scientometrics. Scientometrics, 6(6), 381–400.
Sepkoski, J. J. (1974). Quantified coefficients of association and measurement of similarity. Mathematical Geology, 6(2), 135–152.
Small, H. (1999). Visualizing science by citation mapping. Journal of the American Society for Information Science and Technology, 50(9), 799–813.
Small, H. (2003). Paradigms, citations, and maps of science: A personal history. Journal of the American Society for Information Science and Technology, 54(5), 394–399.
Sneath, P. H. A., & Sokal, R. R. (1973). Numerical taxonomy. San Francisco: W. H. Freeman and Company.
Sternitzke, C., & Bergmann, I. (2009). Similarity measures for document mapping: A comparative study on the level of an individual scientist. Scientometrics, 78(1), 113–130.
Trippe, A. J. (2003). Patinformatics: Tasks and tools. World Patent Information, 25(3), 211–221.
Tseng, Y. H., Lin, C. J., & Lin, Y. I. (2007). Text mining techniques for patent analysis. Information Processing and Management, 43(5), 1216–1247.
Vinkler, P. (1999). Short term and long term impact factors and similarities of chemistry journals represented by references. Scientometrics, 46(3), 621–633.
von Wartburg, I., Teichert, T., & Rost, K. (2005). Inventive progress measured by multi-stage patent citation analysis. Research Policy, 34(10), 1591–1607.
Wen, G., Jiang, L., & Shadbolt, R. (2006). Ontology-based similarity between text documents on manifold. In R. Mizoguchi, Z. Shi, & F. Giunchiglia (Eds.), ASWC 2006, LNCS 4185 (pp. 113–125). Berlin: Springer.
Yang, Y. Y., Akers, L., Klose, T., & Barcelon, Y. C. (2008). Text mining and visualization tools—impressions of emerging capabilities. World Patent Information, 30(4), 280–293.
Yanhong, L., & Runhua, T. T. (2007). A text-mining-based patent analysis in product innovative process. In N. León-Rovira (Ed.), Trends in computer aided innovation (pp. 89–96). New York: Springer.


Acknowledgement

The author wishes to thank Dipl.-Wirt.-Ing. Jan Michael Gerken, research associate at IPMI, University of Bremen, for his constructive input to several drafts of this paper.

Author information

Authors and Affiliations

IPMI-Institute for Project Management and Innovation, University of Bremen, Wilhelm-Herbst-Str. 12, 28213, Bremen, Germany
Martin G. Moehrle

Corresponding author

Correspondence to Martin G. Moehrle.

About this article

Cite this article

Moehrle, M.G. Measures for textual patent similarities: a guided way to select appropriate approaches. Scientometrics 85, 95–109 (2010). https://doi.org/10.1007/s11192-010-0243-3

Received: 20 October 2009
Published: 29 May 2010
Issue Date: October 2010
DOI: https://doi.org/10.1007/s11192-010-0243-3

Mathematics Subject Classification (2000)

68U15
