Abstract
The measurement of textual patent similarities is crucial for important tasks in patent management, be it prior art analysis, infringement analysis, or patent mapping. In this paper the common theory of similarity measurement is applied to the field of patents, using solitary concepts as the basic textual elements of patents. After the term ‘similarity’ has been unfolded on a content-oriented and a formal level and a basic model of understanding has been presented, a segmented approach to the measurement of underlying variables, similarity coefficients, and the criteria-related profiles of their combinations is outlined. The result is a guided way of applying textual patent similarities, which is of interest for both theory and practice.
Notes
Citation practice in patents differs from that in scientific papers (see von Wartburg et al. 2005), which makes it harder to use the classical instruments of scientometrics mentioned by Small (1999). In many studies no backward citation information is available, so other forms of similarity have to be used.
As concept counts are non-negative numbers, a Gaussian distribution cannot be assumed. For this reason the Bravais–Pearson coefficient of correlation is not suitable for measuring the similarity of two patents. Nor would a binomial distribution be adequate: although it satisfies the condition of non-negative values, concept counts have no upper bound, whereas a binomial distribution is defined on a closed set. A possible alternative is to view the concept counts as realizations of Poisson distributions, as they meet all of its criteria. Further calculations can then be made on the basis of logit models. Another way is to use Spearman’s coefficient of correlation, since the counts can of course be put into an ordered sequence. The advantage is that the exact distance between two entries does not need to be considered. Spearman’s coefficient of correlation is the ordinal counterpart of the metric Bravais–Pearson coefficient of correlation and is therefore a suitable measure of similarity. Kendall’s coefficient of correlation should at least be mentioned as a second potential measure.
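The rank-based comparison described in this note can be illustrated with a minimal sketch. It assumes that both patents are represented as count vectors over the same ordered list of concepts; the function names and the example counts are hypothetical and are not taken from the paper. Ties, which are frequent among concept counts, are given average ranks, and Spearman's coefficient is then computed as the Pearson correlation of the ranks.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of entries tied with the entry at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def spearman(counts_a, counts_b):
    """Spearman's coefficient: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(counts_a), average_ranks(counts_b))


# Hypothetical concept counts of two patents over the same six concepts.
patent_a = [5, 0, 2, 1, 0, 3]
patent_b = [4, 1, 2, 0, 0, 6]
similarity = spearman(patent_a, patent_b)
```

Because only the ordering of the counts enters the calculation, doubling every count in one patent leaves the coefficient unchanged.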
The above-mentioned coefficients have already been adopted in different fields of application. For example, Qin (2000) shows the adaptation of the cosine coefficient and the Jaccard coefficient for the comparison of documents. Very early on, Braam et al. (1988) used these coefficients for co-citation cluster analysis, and Rip and Courtial (1984) described their application to the construction of co-word maps.
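For orientation, the two coefficients named in this note can be sketched as follows. The sketch assumes that each patent is given as a mapping from concept to count; the Jaccard coefficient uses only the presence of concepts, while the cosine coefficient is shown here in a count-weighted form, which is one common variant and not necessarily the one used in the cited studies. The concept names and counts are hypothetical.

```python
from math import sqrt


def jaccard(counts_a, counts_b):
    """Shared concepts divided by all concepts occurring in either patent."""
    a, b = set(counts_a), set(counts_b)
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty concept sets
    return len(a & b) / len(union)


def cosine(counts_a, counts_b):
    """Cosine of the angle between the two concept-count vectors."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[c] * counts_b[c] for c in shared)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for a patent without concepts
    return dot / (norm_a * norm_b)


# Hypothetical concept counts of two patents.
patent_a = {"power transmitting": 3, "input member": 1, "clutch": 2}
patent_b = {"power transmitting": 1, "clutch": 2, "housing": 4}

jaccard_ab = jaccard(patent_a, patent_b)  # 2 shared of 4 concepts overall
cosine_ab = cosine(patent_a, patent_b)    # 7 / (sqrt(14) * sqrt(21))
```

The Jaccard coefficient ignores how often a concept occurs, whereas the count-weighted cosine rewards concepts that are frequent in both patents.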
Sometimes the size may be homogeneous within a set of patents (indicated by a low ratio between standard deviation and average), sometimes it may follow a normal distribution (indicated by a moderate ratio between standard deviation and average; in addition, a goodness-of-fit test should be applied), and sometimes there may be outliers (indicated by a moderate or high ratio between standard deviation and average).
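The ratio between standard deviation and average mentioned in this note is the coefficient of variation. A minimal sketch of its calculation follows; it assumes that patent size is measured as a number per patent (for example a concept or word count) and uses the sample standard deviation, which is a choice made here for illustration. The note gives no numerical thresholds for "low", "moderate", or "high", so none are built in, and the example sizes are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(sizes):
    """Sample standard deviation of the patent sizes divided by their mean."""
    average = mean(sizes)
    if average == 0:
        raise ValueError("ratio is undefined when the average size is zero")
    return stdev(sizes) / average


# Hypothetical patent sizes for two sets of patents.
homogeneous_set = [100, 102, 98, 101]
set_with_outlier = [100, 102, 98, 400]

cv_homogeneous = coefficient_of_variation(homogeneous_set)
cv_outlier = coefficient_of_variation(set_with_outlier)
```

In this example the set containing one very long patent yields a clearly higher ratio than the homogeneous set, which is the pattern the note describes.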
An interesting aspect of this step is the selection of concepts, as they are crucial in determining similarities between documents. Normally, general concepts such as “distribution”, “input member”, “power”, “speed”, or “structure” are not appropriate for characterizing specific common topics if used alone. In such a case, “power”, for example, should be used together with “transmitting” or “transmittance”. However, related patents are commonly compared using a whole set of concepts, which also allows such general concepts to be included.
References
Batagelj, V., & Bren, M. (1995). Comparing resemblance measures. Journal of Classification, 12(1), 73–90.
Bergmann, I., Butzke, D., Walter, L., Fuerste, J. P., Moehrle, M. G., & Erdmann, V. A. (2008). Evaluating the risk of patent infringement by means of semantic patent analysis: The case of DNA chips. R&D Management, 38(5), 550–562.
Bonino, D., Ciaramella, A., & Corno, F. (2009). Review of the state-of-the-art in patent information and forthcoming evolutions in intelligent patent informatics. World Patent Information. doi:10.1016/j.wpi.2009.05.008 (in press).
Braam, R. R., Moed, H. F., & van Raan, A. F. J. (1988). Mapping of science: Critical elaboration and new approaches, a case study in agricultural biochemistry. In L. Egghe & R. Rousseau (Eds.), Informetrics 87/88. Amsterdam: Elsevier Science.
Brosius, F. (2006). SPSS 14. Heidelberg: Redline.
Burke, P. F., & Reitzig, M. (2007). Measuring patent assessment quality—analyzing the degree and kind of (in)consistency in patent offices’ decision making. Research Policy, 36(9), 1404–1430.
Daga, R., & Pandey, G. (2008). US-Patent application 2008/0162455 A1. Determination of document similarity.
Dehmer, M. (2005). Strukturelle Analyse Web-basierter Dokumente. Wiesbaden: Gabler.
Dressler, A. (2006). Patente in technologieorientierten mergers und acquisitions. Wiesbaden: Deutscher Universitäts-Verlag.
Gerken, J. M., & Moehrle, M. G. (2010). The evolution of torque transfer technology by use of computerized patent analysis. In Proceedings of the 20th CIRP design conference 2010, Nantes, France (accepted).
Gower, J. C., & Legendre, P. (1986). Metric and euclidean properties of dissimilarity coefficients. Journal of Classification, 3(1), 5–48.
Hamers, L., Hemeryck, Y., Herweyers, G., & Janssen, M. (1989). Similarity measures in scientometric research: The Jaccard index versus Salton’s Cosine formula. Information Processing & Management, 25(3), 315–318.
Hippel, E.v. (1994). “Sticky information” and the locus of problem solving: Implications for innovation. Management Science, 40(4), 429–439.
Jeong, B., Lee, D., Cho, H., & Lee, J. (2008). A novel method for measuring semantic similarity for XML schema matching. Expert Systems with Applications, 34(3), 1651–1658.
Kanagasabai, R., & Pan, H. (2008). US-Patent 7,346,491 B2. Method of text similarity measurement.
Moehrle, M. G., & Geritz, A. (2007). Developing acquisition strategies based on patent maps. In M. H. Sherif & T. M. Khalil (Eds.), Management of technology: New directions in technology management (pp. 19–29). Amsterdam: Elsevier.
Moens, M.-F. (2006). Information extraction: Algorithms and prospects in a retrieval context. Berlin: Springer.
Park, J. (2005). Evolution of industry knowledge in the public domain: Prior art searching for software patents. SCRIPT-ed, 2(1), 47–70.
Peters, H. P. F., & van Raan, A. F. J. (1993a). Co-word-based science maps of chemical engineering. Part I: Representations by direct multidimensional scaling. Research Policy, 22(1), 23–46.
Peters, H. P. F., & van Raan, A. F. J. (1993b). Co-word-based science maps of chemical engineering. Part II: Representations by combined clustering and multidimensional scaling. Research Policy, 22(1), 47–71.
Philipp, M. (2006). Patent filing and searching: Is deflation in quality the inevitable consequence of hyperinflation in quantity? World Patent Information, 28(2), 117–121.
Qin, J. (2000). Semantic similarities between a keyword database and a controlled vocabulary database: An investigation in the antibiotic resistance literature. Journal of the American Society for Information Science, 51(2), 166–180.
Ranganathan, A., & Ronen, R. (2008). US-Patent application 2008/0243809 A1. Information-theory based measure of similarity between instances in ontology.
Rip, A., & Courtial, J.-P. (1984). Co-word maps of biotechnology: An example of cognitive scientometrics. Scientometrics, 6(6), 381–400.
Sepkoski, J. J. (1974). Quantified coefficients of association and measurement of similarity. Mathematical Geology, 6(2), 135–152.
Small, H. (1999). Visualizing science by citation mapping. Journal of the American Society for Information Science and Technology, 50(9), 799–813.
Small, H. (2003). Paradigms, citations, and maps of science: A personal history. Journal of the American Society for Information Science and Technology, 54(5), 394–399.
Sneath, P. H. A., & Sokal, R. R. (1973). Numerical taxonomy. San Francisco: W. H. Freeman and Company.
Sternitzke, C., & Bergmann, I. (2009). Similarity measures for document mapping: A comparative study on the level of an individual scientist. Scientometrics, 78(1), 113–130.
Trippe, A. J. (2003). Patinformatics: Tasks and tools. World Patent Information, 25(3), 211–221.
Tseng, Y. H., Lin, C. J., & Lin, Y. I. (2007). Text mining techniques for patent analysis. Information Processing & Management, 43(5), 1216–1247.
Vinkler, P. (1999). Short term and long term impact factors and similarities of chemistry journals represented by references. Scientometrics, 46(3), 621–633.
von Wartburg, I., Teichert, T., & Rost, K. (2005). Inventive progress measured by multi-stage patent citation analysis. Research Policy, 34(10), 1591–1607.
Wen, G., Jiang, L., & Shadbolt, R. (2006). Ontology-based similarity between text documents on manifold. In R. Mizoguchi, Z. Shi, & F. Giunchiglia (Eds.), ASWC 2006, LNCS 4185 (pp. 113–125). Berlin: Springer.
Yang, Y. Y., Akers, L., Klose, T., & Barcelon, Y. C. (2008). Text mining and visualization tools—impressions of emerging capabilities. World Patent Information, 30(4), 280–293.
Yanhong, L., & Runhua, T. T. (2007). A text-mining-based patent analysis in product innovative process. In N. León-Rovira (Ed.), Trends in computer aided innovation (pp. 89–96). New York: Springer.
Acknowledgement
The author wishes to thank Dipl.-Wirt.-Ing. Jan Michael Gerken, research associate at IPMI, University of Bremen, for his constructive input to several drafts of this paper.
Cite this article
Moehrle, M.G. Measures for textual patent similarities: a guided way to select appropriate approaches. Scientometrics 85, 95–109 (2010). https://doi.org/10.1007/s11192-010-0243-3
DOI: https://doi.org/10.1007/s11192-010-0243-3
Keywords
- Patent
- Similarity measurement
- Similarity coefficients
- Prior art analysis
- Infringement analysis
- Patent mapping