Abstract
This paper examines whether PageRank algorithms are a valid instrument for analysing technical progress in specific technological fields by means of patent citation data. It provides evidence for patent data in biotechnology. Recent literature has been critical of the use of PageRank for the analysis of scientific citation networks. The results reported in this paper indicate, however, that with some minor adaptations and careful interpretation of the results, the algorithm captures some important stylised facts of technical progress and the importance of single patents relatively well, especially when compared to indicators based on direct inward citations only.
Notes
Walker et al. (2007) define CiteRank as \({\mathbf {x}}^{\text {CR}} = {\mathbf {I} }{\mathbf {p}}^{\text {CR}} + \alpha {\mathbf {W}} {\mathbf {p}}^{\text {CR}} + \alpha ^2 {\mathbf {W}}^2 {\mathbf {p}}^{\text {CR}} + \cdots = \sum _{i = 0}^{\infty } \left( \alpha {\mathbf {W}} \right) ^i {\mathbf {p}}^{\text {CR}}\) with \({\mathbf {W}} = {\mathbf {A}}{\mathbf {D}}^{-1}\) in the notation of this paper.
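Because patent citation graphs are acyclic, \({\mathbf {W}}\) is strictly upper triangular once patents are ordered by date, so the series above has at most n non-zero terms. The following Python sketch sums it term by term for a hypothetical four-patent graph. The toy matrix `A`, the uniform start vector `p` (Walker et al. use an age-dependent one), the value of `alpha` and the helper names are illustrative assumptions and are not taken from the paper.

```python
# Toy citation graph with 4 patents ordered from oldest (0) to newest (3).
# A[i][j] = 1 means patent j cites patent i, so A is strictly upper triangular.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
n = len(A)

# D[j] = number of outward citations of patent j (column sum of A).
# Assumption: patents that cite nothing get D[j] = 1 to avoid division by zero;
# their column in A is all zeros, so this choice does not affect W.
D = [max(1, sum(A[i][j] for i in range(n))) for j in range(n)]

# W = A D^{-1}: each citing patent spreads its weight evenly over the patents it cites.
W = [[A[i][j] / D[j] for j in range(n)] for i in range(n)]


def matvec(M, v):
    """Plain matrix-vector product for nested lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def citerank_series(W, p, alpha, tol=1e-12, max_terms=1000):
    """Sum p + alpha W p + alpha^2 W^2 p + ... until the next term is negligible."""
    x = list(p)
    term = list(p)
    for _ in range(max_terms):
        term = [alpha * t for t in matvec(W, term)]
        if max(abs(t) for t in term) < tol:
            break
        x = [xi + ti for xi, ti in zip(x, term)]
    return x


p = [1.0 / n] * n  # uniform start vector (assumption for illustration)
x = citerank_series(W, p, alpha=0.5)
print(x)
```

In this toy graph the oldest patent receives the highest score because it collects both direct and indirect citations, while the newest patent keeps only its start weight.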
The inverse of the upper triangular matrix
$$\begin{aligned} {\mathbf {I} }- \alpha {\mathbf {A}} {\mathbf {D}}^{-1} = \begin{pmatrix} 1 & -\frac{\alpha A_{1,2}}{D_{2,2}} & \cdots & -\frac{\alpha A_{1 ,n}}{D_{n,n}} \\ 0 & \ddots & & \vdots \\ 0 & \cdots & 1 & -\frac{\alpha A_{n-1,n}}{D_{n,n}} \\ 0 & \cdots & 0 & 1 \end{pmatrix} \end{aligned}$$
can simply be calculated from the bottom up, so that for the calculation of row i of the inverse only the rows \(j \ge i\) are needed.
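The bottom-up calculation can be sketched in Python as follows. Writing \({\mathbf {B}}\) for the inverse and \({\mathbf {W}} = {\mathbf {A}}{\mathbf {D}}^{-1}\), the identity \(({\mathbf {I} }- \alpha {\mathbf {W}}){\mathbf {B}} = {\mathbf {I} }\) gives row i of \({\mathbf {B}}\) as the unit vector \({\mathbf {e}}_i\) plus \(\alpha \sum _{j > i} W_{i,j}\) times row j of \({\mathbf {B}}\). The function name and the small example matrix are hypothetical and serve only as an illustration.

```python
def inverse_unit_upper(W, alpha):
    """Invert I - alpha*W for strictly upper triangular W, last row first.

    Row i of the inverse is built only from rows j > i, which are already
    complete when row i is processed.
    """
    n = len(W)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        B[i][i] = 1.0
        for j in range(i + 1, n):
            if W[i][j] != 0.0:
                # Row j of B is zero left of its diagonal, so start at k = j.
                for k in range(j, n):
                    B[i][k] += alpha * W[i][j] * B[j][k]
    return B


# Hypothetical W = A D^{-1} for four patents ordered from oldest to newest.
W = [
    [0.0, 1.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],
]
B = inverse_unit_upper(W, alpha=0.5)
for row in B:
    print(row)
```

Since each row depends only on rows below it, the scores of the newest patents can be obtained without touching the rows of older patents.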
Another approach to obtaining a time-consistent PageRank would be to calculate the PageRank on the subsample of patents newer than the date d. If in that calculation the matrix \({\mathbf {D}}\) is taken from the original graph (meaning that outgoing citations to patents older than the threshold date d are still counted) and sliced to the dimensions of the subsample adjacency matrix \({\mathbf {A}}\), we obtain an identical ranking.
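A small numerical check of this equivalence might look as follows. The sketch reads "identical ranking" as identical to the order that the full-graph calculation assigns to the same patents. The six-patent graph, the threshold index `k`, the uniform start vectors and all function names are assumptions made for illustration.

```python
def scores_acyclic(A, D, alpha, p):
    """Solve (I - alpha A D^{-1}) x = p bottom-up; A must be strictly upper triangular."""
    n = len(p)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = p[i] + alpha * sum(A[i][j] / D[j] * x[j] for j in range(i + 1, n))
    return x


def ranking(x):
    """Indices sorted from highest to lowest score."""
    return sorted(range(len(x)), key=lambda i: -x[i])


# Hypothetical graph, patents ordered oldest (0) to newest (5).
# A[i][j] = 1 means patent j cites patent i.
A = [
    [0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
n = len(A)
# Outward citation counts from the ORIGINAL graph (1 for patents citing nothing).
D = [max(1, sum(A[i][j] for i in range(n))) for j in range(n)]
alpha = 0.5

# Full-graph scores with a uniform start vector.
x_full = scores_acyclic(A, D, alpha, [1.0 / n] * n)

# Subsample: patents 2..5 are "newer than d". A is sliced, D is sliced but NOT
# recomputed, so citations to patents 0 and 1 still count in D_sub.
k = 2
A_sub = [row[k:] for row in A[k:]]
D_sub = D[k:]
m = n - k
x_sub = scores_acyclic(A_sub, D_sub, alpha, [1.0 / m] * m)

rank_full = ranking(x_full[k:])  # order of patents 2..5 in the full calculation
rank_sub = ranking(x_sub)        # order of the same patents in the subsample
print(rank_full, rank_sub)
```

With uniform start vectors the two score vectors differ only by the constant factor n/m, which is why the order is preserved. Recomputing `D_sub` from `A_sub` instead would change the column weights and break this proportionality.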
Assigning a weight to each patent is necessary to obtain meaningful centrality measures in acyclic graphs.
In the “Appendix” we present Pareto plots for the EPO sample. Qualitatively the differences between the distribution of inward citations and PageRank scores are largely identical to the ones reported here for the global sample. However, the Pareto plot for PageRank scores for the EPO sample shows a slight downward sloping curvature for extreme values indicating that extreme outliers are less frequent in the EPO sample than in the global sample.
In the “Appendix” we list the top 20 patents in biotechnology for EPO applications only.
Erdi et al. (2013) propose an alternative similarity indicator based on a distance measure obtained from an inward citation vector for each patent across 36 technological subcategories. This measure is less granular and focuses on “long jumps” in the knowledge space, whereas the indicator used here is also able to capture local recombinant developments. For the level of aggregation chosen for this study, their method seems less well suited, as the number of clusters resulting from the analysis of the citation vectors is extremely large and it is not possible to identify an objective criterion for determining the correct number of clusters.
For an analysis of inward citations, count data models (such as Poisson or NegBin models) would be more adequate than OLS. PageRank scores do not show the properties of count data; for this reason OLS was used. For the EPO sample, PageRank scores (\(\alpha = 0.5\); \(\text{ PR } \times 10^{6}\)) range between a minimum of 2158 and a maximum of 192,478, with an average of 2600.
As an EPO patent has to be validated by national partner patent offices, and patent owners can then decide on a country-by-country basis whether or not to renew the patent, the database contains several renewal dates. This is also true for all other patents extended to foreign patent offices through the Paris Convention. We therefore take the average number of renewals.
The country codes used in Figs. 5 and 6 follow the International Standard for country codes ISO 3166. Codes can be accessed at https://www.iso.org/iso-3166-country-codes.html.
References
Abrams, D., Akcigit, U., & Popadak, J. (2013). Understanding the link between patent value and citations: Creative destruction or defensive disruption. mimeo., University of Pennsylvania.
Acemoglu, D., Akcigit, U., & Kerr, W. (2016). Innovation network. Working Paper 22783, National Bureau of Economic Research.
Alcacer, J., Gittelman, M., & Sampat, B. (2009). Applicant and examiner citations in U.S. patents: An overview and analysis. Research Policy, 38, 415–427.
Atallah, G., & Rodriguez, G. (2006). Indirect patent citations. Scientometrics, 67, 437–65.
Avrachenko, L., Litvak, N., & Pham, K. (2008). A singular perturbation approach for choosing the PageRank damping factor. Internet Mathematics, 5, 47–70.
Bessen, J. (2008). The value of U.S. patents by owner and patent characteristics. Research Policy, 37, 932–945.
Boldi, P., Santini, S., & Vigna, S. (2005). PageRank as a function of the damping factor. In Proceedings of the 14th international conference on the world wide web, Association for Computing Machinery, New York.
Bonacich, P., & Lloyd, P. (2001). Eigenvector-like measures of centrality for asymmetric relations. Social Networks, 23, 191–201.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30, 107–117.
Bruck, P., Rethy, I., Szente, J., Tobochnik, J., & Erdi, P. (2016). Recognition of emerging technology trends: Class-selective study of citations in the U.S. patent citation network. Scientometrics, 107, 1465–1475.
Castaldi, C., Frenken, K., & Los, B. (2014). Related variety, unrelated variety and technological breakthroughs: An analysis of US state-level patenting. Regional Studies, 49, 767–781.
Chen, P., Xie, H., Maslov, S., & Redner, S. (2007). Finding scientific gems with Google. Journal of Informetrics, 8, 8–15.
de Rassenfosse, G., Dernis, H., & Boedt, G. (2014). Patent citation data in social science research: Overview and best practices. Working Paper 8/14, Melbourne Institute Working Paper.
Ding, Y., Yan, E., Frazho, A., & Caverlee, J. (2009). PageRank for ranking authors in co-citation networks. Journal of the American Society for Information Science and Technology, 60, 2229–2243.
Erdi, P., Makovi, K., Somogyvari, Z., Strandburg, K., Tobochnik, J., Volf, P., et al. (2013). Prediction of emerging technologies based on analysis of the US patent citation network. Scientometrics, 95, 225–242.
Hagedoorn, J., & Cloodt, M. (2003). Measuring innovative performance: is there an advantage in using multiple indicators? Research Policy, 32, 1365–1379.
Hall, B., Griliches, Z., & Pakes, A. (1991). R&D, patents, and market value revisited: Is there a second (technological opportunity) factor? Economics of Innovation and New Technology, 1(1), 183–202.
Hall, B., Jaffe, A., & Trajtenberg, M. (2005). Market value and patent citations. RAND Journal of Economics, 36, 16–38.
Harhoff, D., Narin, F., & Vopel, K. (1999). Citation frequency and the value of patented inventions. Review of Economics and Statistics, 812, 511–515.
Jaffe, A., & de Rassenfosse, G. (2016). Patent citation data in social science research: Overview and best practices. Working Paper 21868, National Bureau of Economic Research.
Katila, R. (2000). Using patent data to measure innovation performance. International Journal of Business Performance Management, 2, 180–193.
Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18, 39–43.
Kogler, D. F., Rigby, D. L., & Tucker, I. (2013). Mapping knowledge space and technological relatedness in US cities. European Planning Studies, 21, 1374–1391.
Langville, A., & Meyer, C. (2003). Survey: Deeper inside PageRank. Internet Mathematics, 1, 335–380.
Lanjouw, J., & Schankerman, M. (2001). Characteristics of patent litigation: A window on competition. RAND Journal of Economics, 32, 129–51.
Mariani, M. S., Medo, M., & Zhang, Y.-C. (2015). Ranking nodes in growing networks: When PageRank fails. Scientific Reports, 5, 16181.
Maslov, S., & Redner, S. (2008). Promise and pitfalls of extending Google’s PageRank algorithm to citation networks. Journal of Neuroscience, 28(44), 11103–11105.
Scherer, F., & Harhoff, D. (2000). Technology policy for a world of skew-distributed outcomes. Research Policy, 29, 559–566.
Shaffer, M. (2011). Entrepreneurial innovation: Patent rank and marketing science. PhD dissertation, Washington State University.
Silverberg, G. (2002). The discrete charm of the bourgeoisie: Quantum and continuous perspectives on innovation and growth. Research Policy, 31, 1275–1289.
Silverberg, G., & Verspagen, B. (2007). The size distribution of innovation revisited: An application of extreme value statistics to citation and value measures of patent significance. Journal of Econometrics, 139, 318–339.
Trajtenberg, M. (1990). A penny for your quote: Patent citations and the value of innovations. RAND Journal of Economics, 21, 172–187.
Valverde, S., Sole, R. V., Bedau, M. A., & Packard, N. (2007). Topology and evolution of technology innovation networks. Physical Review E, 76(5), 056118.
Verspagen, B. (2007). Mapping technological trajectories as patent citation networks: A study on the history of fuel cell research. Advances in Complex Systems, 10, 93–115.
Walker, D., Xie, H., Yan, K.-K., & Maslov, S. (2007). Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment, 2007(06), P06010.
Acknowledgements
The research leading to this paper has received funding in the context of the Austria 2025 Research Project funded by the Federal Ministry of Transport, Innovation and Technology (BMVIT), the Federal Ministry of Economic Affairs and Research (BMWFW), and the Austrian National Bank (OeNB). Support by the Austrian Council for Research and Technology Development is also acknowledged. We thank Kathrin Hoffmann for research assistance.
Appendix
IPC classes
List of IPC labels of classes used in the network graphs in Fig. 4 (Table 6):
IPC code | Label |
---|---|
A01H | New plants or processes for obtaining them; plant reproduction by tissue culture techniques |
A61K | Preparations for medical, dental, or toilet purposes (devices or methods specially adapted for bringing pharmaceutical products into particular physical or administering forms A61J0003000000; chemical aspects of, or use of materials for deodorisation of air, for disinfection or sterilisation, or for bandages, dressings, absorbent pads or surgical articles A61L) |
C02F | Treatment of water, waste water, sewage, or sludge (processes for making harmful chemical substances harmless, or less harmful, by effecting a chemical change in the substances A62D0003000000; separation, settling tanks or filter devices B01D; special arrangements on waterborne vessels of installations for treating water, waste water or sewage, e.g. for producing fresh water, B63J; adding materials to water to prevent corrosion C23F; treating radioactively-contaminated liquids G21F0009040000) |
C07G | Compounds of unknown constitution (sulfonated fats, oils or waxes of undetermined constitution C07C0309620000) |
C07K | Peptides (peptides containing β-lactam rings C07D; cyclic dipeptides not having in their molecule any other peptide link than those which form their ring, e.g. piperazine-2,5-diones, C07D; ergot alkaloids of the cyclic peptide type C07D0519020000; genetic engineering processes for obtaining peptides C12N0015000000) |
C12M | Apparatus for enzymology or microbiology (installations for fermenting manure A01C0003020000; preservation of living parts of humans or animals A01N0001020000; brewing apparatus C12C; fermentation apparatus for wine C12G; apparatus for preparing vinegar C12J0001100000) |
C12N | Micro-organisms or enzymes; compositions thereof (biocides, pest repellants or attractants, or plant growth regulators containing micro-organisms, viruses, microbial fungi, enzymes, fermentates, or substances produced by, or extracted from, micro-organisms or animal material A01N0063000000; medicinal preparations A61K; fertilisers C05F); Propagating, preserving, or maintaining micro-organisms; mutation or genetic engineering; culture media (microbiological testing media C12Q0001000000) |
C12P | Fermentation or enzyme-using processes to synthesise a desired chemical compound or composition or to separate optical isomers from a racemic mixture |
C12Q | Measuring or testing processes involving enzymes or micro-organisms (immunoassay G01N0033530000); Compositions or test papers therefor; processes of preparing such compositions; condition-responsive control in microbiological or enzymological processes |
C40B | Combinatorial chemistry; libraries, e.g. chemical libraries; in silico libraries |
G01N | Investigating or analysing materials by determining their chemical or physical properties (measuring or testing processes other than immunoassay, involving enzymes or micro-organisms C12M, C12Q) |
Centrality measures
Alpha centrality corresponds to a PageRank without the weighting based on outward citations.
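In the notation of the notes, where PageRank-type scores are built on \({\mathbf {W}} = {\mathbf {A}}{\mathbf {D}}^{-1}\), removing the weighting by outward citations \({\mathbf {D}}^{-1}\) suggests the form below. It follows the standard alpha centrality of Bonacich and Lloyd (2001) up to the orientation of \({\mathbf {A}}\). The exogenous vector \({\mathbf {e}}\) and the superscript AC are notational assumptions made here and are not taken from the paper.

$$\begin{aligned} {\mathbf {x}}^{\text {AC}} = \alpha {\mathbf {A}} {\mathbf {x}}^{\text {AC}} + {\mathbf {e}} \quad \Longleftrightarrow \quad {\mathbf {x}}^{\text {AC}} = \left( {\mathbf {I} }- \alpha {\mathbf {A}} \right) ^{-1} {\mathbf {e}} = \sum _{i = 0}^{\infty } \left( \alpha {\mathbf {A}} \right) ^i {\mathbf {e}} \end{aligned}$$

For an acyclic citation graph the sum has finitely many non-zero terms, so it is well defined for any \(\alpha\). Replacing \({\mathbf {A}}\) by \({\mathbf {A}}{\mathbf {D}}^{-1}\) gives back the PageRank-type expression used in the CiteRank note.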
Pareto plots and the list of top 20 patents in biotechnology for EPO patent applications only
See Fig. 8.
Cite this article
Reinstaller, A., Reschenhofer, P. Using PageRank in the analysis of technological progress through patents: an illustration for biotechnological inventions. Scientometrics 113, 1407–1438 (2017). https://doi.org/10.1007/s11192-017-2549-x
DOI: https://doi.org/10.1007/s11192-017-2549-x