Using PageRank in the analysis of technological progress through patents: an illustration for biotechnological inventions

Scientometrics

Abstract

This paper examines whether PageRank algorithms are a valid instrument for analysing technical progress in specific technological fields by means of patent citation data, and provides evidence for patent data in biotechnology. Recent literature has been critical of the use of PageRank for the analysis of scientific citation networks. The results reported in this paper indicate, however, that with some minor adaptations and careful interpretation of the results, the algorithm captures some important stylised facts of technical progress and the importance of single patents relatively well, especially when compared with indicators based on direct inward citations only.

Notes

  1. Shaffer (2011) has introduced a recursive algorithm to identify the technological significance and economic value of a patent, which he refers to as Patent Rank. It is mathematically equivalent to PageRank (cf. Langville and Meyer 2003). For a discussion see Bruck et al. (2016).

  2. In the context of patent analysis, scholars have used extreme value statistics to identify “superstar” patents (cf. Silverberg and Verspagen 2007; Castaldi et al. 2014).

  3. Walker et al. (2007) define CiteRank as \({\mathbf {x}}^{\text {CR}} = {\mathbf {I} }{\mathbf {p}}^{\text {CR}} + \alpha {\mathbf {W}} {\mathbf {p}}^{\text {CR}} + \alpha ^2 {\mathbf {W}}^2 {\mathbf {p}}^{\text {CR}} + \cdots = \sum _{i = 0}^{\infty } \left( \alpha {\mathbf {W}} \right) ^i {\mathbf {p}}^{\text {CR}}\) with \({\mathbf {W}} = {\mathbf {A}}{\mathbf {D}}^{-1}\) in the notation of this paper.

  4. The inverse matrix of the upper triangular matrix

    $$\begin{aligned} {\mathbf {I} }- \alpha {\mathbf {A}} {\mathbf {D}}^{-1} = \begin{pmatrix} 1 & -\frac{\alpha A_{1,2}}{D_{2,2}} & \cdots & -\frac{\alpha A_{1 ,n}}{D_{n,n}} \\ 0 & \ddots & & \vdots \\ 0 & \cdots & 1 & -\frac{\alpha A_{n-1,n}}{D_{n,n}} \\ 0 & \cdots & 0 & 1 \end{pmatrix} \end{aligned}$$

    can simply be calculated from the bottom up, so that the calculation of row i of the inverse requires only the rows \(j \ge i\).

  5. Another approach to obtaining a time-consistent PageRank would be to calculate the PageRank on the subsample of patents newer than the date d. If in that calculation the matrix \({\mathbf {D}}\) is taken from the original graph (meaning that outgoing citations to patents older than the threshold date d are still counted) and sliced to the dimensions of the subsample adjacency matrix \({\mathbf {A}}\), we obtain an identical ranking.

  6. Assigning a weight to each patent is necessary to obtain meaningful centrality measures in acyclic graphs.

  7. In the “Appendix” we present Pareto plots for the EPO sample. Qualitatively, the differences between the distributions of inward citations and PageRank scores are largely the same as those reported here for the global sample. However, the Pareto plot of PageRank scores for the EPO sample shows a slightly downward-sloping curvature for extreme values, indicating that extreme outliers are less frequent in the EPO sample than in the global sample.

  8. In the “Appendix” we list the top 20 patents in biotechnology for EPO applications only.

  9. Erdi et al. (2013) propose an alternative similarity indicator based on a distance measure obtained from an inward citation vector for each patent across 36 technological subcategories. This measure is less granular and focuses on “long jumps” in the knowledge space, whereas the indicator used here is also able to capture local recombinant developments. For the level of aggregation chosen for this study, this method seems less well suited, as the number of clusters resulting from the analysis of the citation vectors is extremely large and there is no objective criterion for determining the correct number of clusters.

  10. For an analysis of inward citations, count data models (such as Poisson or NegBin models) would be more adequate than OLS. PageRank scores do not show the properties of count data; for this reason OLS was used. For the EPO sample, PageRank scores (\(\alpha = 0.5\); \(\text{ PR } \times 10^{6}\)) range between a minimum of 2158 and a maximum of 192,478 with an average of 2600.

  11. As an EPO patent has to be validated by national partner patent offices and patent owners can then decide on a country-by-country basis whether or not to renew the patent, the database contains several renewal dates. This also holds for all other patents extended to foreign patent offices through the Paris Convention. We therefore take the average number of renewals.

  12. The country codes used in Figs. 5 and 6 follow the International Standard for country codes ISO 3166. Codes can be accessed at https://www.iso.org/iso-3166-country-codes.html.
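Notes 3 to 5 can be illustrated on a toy citation graph. The following is a minimal sketch, not the authors' implementation. It assumes that patents are indexed from oldest to newest, that `A[i][j] = 1` when patent j cites the older patent i (so that A is strictly upper triangular, as in note 4), that D holds each patent's number of outgoing citations, and that the weight vector p is uniform. The four-patent graph and the function names `out_degrees`, `series_scores` and `bottom_up_scores` are hypothetical.

```python
# Toy graph (hypothetical): patent 1 cites 0; patent 2 cites 0 and 1; patent 3 cites 2.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]


def out_degrees(A):
    """Diagonal of D: column sums of A, i.e. citations made by each patent."""
    n = len(A)
    return [sum(A[i][j] for i in range(n)) for j in range(n)]


def series_scores(A, out_deg, p, alpha):
    """Note 3: sum the series x = sum_i (alpha W)^i p with W = A D^{-1}."""
    n = len(A)
    x, term = list(p), list(p)
    for _ in range(n):  # in an acyclic graph the terms vanish after at most n steps
        term = [
            sum(alpha * A[i][j] / out_deg[j] * term[j] for j in range(n) if A[i][j])
            for i in range(n)
        ]
        x = [a + b for a, b in zip(x, term)]
    return x


def bottom_up_scores(A, out_deg, p, alpha):
    """Note 4: solve (I - alpha A D^{-1}) x = p from the last row upwards.

    Row i only uses entries j > i, which have already been computed.
    """
    n = len(A)
    x = list(p)
    for i in range(n - 1, -1, -1):
        x[i] = p[i] + sum(
            alpha * A[i][j] / out_deg[j] * x[j] for j in range(i + 1, n) if A[i][j]
        )
    return x


alpha = 0.5
D = out_degrees(A)
p = [1.0] * len(A)
full = bottom_up_scores(A, D, p, alpha)

# Note 5: keep only patents with index >= d (a stand-in for "newer than date d").
d = 1
A_sub = [row[d:] for row in A[d:]]
same_D = bottom_up_scores(A_sub, D[d:], p[d:], alpha)             # D sliced from full graph
new_D = bottom_up_scores(A_sub, out_degrees(A_sub), p[d:], alpha)  # D recomputed on subsample

print(full)    # [2.0625, 1.375, 1.5, 1.0]
print(same_D)  # [1.375, 1.5, 1.0]
print(new_D)   # [1.75, 1.5, 1.0]
```

In this toy example the series and the bottom-up solution coincide, and slicing D from the full graph reproduces the full-graph scores of the retained patents, whereas recomputing D on the subsample changes both the scores and the ranking of patents 1 and 2. The sketch solves the linear system directly rather than forming the inverse matrix, but the bottom-up dependency is the same as the one described in note 4.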

References

  • Abrams, D., Akcigit, U., & Popadak, J. (2013). Understanding the link between patent value and citations: Creative destruction or defensive disruption. mimeo., University of Pennsylvania.

  • Acemoglu, D., Akcigit, U., & Kerr, W. (2016). Innovation network. Working Paper 22783, National Bureau of Economic Research.

  • Alcacer, J., Gittelman, M., & Sampat, B. (2009). Applicant and examiner citations in U.S. patents: An overview and analysis. Research Policy, 38, 415–427.

  • Atallah, G., & Rodriguez, G. (2006). Indirect patent citations. Scientometrics, 67, 437–65.

  • Avrachenko, L., Litvak, N., & Pham, K. (2008). A singular perturbation approach for choosing the PageRank damping factor. Internet Mathematics, 5, 47–70.

  • Bessen, J. (2008). The value of U.S. patents by owner and patent characteristics. Research Policy, 37, 932–945.

  • Boldi, P., Santini, S., & Vigna, S. (2005). PageRank as a function of the damping factor. In Proceedings of the 14th international conference on the world wide web, Association for Computing Machinery, New York.

  • Bonacich, P., & Lloyd, P. (2001). Eigenvector-like measures of centrality for asymmetric relations. Social Networks, 23, 191–201.

  • Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30, 107–117.

  • Bruck, P., Rethy, I., Szente, J., Tobochnik, J., & Erdi, P. (2016). Recognition of emerging technology trends: Class-selective study of citations in the U.S. patent citation network. Scientometrics, 107, 1465–1475.

  • Castaldi, C., Frenken, K., & Los, B. (2014). Related variety, unrelated variety and technological breakthroughs: An analysis of US state-level patenting. Regional Studies, 49, 767–781.

  • Chen, P., Xie, H., Maslov, S., & Redner, S. (2007). Finding scientific gems with Google. Journal of Informetrics, 8, 8–15.

  • de Rassenfosse, G., Dernis, H., & Boedt, G. (2014). Patent citation data in social science research: Overview and best practices. Working Paper 8/14, Melbourne Institute Working Paper.

  • Ding, Y., Yan, E., Frazho, A., & Caverlee, J. (2009). PageRank for ranking authors in co-citation networks. Journal of the American Society for Information Science and Technology, 60, 2229–2243.

  • Erdi, P., Makovi, K., Somogyvari, Z., Strandburg, K., Tobochnik, J., Volf, P., et al. (2013). Prediction of emerging technologies based on analysis of the US patent citation network. Scientometrics, 95, 225–242.

  • Hagedoorn, J., & Cloodt, M. (2003). Measuring innovative performance: is there an advantage in using multiple indicators? Research Policy, 32, 1365–1379.

  • Hall, B., Griliches, Z., & Pakes, A. (1991). R&D, patents, and market value revisited: Is there a second (technological opportunity) factor? Economics of Innovation and New Technology, 1(1), 183–202.

  • Hall, B., Jaffe, A., & Trajtenberg, M. (2005). Market value and patent citations. RAND Journal of Economics, 36, 16–38.

  • Harhoff, D., Narin, F., & Vopel, K. (1999). Citation frequency and the value of patented inventions. Review of Economics and Statistics, 812, 511–515.

  • Jaffe, A., & de Rassenfosse, G. (2016). Patent citation data in social science research: Overview and best practices. Working Paper 21868, National Bureau of Economic Research.

  • Katila, R. (2000). Using patent data to measure innovation performance. International Journal of Business Performance Management, 2, 180–193.

  • Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika, 18, 39–43.

  • Kogler, D. F., Rigby, D. L., & Tucker, I. (2013). Mapping knowledge space and technological relatedness in US cities. European Planning Studies, 21, 1374–1391.

  • Langville, A., & Meyer, C. (2003). Survey: Deeper inside PageRank. Internet Mathematics, 1, 335–380.

  • Lanjouw, J., & Schankerman, M. (2001). Characteristics of patent litigation: A window on competition. RAND Journal of Economics, 32, 129–51.

  • Mariani, M. S., Medo, M., & Zhang, Y.-C. (2015). Ranking nodes in growing networks: When PageRank fails. Scientific Reports, 5, 16181.

  • Maslov, S., & Redner, S. (2008). Promise and pitfalls of extending Google’s PageRank algorithm to citation networks. Journal of Neuroscience, 28(44), 11103–11105.

  • Scherer, F., & Harhoff, D. (2000). Technology policy for a world of skew-distributed outcomes. Research Policy, 29, 559–566.

  • Shaffer, M. (2011). Entrepreneurial innovation: Patent rank and marketing science. PhD dissertation, Washington State University.

  • Silverberg, G. (2002). The discrete charm of the bourgeoisie: Quantum and continuous perspectives on innovation and growth. Research Policy, 31, 1275–1289.

  • Silverberg, G., & Verspagen, B. (2007). The size distribution of innovation revisited: An application of extreme value statistics to citation and value measures of patent significance. Journal of Econometrics, 139, 318–339.

  • Trajtenberg, M. (1990). A penny for your quote: Patent citations and the value of innovations. RAND Journal of Economics, 21, 172–187.

  • Valverde, S., Sole, R. V., Bedau, M. A., & Packard, N. (2007). Topology and evolution of technology innovation networks. Physical Review E, 76(5), 056118.

  • Verspagen, B. (2007). Mapping technological trajectories as patent citation networks: A study on the history of fuel cell research. Advances in Complex Systems, 10, 93–115.

  • Walker, D., Xie, H., Yan, K.-K., & Maslov, S. (2007). Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment, 2007(06), P06010.

Acknowledgements

The research leading to this paper has received funding in the context of the Austria 2025 Research Project funded by the Federal Ministry of Transport, Innovation and Technology (BMVIT), the Federal Ministry Economic Affairs and Research (BMWFW), and the Austrian National Bank (OeNB). Support by the Austrian Council for Research and Technology Development is also acknowledged. We thank Kathrin Hoffmann for research assistance.

Author information

Corresponding author

Correspondence to Andreas Reinstaller.

Appendix

IPC classes

List of IPC labels of classes used in the network graphs in Fig. 4 (Table 6), given as IPC code: label.

  • A01H: New plants or processes for obtaining them; plant reproduction by tissue culture techniques

  • A61K: Preparations for medical, dental, or toilet purposes (devices or methods specially adapted for bringing pharmaceutical products into particular physical or administering forms A61J0003000000; chemical aspects of, or use of materials for deodorisation of air, for disinfection or sterilisation, or for bandages, dressings, absorbent pads or surgical articles A61L)

  • C02F: Treatment of water, waste water, sewage, or sludge (processes for making harmful chemical substances harmless, or less harmful, by effecting a chemical change in the substances A62D0003000000; separation, settling tanks or filter devices B01D; special arrangements on waterborne vessels of installations for treating water, waste water or sewage, e.g. for producing fresh water, B63J; adding materials to water to prevent corrosion C23F; treating radioactively-contaminated liquids G21F0009040000)

  • C07G: Compounds of unknown constitution (sulfonated fats, oils or waxes of undetermined constitution C07C0309620000)

  • C07K: Peptides (peptides containing β-lactam rings C07D; cyclic dipeptides not having in their molecule any other peptide link than those which form their ring, e.g. piperazine-2,5-diones, C07D; ergot alkaloids of the cyclic peptide type C07D0519020000; genetic engineering processes for obtaining peptides C12N0015000000)

  • C12M: Apparatus for enzymology or microbiology (installations for fermenting manure A01C0003020000; preservation of living parts of humans or animals A01N0001020000; brewing apparatus C12C; fermentation apparatus for wine C12G; apparatus for preparing vinegar C12J0001100000)

  • C12N: Micro-organisms or enzymes; compositions thereof (biocides, pest repellants or attractants, or plant growth regulators containing micro-organisms, viruses, microbial fungi, enzymes, fermentates, or substances produced by, or extracted from, micro-organisms or animal material A01N0063000000; medicinal preparations A61K; fertilisers C05F); propagating, preserving, or maintaining micro-organisms; mutation or genetic engineering; culture media (microbiological testing media C12Q0001000000)

  • C12P: Fermentation or enzyme-using processes to synthesise a desired chemical compound or composition or to separate optical isomers from a racemic mixture

  • C12Q: Measuring or testing processes involving enzymes or micro-organisms (immunoassay G01N0033530000); compositions or test papers therefor; processes of preparing such compositions; condition-responsive control in microbiological or enzymological processes

  • C40B: Combinatorial chemistry; libraries, e.g. chemical libraries; in silico libraries

  • G01N: Investigating or analysing materials by determining their chemical or physical properties (measuring or testing processes other than immunoassay, involving enzymes or micro-organisms C12M, C12Q)

Table 6 Top 20 patents in biotechnology in the EPO sample (EPO patent applications only) from the EPO PATSTAT database

Centrality measures

Alpha centrality is defined as follows:

$$\begin{aligned} {\mathbf {x}} = \left( {\mathbf {I} }+ \alpha {\mathbf {A}} + \alpha ^2 {\mathbf {A}}^2 + \cdots \right) \mathbf {1} = \sum _{i = 0}^{\infty } \left( \alpha {\mathbf {A}} \right) ^i \mathbf {1} = \left( {\mathbf {I} }- \alpha {\mathbf {A}} \right) ^{-1} \mathbf {1}. \end{aligned}$$

It corresponds to a PageRank without the weighting by outward citations, that is, with \({\mathbf {A}}\) in place of \({\mathbf {A}}{\mathbf {D}}^{-1}\).
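The effect of dropping the weighting can be seen on a small example. The sketch below is illustrative only and assumes a strictly upper triangular adjacency matrix (patents ordered from oldest to newest, `adjacency[i][j] = 1` when patent j cites patent i), so that \(\sum_{i \ge 0} (\alpha A)^i \mathbf{1} = (I - \alpha A)^{-1} \mathbf{1}\) can be obtained by back substitution. The function name `centrality`, its `weight_by_out_degree` flag and the four-patent graph are hypothetical.

```python
def centrality(adjacency, alpha, weight_by_out_degree=False):
    """Solve (I - alpha M) x = 1 by back substitution.

    M = A gives alpha centrality; M = A D^{-1} (each column divided by the
    citing patent's number of outgoing citations) gives the weighted variant.
    Assumes `adjacency` is strictly upper triangular.
    """
    n = len(adjacency)
    out_deg = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    x = [1.0] * n
    for i in range(n - 1, -1, -1):
        total = 0.0
        for j in range(i + 1, n):
            if adjacency[i][j]:
                weight = adjacency[i][j]
                if weight_by_out_degree:
                    weight = weight / out_deg[j]
                total += weight * x[j]
        x[i] = 1.0 + alpha * total
    return x


# Hypothetical graph: patent 1 cites 0; patent 2 cites 0 and 1; patent 3 cites 2.
toy = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

unweighted = centrality(toy, 0.5)
weighted = centrality(toy, 0.5, weight_by_out_degree=True)
print(unweighted)  # [2.625, 1.75, 1.5, 1.0]
print(weighted)    # [2.0625, 1.375, 1.5, 1.0]
```

Without the weighting, patent 1 outranks patent 2 because it inherits the full score of patent 2; once patent 2's score is split across its two outgoing citations, the order of the two reverses in this toy graph.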

Pareto plots and the list of top 20 patents in biotechnology for EPO patent applications only

See Fig. 8.

Fig. 8 Authors’ calculations based on the EPO sample of patents in biotechnology extracted from the EPO PATSTAT database. Vertical lines indicate 90% and 99% of the total number of patents in the sample. a Distribution of inward citations. b Distribution of PageRank scores

About this article

Cite this article

Reinstaller, A., Reschenhofer, P. Using PageRank in the analysis of technological progress through patents: an illustration for biotechnological inventions. Scientometrics 113, 1407–1438 (2017). https://doi.org/10.1007/s11192-017-2549-x

  • DOI: https://doi.org/10.1007/s11192-017-2549-x
