
Modeling the coevolution between citations and coauthorship of scientific papers

Scientometrics

Abstract

Collaborations and citations within scientific research grow simultaneously and interact dynamically. Modelling their coevolution helps to study many phenomena that can be approached only by combining citation and coauthorship data. A geometric graph for the coevolution is proposed, the mechanism of which expresses the interacting impacts of authors and papers in a geometrical way. The model is validated against a dataset of papers published in PNAS during 2007–2015. The validation shows that the model reproduces a range of features observed in citation and coauthorship data, both combined and separately. In particular, the empirical distribution of citations per author has two limits, in which it appears as a generalized Poisson distribution and a power law respectively. Our model reproduces the shape of this distribution and explains how the shape emerges from the decisions of authors. The model also captures the empirically positive correlations between the numbers of authors’ papers, citations and collaborators.


References

  • Balietti, S., Goldstone, R. L., & Helbing, D. (2016). Peer review and competition in the art exhibition game. Proceedings of the National Academy of Sciences USA, 113(30), 8414–8419.

  • Barabási, A. L., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., & Vicsek, T. (2002). Evolution of the social network of scientific collaborations. Physica A, 311(3–4), 590–614.

  • Börner, K., Maru, J. T., & Goldstone, R. L. (2004). The simultaneous evolution of author and paper networks. Proceedings of the National Academy of Sciences of the United States of America, 101(suppl 1), 5266–5273.

  • Bornmann, L., & Daniel, H. D. (2009). Universality of citation distributions: A validation of Radicchi et al.’s relative indicator \(c_f = c/c_0\) at the micro level using data from chemistry. Journal of the American Society for Information Science and Technology, 60(8), 1664–1670.

  • Catanzaro, M., Caldarelli, G., & Pietronero, L. (2004). Assortative model for social networks. Physical Review E, 70, 037101.

  • Christensen, K., & Moloney, N. R. (2005). Complexity and criticality. London: Imperial College Press.

  • Consul, P. C., & Jain, G. C. (1973). A generalization of the Poisson distribution. Technometrics, 15(4), 791–799.

  • de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515.

  • de Solla Price, D. J. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27(5), 292–306.

  • Evans, T. S., Hopkins, N., & Kaube, B. S. (2012). Universality of performance indicators based on citation and reference counts. Scientometrics, 93, 473–495.

  • Glänzel, W., & Schubert, A. (2004). Analysing scientific networks through co-authorship. In Handbook of quantitative science and technology research (pp. 257–276).

  • Glänzel, W. (2002). Coauthorship patterns and trends in the sciences (1980–1998): A bibliometric study with implications for database indexing and search strategies. Library Trends, 50(3), 461.

  • Glänzel, W. (2001). National characteristics in international scientific co-authorship relations. Scientometrics, 51, 69–115.

  • Goldberg, S. R., Anthony, H., & Evans, T. S. (2015). Modelling citation networks. Scientometrics, 105, 1577–1604.

  • Kim, J., & Diesner, J. (2016). Distortive effects of initial-based name disambiguation on measurements of large-scale coauthorship networks. Journal of the Association for Information Science and Technology, 67(6), 1446–1461.

  • Krioukov, D., Kitsak, M., Sinkovits, R. S., Rideout, D., Meyer, D., & Boguñá, M. (2012). Network cosmology. Scientific Reports, 2, 793.

  • Kuhn, T., Perc, M., & Helbing, D. (2014). Inheritance patterns in citation networks reveal scientific memes. Physical Review X, 4(4), 041036.

  • Liben-Nowell, D., & Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the Association for Information Science and Technology, 58(7), 1019–1031.

  • Mali, F., Kronegger, L., Doreian, P., & Ferligoj, A. (2012). Dynamic scientific coauthorship networks. In A. Scharnhorst, K. Börner, & P. van den Besselaar (Eds.), Models of science dynamics (pp. 195–232). Berlin: Springer.

  • Martin, T., Ball, B., Karrer, B., & Newman, M. E. J. (2013). Coauthorship and citation in scientific publishing. arXiv:1304.0473.

  • Milojević, S. (2010). Modes of collaboration in modern science: Beyond power laws and preferential attachment. Journal of the Association for Information Science and Technology, 61(7), 1410–1423.

  • Milojević, S. (2014). Principles of scientific research team formation and evolution. Proceedings of the National Academy of Sciences USA, 111, 3984–3989.

  • Moody, J. (2004). The structure of a social science collaboration network: Disciplinary cohesion from 1963 to 1999. American Sociological Review, 69(2), 213–238.

  • Newman, M. (2004). Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences of the USA, 101, 5200–5205.

  • Perc, M. (2010). Growth and structure of Slovenia’s scientific collaboration network. Journal of Informetrics, 4, 475–482.

  • Perc, M. (2010). Zipf’s law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia’s research as an example. Journal of Informetrics, 4(3), 358–364.

  • Perc, M. (2013). Self-organization of progress across the century of physics. Scientific Reports, 3, 1720.

  • Perc, M. (2014). The Matthew effect in empirical data. Journal of The Royal Society Interface, 11(98), 20140378.

  • Radicchi, F., & Castellano, C. (2015). Understanding the scientific enterprise: Citation analysis, data and modeling. In Social phenomena (pp. 135–151). Springer.

  • Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, 80(5), 056103.

  • Squazzoni, F., & Gandelli, C. (2012). Saint Matthew strikes again: An agent-based model of peer review and the scientific community structure. Journal of Informetrics, 6(2), 265–275.

  • Tomassini, M., & Luthi, L. (2007). Empirical analysis of the evolution of a scientific collaboration network. Physica A, 385, 750–764.

  • Wagner, C. S., & Leydesdorff, L. (2005). Network structure, self-organization, and the growth of international collaboration in science. Research Policy, 34(10), 1608–1618.

  • Wang, D., Song, C., & Barabási, A. L. (2013). Quantifying long-term scientific impact. Science, 342, 127–132.

  • Xie, Z., Dong, E. M., Yi, D. Y., Ouyang, Z. Z., & Li, J. P. (2016a). Modelling transition phenomena of scientific coauthorship networks. arXiv:1604.08891.

  • Xie, Z., Duan, X. J., Ouyang, Z. Z., & Zhang, P. Y. (2015). Quantitative analysis of the interdisciplinarity of applied mathematics. PLoS One, 10(9), e0137424.

  • Xie, Z., Ouyang, Z. Z., & Li, J. P. (2016b). A geometric graph model for coauthorship networks. Journal of Informetrics, 10, 299–311.

  • Xie, Z., Ouyang, Z. Z., Liu, Q., & Li, J. P. (2016c). A geometric graph model for citation networks of exponentially growing scientific papers. Physica A, 456, 167–175.

  • Xie, Z., Ouyang, Z. Z., Zhang, P. Y., Yi, D. Y., & Kong, D. X. (2015). Modeling the citation network by network cosmology. PLoS One, 10(3), e0120687.

  • Xie, Z., & Rogers, T. (2016). Scale-invariant geometric random graphs. Physical Review E, 93, 032310.

  • Zhou, T., Wang, B. H., Jin, Y. D., He, D. R., Zhang, P. P., He, Y., et al. (2007). Modeling collaboration networks based on nonlinear preferential attachment. International Journal of Modern Physics C, 18, 297–314.


Acknowledgements

We thank Professor K. Christensen for valuable suggestions on the description of “cross-over”, and Professor J. Y. Su for proofreading this paper. This work is supported by the teacher training project of the National University of Defense Technology (No. 434513512G).

Author information

Corresponding author

Correspondence to Zheng Xie.

Additional information

Zheng Xie, Zonglin Xie and Miao Li have contributed equally to this work.

Appendix


Detecting boundary for probability density functions

The boundary detection algorithm for probability density functions (PDFs) is listed in Table 6; it is taken from Xie et al. (2016b).

Table 6 A boundary detection algorithm for PDF

Simplifying the model

An obvious weakness of the proposed model is its large number of parameters. If we do not fit the distribution of references per paper and that of paper-team sizes, the parameters can be reduced to those in Table 7. The reduction does not affect the type of the synthetic distributions of collaborators/papers per author, or of citations per paper/per author (Fig. 8).

Table 7 The parameters of the synthetic data
Fig. 8 The synthetic distributions of collaborators/citations/papers per author, and that of citations per paper. The model parameters are listed in Table 7

The underlying formula for the distribution of citations per paper

We analyze only the formula underlying the distribution of synthetic “citations per paper” (in-degrees); the analysis is similar to that in Xie et al. (2016c) and Xie and Rogers (2016). The analysis for the distribution of synthetic “collaborators per author” is the same as that in Xie et al. (2016b). As shown in Fig. 4c, the synthetic in-degree distribution is a mixture of a generalized Poisson distribution and a power law, so the formula is analyzed piecewise: the formula for the head and that for the tail are derived separately. The cross-over can be well fitted by the formula in the notes of Table 5.

The in-degrees contributed by the second half of Step 2.b result from a random selection. Given the small preset domain of f(x) in this step, the effect of the second half on the in-degree distribution is small enough to be ignored when compared with that of the first half.

The first half of Step 2.b gives a node generated at time t the expected in-degree \(k^-(t)\approx {\alpha _l}\delta p T^{ {\beta _l} } t^{- {\beta _l} }/\beta _l-1\), where \(l=2,3\) and \(\delta =N_1 /2\pi\). If t is large enough (say, larger than some large number \(T_1\)), \(k^-(t)\) is small and changes slowly with t. Hence the formula for the head is

$$\begin{aligned} P_S(k)&= \frac{1}{T-T_1+1} \sum ^{T}_{t=T_1} \frac{k^-(t)^k}{k!} \mathrm{e}^{-k^-(t)}, \end{aligned} \tag{1}$$

which is a mixed Poisson distribution, i.e. an equal-weight mixture of Poisson distributions with means \(k^-(t)\). A generalized Poisson distribution can be well fitted by such a mixture, which can be verified numerically.
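The head formula in Eq. (1) can be evaluated directly. The following is a minimal sketch, not the authors' code: the parameter values are hypothetical (they are not those of the paper's tables), and the expected in-degree uses only the leading term \(\alpha_l \delta p T^{\beta_l} t^{-\beta_l}/\beta_l\), which is the form that appears in the integration limits of Eq. (2).

```python
import math

def expected_in_degree(t, T, alpha, beta, delta, p):
    """Leading term of k^-(t): alpha*delta*p*T**beta * t**(-beta) / beta.

    Assumption: any additive constant in the paper's expression is ignored.
    """
    return alpha * delta * p * (T / t) ** beta / beta

def poisson_pmf(k, lam):
    """Poisson probability of k with mean lam, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def head_pmf(k, T, T1, **params):
    """Eq. (1): equal-weight average of Poisson pmfs over t = T1, ..., T."""
    total = sum(poisson_pmf(k, expected_in_degree(t, T, **params))
                for t in range(T1, T + 1))
    return total / (T - T1 + 1)

# Hypothetical parameter values, chosen only for illustration.
params = dict(alpha=0.5, beta=0.4, delta=10.0, p=0.5)
T, T1 = 1000, 500

pmf = [head_pmf(k, T, T1, **params) for k in range(60)]
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum(k * k * q for k, q in enumerate(pmf)) - mean ** 2
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

Because the Poisson means vary with t, the mixture is overdispersed (its variance exceeds its mean), which a single Poisson distribution cannot reproduce but a generalized Poisson distribution can.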

The formula for the tail is derived as follows; the calculation is inspired by some of the general ideas explored for cosmological networks (Krioukov et al. 2012):

$$\begin{aligned} P_L(k)&= \frac{1}{T_1} \int ^{T_1+1}_{1} \frac{k^-(t)^k}{k!} \mathrm{e}^{-k^-(t)}\, dt \propto \frac{1}{k!} \int ^{\delta \alpha _l p T^{\beta _l}/\beta _l}_{\delta \alpha _l (T_1+1)^{-\beta _l} p T^{\beta _l}/\beta _l} \tau ^{k-1-\frac{1}{\beta _l}} \mathrm{e}^{-\tau }\, d\tau \nonumber \\&\approx \frac{1}{k!} \left( \frac{k-1-\frac{1}{\beta _l}}{\mathrm{e}}\right) ^{k-1-\frac{1}{\beta _l}} \int ^{\delta \alpha _l p T^{\beta _l}/\beta _l}_{\delta \alpha _l (T_1+1)^{-\beta _l} p T^{\beta _l}/\beta _l} \mathrm{e}^{-\frac{(\tau -k+1+\frac{1}{\beta _l})^2}{2(k-1-\frac{1}{\beta _l})}}\, d\tau \nonumber \\&\approx \frac{\Gamma (k-\frac{1}{\beta _l})}{\Gamma (k+1)} \int ^{\delta \alpha _l p T^{\beta _l}/\beta _l}_{\delta \alpha _l (T_1+1)^{-\beta _l} p T^{\beta _l}/\beta _l} \frac{\mathrm{e}^{-\frac{(\tau -k+1+\frac{1}{\beta _l})^2}{2(k-1-\frac{1}{\beta _l})}}}{\sqrt{2\pi (k-1-\frac{1}{\beta _l})}}\, d\tau . \end{aligned} \tag{2}$$

Here the Laplace approximation is used in the third step, and Stirling’s formula in the fourth. When \(k\gg 0\), the integral in Eq. (2) is approximately independent of k, which can be verified as follows:

$$\begin{aligned} \frac{d}{dk} \int ^{L_2}_{L_1} \frac{\mathrm{e}^{-\frac{(\tau -k+\rho )^2}{2(k-\rho )}}}{\sqrt{2\pi (k-\rho )}}\, d\tau =&\frac{\mathrm{e}^{-\frac{(L_1-k+\rho )^2}{2(k-\rho )}}}{2 \sqrt{2\pi (k-\rho )}}\left( 1+\frac{L_1}{k-\rho }\right) -\frac{\mathrm{e}^{-\frac{(L_2-k+\rho )^2}{2(k-\rho )}}}{2 \sqrt{2\pi (k-\rho )}}\left( 1+\frac{L_2}{k-\rho }\right) , \end{aligned} \tag{3}$$

where \(L_1={\delta \alpha _l( T_1 +1)^{- {\beta _l}}} p T^{ {\beta _l} }/\beta _l\), \(L_2= {\delta \alpha _l } p T^{ {\beta _l} }/\beta _l\), and \(\rho =1+ {1}/{\beta _l}\). This derivative is approximately equal to 0 for \(k\gg 0\). Hence

$$\begin{aligned} P_L(k)&\propto \frac{\Gamma (k-\frac{1}{\beta _l})}{\Gamma (k+1)}\approx \frac{1}{k^{1+\frac{1}{\beta _l}}}\sqrt{\frac{k-\frac{1}{\beta _l}-1}{k}} \left( 1-\frac{\frac{1}{\beta _l}+1}{k}\right) ^{k-\frac{1}{\beta _l}-1} \mathrm{e}^{\frac{1}{\beta _l}+1} \approx \frac{1}{k^{1+\frac{1}{\beta _l}}}. \end{aligned} \tag{4}$$

Stirling’s formula is used in the first approximation. The second approximation holds for \(k\gg 0\). Hence \(P_L(k )\) is approximately a power-law distribution with exponent \(1+ {1}/{\beta _l}\). So we obtain that the in-degree distribution tail of the network generated in Step 2.b is a mixture of power-law distributions with exponents \(1+ {1}/{\beta _1}\) and \(1+ {1}/{\beta _2}\) respectively. Note that in Eq. (1), the condition \(k\gg 0\) does not hold, so the power-law does not emerge in the head of the distribution.
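Both approximation steps can be checked numerically. The sketch below is an illustration under assumed values (the choices of \(\beta_l\), \(L_1\), \(L_2\) and \(\rho\) are hypothetical, and the function names are ours): it estimates the local log-log slope of \(\Gamma(k-1/\beta_l)/\Gamma(k+1)\), and compares the right-hand side of Eq. (3) with a central finite difference of the integral, which is a difference of two normal CDFs with mean and variance \(k-\rho\).

```python
import math

def log_gamma_ratio(k, beta):
    """log of Gamma(k - 1/beta) / Gamma(k + 1); requires k > 1/beta."""
    return math.lgamma(k - 1.0 / beta) - math.lgamma(k + 1.0)

def local_exponent(k, beta):
    """Negative log-log slope of the gamma ratio between k and 2k."""
    return -(log_gamma_ratio(2 * k, beta) - log_gamma_ratio(k, beta)) / math.log(2.0)

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def window_integral(k, L1, L2, rho):
    """Integral in Eq. (3): normal density with mean and variance k - rho, over [L1, L2]."""
    n = k - rho
    return normal_cdf((L2 - n) / math.sqrt(n)) - normal_cdf((L1 - n) / math.sqrt(n))

def eq3_rhs(k, L1, L2, rho):
    """Right-hand side of Eq. (3)."""
    n = k - rho
    def term(L):
        return (math.exp(-(L - n) ** 2 / (2 * n))
                / (2 * math.sqrt(2 * math.pi * n)) * (1 + L / n))
    return term(L1) - term(L2)

def central_difference(k, L1, L2, rho, h=1e-5):
    """Numerical derivative of the integral with respect to k."""
    return (window_integral(k + h, L1, L2, rho)
            - window_integral(k - h, L1, L2, rho)) / (2 * h)

beta = 0.5                      # hypothetical; predicted exponent is 1 + 1/beta = 3
for k in (10, 100, 1000):
    print(k, local_exponent(k, beta))

L1, L2, rho = 5.0, 200.0, 3.0   # hypothetical limits, with rho = 1 + 1/beta
for k in (10, 50, 100):
    print(k, eq3_rhs(k, L1, L2, rho), central_difference(k, L1, L2, rho))
```

With these values the slope approaches \(1+1/\beta_l\) as k grows, and the derivative in Eq. (3) is negligible once k lies well inside the interval \((L_1, L_2)\); near either limit it is not small.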

Flexibility of the model

The proposed model is flexible enough to fit empirical data from different sciences. We have shown that the model can capture specific features of the empirical data PNAS 2007–2015, the papers of which mainly belong to the biological sciences. Here we consider data from the physical sciences: the papers published in Physical Review E during 2007–2016 (PRE 2007–2016). The data are gathered from the Web of Science. Authors are identified by their names on papers.

Synthetic data are generated by the proposed model to capture specific features of PRE 2007–2016. The parameters of the synthetic data are listed in Table 8. Comparisons of statistical indicators and of distributions are shown in Table 9 and Fig. 9 respectively.

Table 8 The parameters of the synthetic data
Table 9 Typical statistic indicators of the data
Fig. 9 The empirical (PRE 2007–2016) and synthetic distributions of collaborators/citations/papers/references per author, and those of citations/references per paper. The data are binned on the abscissa axes to show the trends hidden in the noisy tails

TARL model

At the constructive suggestion of a reviewer, the results of the TARL model are compared with those of the proposed model. The pseudocode of the TARL model (Börner et al. 2004) is reproduced as follows.

  • Initialization

    • Generate m “papers” and n “authors” with randomly assigned “topics”;

    • Randomly assign l “authors” to the “papers” within the same “topic”.

  • For time \(t=1,2,\ldots ,T\) do:

    • Add s new “authors” with randomly assigned “topics”;

    • Deactivate the “authors” older than h;

    • For each “topic” do:

      • Randomly partition the “authors” within the “topic” into groups with size l;

      • For each group do:

        • Randomly read g “papers” from existing “papers” within the “topic”;

        • “Select a time-slice from (1 to \(t-1\)) with probability given in aging-function” (Börner et al. 2004);

        • Generate a new “paper” and randomly cite k papers (published or cited in this time-slice) from the read “papers” and their references up to w-th level.

The generated connections are restricted to the “papers” and “authors” within the same “topic”. If no aging-function is given, then all “papers” can be “read” with equal probability. We set the number of “topics” to 4 and use no aging-function (hence no time-slice). We let \(T=200\), \(g=1\), \(h=T\), \(k=2\), \(w=2\), \(m=n=l=s=4\). The generated distributions of “collaborators”/“papers” per “author” and those of “citations” per “paper”/per “author” are shown in Fig. 10.
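The instantiation described above (no aging-function) can be sketched as follows. This is an illustrative reading of the pseudocode, not the implementation of Börner et al. (2004) or of this paper. The function name `tarl` and its return format are ours. It assumes that the last group in a “topic” may be smaller than l, that initial “papers” draw their “topic” at random, that “papers” created in a time step can be read only from the next step on, and that the “read” papers count as level 0 of the reference tree.

```python
import random
from collections import Counter

def tarl(T=200, topics=4, m=4, n=4, l=4, s=4, g=1, h=None, k=2, w=2, seed=0):
    """Simplified TARL run without an aging function (hypothetical sketch)."""
    rng = random.Random(seed)
    h = T if h is None else h
    author_topic, author_birth = [], []
    paper_topic, paper_refs, paper_authors = [], [], []

    def add_author(t):
        author_topic.append(rng.randrange(topics))
        author_birth.append(t)

    def add_paper(topic, authors, refs):
        paper_topic.append(topic)
        paper_authors.append(list(authors))
        paper_refs.append(list(refs))

    # Initialization: n authors, then m papers with up to l same-topic authors.
    for _ in range(n):
        add_author(0)
    for _ in range(m):
        topic = rng.randrange(topics)
        pool = [a for a, tp in enumerate(author_topic) if tp == topic]
        add_paper(topic, rng.sample(pool, min(l, len(pool))), [])

    for t in range(1, T + 1):
        for _ in range(s):
            add_author(t)
        existing = len(paper_topic)  # papers available for reading at this step
        for topic in range(topics):
            active = [a for a, tp in enumerate(author_topic)
                      if tp == topic and t - author_birth[a] <= h]
            rng.shuffle(active)
            readable = [q for q in range(existing) if paper_topic[q] == topic]
            for i in range(0, len(active), l):
                group = active[i:i + l]
                read = rng.sample(readable, min(g, len(readable)))
                # Read papers plus their references, followed up to w levels.
                candidates, frontier = set(read), set(read)
                for _ in range(w):
                    frontier = {r for q in frontier for r in paper_refs[q]} - candidates
                    candidates |= frontier
                cited = rng.sample(sorted(candidates), min(k, len(candidates)))
                add_paper(topic, group, cited)

    return {"paper_topic": paper_topic, "paper_refs": paper_refs,
            "paper_authors": paper_authors, "author_topic": author_topic}

# Short run for illustration; the text above uses T = 200.
net = tarl(T=50)
citations_per_paper = Counter(r for refs in net["paper_refs"] for r in refs)
papers_per_author = Counter(a for authors in net["paper_authors"] for a in authors)
print(len(net["paper_refs"]), "papers;", len(net["author_topic"]), "authors")
```

Following references recursively means that a “paper” already cited by many others is reachable from more reading lists, which is the recursive-linking mechanism the text credits for the scale-free “citation” network.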

Fig. 10 Synthetic distributions of collaborators/citations/papers per author, and that of citations per paper. These distributions are generated by the TARL model

The TARL model can generate a “coauthorship” network and a “citation” network that grow simultaneously. The “citation” network is scale-free (caused by recursive linking) and has a positive clustering coefficient (caused by citing “papers” within the same “topic”). Our model expresses the citation factors considered in the TARL model (i.e. topics, aging and recursive follow-up of citation references) in a unified way, through the connection mechanism induced by the influential zones of “papers”. The aging of papers is expressed by decreasing the sizes of influential zones over t. In the TARL model, “papers” and “authors” are assigned specific “topics” directly. Our model takes a continuous approach: a node’s “topic” is expressed by its spatial coordinate, so the circles can be regarded as “topic spaces”. Note that this is not a real topic space, which would be a high-dimensional space representing the textual contents of papers. In our model, a “paper” can incompletely “copy” the references of the “papers” it cites, which is induced by the overlapping of influential zones.

The TARL model considers neither the Matthew effect on the number of authors’ collaborators nor that on the number of their papers. In addition, the above instantiation assumes that the number of “papers” per “author” is constant. Hence the generated distributions of “papers” per “author” and of “collaborators” per “author” (Fig. 10) have no power-law tails, whereas such tails emerge in the corresponding distributions from real data (Figs. 4, 9). Our model expresses those Matthew effects geometrically: older leaders have larger influential zones and therefore obtain more “collaborators” and “papers”. Therefore, our model can reproduce those power-law tails.


Cite this article

Xie, Z., Xie, Z., Li, M. et al. Modeling the coevolution between citations and coauthorship of scientific papers. Scientometrics 112, 483–507 (2017). https://doi.org/10.1007/s11192-017-2359-1
