Modeling the coevolution between citations and coauthorship of scientific papers

Xie, Zheng; Xie, Zonglin; Li, Miao; Li, Jianping; Yi, Dongyun

doi:10.1007/s11192-017-2359-1

Published: 27 March 2017

Volume 112, pages 483–507, (2017)

Scientometrics

Zheng Xie¹, Zonglin Xie¹, Miao Li², Jianping Li¹ & Dongyun Yi¹

Abstract

Collaborations and citations within scientific research grow simultaneously and interact dynamically. Modeling the coevolution between them helps to study many phenomena that can be approached only by combining citation and coauthorship data. A geometric graph for the coevolution is proposed, the mechanism of which synthetically expresses the interactive impacts of authors and papers in a geometrical way. The model is validated against a dataset of papers published in PNAS during 2007–2015. The validation shows that the model can reproduce a range of features observed in citation and coauthorship data, both combined and separately. In particular, the empirical distribution of citations per author has two limits, in which the distribution appears as a generalized Poisson and a power-law respectively. Our model successfully reproduces the shape of the distribution, and provides an explanation for how the shape emerges via the decisions of authors. The model also captures the empirically positive correlations between the numbers of authors’ papers, citations and collaborators.

References

Balietti, S., Goldstone, R. L., & Helbing, D. (2016). Peer review and competition in the art exhibition game. Proceedings of the National Academy of Sciences USA, 113(30), 8414–8419.
Barabási, A. L., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., & Vicsek, T. (2002). Evolution of the social network of scientific collaborations. Physica A, 311(3–4), 590–614.
Börner, K., Maru, J. T., & Goldstone, R. L. (2004). The simultaneous evolution of author and paper networks. Proceedings of the National Academy of Sciences of the United States of America, 101(suppl 1), 5266–5273.
Bornmann, L., & Daniel, H. D. (2009). Universality of citation distributions: A validation of Radicchi et al.’s relative indicator $c_f = c/c_0$ at the micro level using data from chemistry. Journal of the American Society for Information Science and Technology, 60(8), 1664–1670.
Catanzaro, M., Caldarelli, G., & Pietronero, L. (2004). Assortative model for social networks. Physical Review E, 70, 037101.
Christensen, K., & Moloney, N. R. (2005). Complexity and criticality. London: Imperial College Press.
Consul, P. C., & Jain, G. C. (1973). A generalization of the Poisson distribution. Technometrics, 15(4), 791–799.
de Solla Price, D. J. (1965). Networks of scientific papers. Science, 149(3683), 510–515.
de Solla Price, D. J. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American Society for Information Science, 27(5), 292–306.
Evans, T. S., Hopkins, N., & Kaube, B. S. (2012). Universality of performance indicators based on citation and reference counts. Scientometrics, 93, 473–495.
Glänzel, W., & Schubert, A. (2004). Analysing scientific networks through co-authorship. Handbook of quantitative science and technology research (pp 257–276).
Glänzel, W. (2002). Coauthorship patterns and trends in the sciences (1980–1998): A bibliometric study with implications for database indexing and search strategies. Library Trends, 50(3), 461.
Glänzel, W. (2011). National characteristics in international scientific co-authorship relations. Scientometrics, 51, 69–115.
Goldberg, S. R., Anthony, H., & Evans, T. S. (2015). Modelling citation networks. Scientometrics, 105, 1577–1604.
Kim, J., & Diesner, J. (2016). Distortive effects of initial-based name disambiguation on measurements of large-scale coauthorship networks. Journal of the Association for Information Science and Technology, 67(6), 1446–1461.
Krioukov, D., Kitsak, M., Sinkovits, R. S., Rideout, D., Meyer, D., & Boguñá, M. (2012). Network cosmology. Scientific Reports, 2, 793.
Kuhn, T., Perc, M., & Helbing, D. (2014). Inheritance patterns in citation networks reveal scientific memes. Physical Review X, 4(4), 041036.
Liben-Nowell, D., & Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the Association for Information Science and Technology, 58(7), 1019–1031.
Mali, F., Kronegger, L., Doreian, P., & Ferligoj, A. (2012). Dynamic scientific coauthorship networks. In: Scharnhorst A, Börner K, Besselaar PVD editors. Models of science dynamics. Springer, Berlin (pp 195–232).
Martin, T., Ball, B., Karrer, B., & Newman, M.E.J. (2013). Coauthorship and citation in scientific publishing. arXiv:1304.0473.
Milojević, S. (2010). Modes of collaboration in modern science: Beyond power laws and preferential attachment. Journal of the Association for Information Science and Technology, 61(7), 1410–1423.
Milojević, S. (2014). Principles of scientific research team formation and evolution. Proceedings of the National Academy of Sciences USA, 111, 3984–3989.
Moody, J. (2004). The structure of a social science collaboration network: Disciplinary cohesion from 1963 to 1999. American Sociological Review, 69(2), 213–238.
Newman, M. (2004). Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences of the USA, 101, 5200–5205.
Perc, M. (2010). Growth and structure of Slovenia’s scientific collaboration network. Journal of Informetrics, 4, 475–482.
Perc, M. (2010). Zipf’s law and log-normal distributions in measures of scientific output across fields and institutions: 40 years of Slovenia’s research as an example. Journal of Informetrics, 4(3), 358–364.
Perc, M. (2013). Self-organization of progress across the century of physics. Scientific Reports, 3, 1720.
Perc, M. (2014). The Matthew effect in empirical data. Journal of The Royal Society Interface, 11(98), 20140378.
Radicchi, F., & Castellano, C. (2015). Understanding the scientific enterprise: citation analysis, data and modeling. In Social Phenomena. Springer. (pp 135–151).
Radicchi, F., Fortunato, S., Markines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the ranking of scientists. Physical Review E, 80(5), 056103.
Squazzoni, F., & Gandelli, C. (2012). Saint Matthew strikes again: An agent-based model of peer review and the scientific community structure. Journal of Informetrics, 6(2), 265–275.
Tomassini, M., & Luthi, L. (2007). Empirical analysis of the evolution of a scientific collaboration network. Physica A, 285, 750–764.
Wagner, C. S., & Leydesdorff, L. (2005). Network structure, self-organization, and the growth of international collaboration in science. Research Policy, 34(10), 1608–1618.
Wang, D., Song, C., & Barabási, A. L. (2013). Quantifying long-term scientific impact. Science, 342, 127–132.
Xie, Z., Dong, E.M., Yi, D.Y., Ouyang, Z.Z., & Li, J.P. (2016). Modelling transition phenomena of scientific coauthorship networks. arXiv:1604.08891.
Xie, Z., Duan, X. J., Ouyang, Z. Z., & Zhang, P. Y. (2015). Quantitative analysis of the interdisciplinarity of applied mathematics. PLoS One, 10(9), e0137424.
Xie, Z., Ouyang, Z. Z., & Li, J. P. (2016). A geometric graph model for coauthorship networks. Journal of Informetrics, 10, 299–311.
Xie, Z., Ouyang, Z. Z., Liu, Q., & Li, J. P. (2016). A geometric graph model for citation networks of exponentially growing scientific papers. Physica A, 456, 167–175.
Xie, Z., Ouyang, Z. Z., Zhang, P. Y., Yi, D. Y., & Kong, D. X. (2015). Modeling the citation network by network cosmology. PLoS One, 10(3), e0120687.
Xie, Z., & Rogers, T. (2016). Scale-invariant geometric random graphs. Physical Review E, 93, 032310.
Zhou, T., Wang, B. H., Jin, Y. D., He, D. R., Zhang, P. P., He, Y., et al. (2007). Modeling collaboration networks based on nonlinear preferential attachment. International Journal of Modern Physics C, 18, 297–314.

Acknowledgements

We thank Professor K. Christensen for valuable suggestions on the description of “cross-over”, and Professor J. Y. Su for proofreading this paper. This work is supported by the fund from the National University of Defense Technology teacher training project (No. 434513512G).

Author information

Authors and Affiliations

College of Science, National University of Defense Technology, Changsha, 410031, China
Zheng Xie, Zonglin Xie, Jianping Li & Dongyun Yi
School of Foreign Languages, Shanghai Jiao Tong University, Shanghai, 200240, China
Miao Li

Corresponding author

Correspondence to Zheng Xie.

Additional information

Zheng Xie, Zonglin Xie and Miao Li have contributed equally to this work.

Appendix

Detecting boundary for probability density functions

The boundary detection algorithm for probability density functions (PDFs) is listed in Table 6; it is taken from Xie et al. (2016b).

Table 6 A boundary detection algorithm for PDF

Simplifying the model

An obvious weakness of the proposed model is its large number of parameters. If we do not fit the distribution of references per paper and that of paper-team sizes, the parameters can be reduced to those in Table 7. The reduction does not affect the type of the synthetic distributions of collaborators/papers per author or of citations per paper/per author (Fig. 8).

Table 7 The parameters of the synthetic data

The underlying formula for the distribution of citations per paper

We analyze only the underlying formula for the distribution type of synthetic “citations per paper” (in-degrees), which is similar to that in Xie et al. (2016c) and Xie and Rogers (2016). The analysis for synthetic “collaborators per author” is the same as that in Xie et al. (2016b). As shown in Fig. 4c, the synthetic in-degree distribution is a mixture of a generalized Poisson distribution and a power-law, hence the formula is analyzed piecewise: the formula for the head and that for the tail are derived separately. The cross-over can be well fitted by the formula in the notes of Table 5.

The in-degrees contributed by the second half of Step 2.b are due to random selection. Given also the small preset domain of f(x) in this step, the effect of the second half on the in-degree distribution is small enough to be ignored compared with that of the first half.

The first half of Step 2.b gives a node generated at time t an expected in-degree of $k^-(t)\approx \alpha _l \delta p T^{\beta _l} t^{-\beta _l}/\beta _l-1$, where $l=2,3$ and $\delta =N_1 /2\pi$. If t is large enough (say, larger than some large number $T_1$), $k^-(t)$ is small and changes slowly with t. Hence the formula for the head is

$$\begin{aligned} P_S(k)&= \frac{1}{T-T_1+1} \sum ^{T}_{t=T_1} \frac{k^-(t)^k}{k!} \mathrm {e}^{-k^-(t)} , \end{aligned} \tag{1}$$

which is a mixture of Poisson distributions. A generalized Poisson distribution can be well fitted by such a mixture, which can be verified numerically.
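The head formula in Eq. (1) can be evaluated directly. The following is a minimal sketch, not the authors' code: the function names and the parameter values are hypothetical and chosen only so that $k^-(t)$ stays positive on $[T_1, T]$.

```python
import math

def expected_in_degree(t, T, alpha, beta, delta, p):
    """k^-(t) as written in the text: alpha*delta*p*T^beta * t^(-beta) / beta - 1."""
    return alpha * delta * p * T**beta * t**(-beta) / beta - 1.0

def poisson_pmf(k, lam):
    """Poisson probability of k for mean lam > 0, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def head_pmf(k, T, T1, alpha, beta, delta, p):
    """Eq. (1): average of Poisson pmfs over generation times t = T1, ..., T."""
    total = 0.0
    for t in range(T1, T + 1):
        lam = expected_in_degree(t, T, alpha, beta, delta, p)
        if lam <= 0.0:
            raise ValueError("k^-(t) must be positive for the Poisson pmf")
        total += poisson_pmf(k, lam)
    return total / (T - T1 + 1)

# Illustrative (assumed) parameters, not taken from the paper's tables.
params = dict(T=1000, T1=500, alpha=1.5, beta=0.5, delta=1.0, p=1.0)
for k in range(8):
    print(k, head_pmf(k, **params))
```

Because each term is a proper Poisson pmf, the mixture sums to 1 over k, which is a quick sanity check on any parameter choice.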

The formula for the tail is deduced as follows; the calculation is inspired by general ideas explored for cosmological networks (Krioukov et al. 2012):

$$\begin{aligned} P_L(k)&= \frac{1}{T_1} \int ^{T_1 +1}_1 \frac{k^-(t)^k}{k!} \mathrm {e}^{-k^-(t)} dt \propto \frac{1}{k!} \int ^{\delta \alpha _l p T^{\beta _l}/\beta _l}_{\delta \alpha _l (T_1 +1)^{-\beta _l} p T^{\beta _l}/\beta _l} \tau ^{k-1-\frac{1}{\beta _l}} \mathrm {e}^{-\tau } d\tau \\&\approx \frac{1}{k!} \left( \frac{k-1-\frac{1}{\beta _l}}{\mathrm {e}}\right) ^{k-1-\frac{1}{\beta _l}} \int ^{\delta \alpha _l p T^{\beta _l}/\beta _l}_{\delta \alpha _l (T_1 +1)^{-\beta _l} p T^{\beta _l}/\beta _l} \mathrm {e}^{-\frac{(\tau -k+1+\frac{1}{\beta _l})^2}{2(k-1-\frac{1}{\beta _l})}} d\tau \\&\approx \frac{\Gamma (k-\frac{1}{\beta _l})}{\Gamma (k+1)} \int ^{\delta \alpha _l p T^{\beta _l}/\beta _l}_{\delta \alpha _l (T_1 +1)^{-\beta _l} p T^{\beta _l}/\beta _l} \frac{\mathrm {e}^{-\frac{(\tau -k+1+\frac{1}{\beta _l})^2}{2(k-1-\frac{1}{\beta _l})}}}{\sqrt{2\pi (k-1-\frac{1}{\beta _l})}} d\tau . \end{aligned} \tag{2}$$
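The proportionality in the first line of Eq. (2) comes from a change of variables. A sketch of that step, assuming the $-1$ offset in $k^-(t)$ is neglected (as the integration limits of Eq. (2) suggest) and writing $c=\delta \alpha _l p T^{\beta _l}/\beta _l$ (a shorthand introduced here for illustration only):

$$\begin{aligned} \tau &= k^-(t) \approx c\, t^{-\beta _l} \quad \Rightarrow \quad t = (\tau /c)^{-1/\beta _l}, \qquad dt = -\frac{c^{1/\beta _l}}{\beta _l}\, \tau ^{-1-\frac{1}{\beta _l}}\, d\tau , \\ \int ^{T_1+1}_{1} \frac{k^-(t)^k}{k!} \mathrm {e}^{-k^-(t)} dt &\approx \frac{c^{1/\beta _l}}{\beta _l\, k!} \int ^{c}_{c (T_1+1)^{-\beta _l}} \tau ^{k-1-\frac{1}{\beta _l}} \mathrm {e}^{-\tau } d\tau . \end{aligned}$$

The minus sign in $dt$ is absorbed by swapping the limits, since $\tau$ decreases from $c$ to $c(T_1+1)^{-\beta _l}$ as $t$ runs from $1$ to $T_1+1$; the prefactor does not depend on $k$ and is dropped in the proportionality.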

In Eq. (2), the Laplace approximation gives the third expression and Stirling’s formula gives the fourth. When $k\gg 0$, the integral in Eq. (2) is approximately independent of k, which can be verified as follows:

$$\begin{aligned} \frac{d}{dk} \int ^{L_2}_{L_1} \frac{\mathrm {e}^{-\frac{(\tau -k+\rho )^2}{2(k-\rho )}}}{\sqrt{2\pi (k-\rho )}} d\tau =&\frac{\mathrm {e}^{-\frac{\left( L_1 - k+\rho \right) ^2}{2 (k-\rho )}}}{2\, \sqrt{2 \pi (k-\rho )}}\left( 1+\frac{L_1}{k-\rho }\right) -\frac{\mathrm {e}^{-\frac{\left( L_2 - k+\rho \right) ^2}{2 (k-\rho )}}}{2\, \sqrt{2 \pi (k-\rho )}}\left( 1+\frac{L_2}{k-\rho }\right) , \end{aligned} \tag{3}$$

where $L_1={\delta \alpha _l( T_1 +1)^{- {\beta _l}}} p T^{ {\beta _l} }/\beta _l$, $L_2= {\delta \alpha _l } p T^{ {\beta _l} }/\beta _l$, and $\rho =1+ {1}/{\beta _l}$. This derivative is approximately equal to 0 for $k\gg 0$. Hence
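The claim can be checked numerically. The sketch below is illustrative and rests on assumptions: the values of `L1`, `L2` and `rho` are made up, and the function names are hypothetical. It evaluates the Gaussian integral in closed form with the error function and compares a finite-difference derivative with the right-hand side of Eq. (3), taking $k-\rho$ in every denominator.

```python
import math

def gaussian_mass(k, L1, L2, rho):
    """Integral over [L1, L2] of a normal density with mean and variance k - rho."""
    m = k - rho
    s = math.sqrt(2.0 * m)
    return 0.5 * (math.erf((L2 - m) / s) - math.erf((L1 - m) / s))

def derivative_eq3(k, L1, L2, rho):
    """Right-hand side of Eq. (3)."""
    m = k - rho

    def term(L):
        gauss = math.exp(-(L - m) ** 2 / (2.0 * m))
        return gauss / (2.0 * math.sqrt(2.0 * math.pi * m)) * (1.0 + L / m)

    return term(L1) - term(L2)

# Illustrative (assumed) values; rho = 1 + 1/beta with beta = 0.5.
L1, L2, rho = 50.0, 500.0, 3.0
h = 1e-4  # step for the central finite difference
for k in (60.0, 100.0, 200.0, 300.0):
    fd = (gaussian_mass(k + h, L1, L2, rho) - gaussian_mass(k - h, L1, L2, rho)) / (2.0 * h)
    print(k, gaussian_mass(k, L1, L2, rho), derivative_eq3(k, L1, L2, rho), fd)
```

With these values the derivative is visibly nonzero only while $k$ is close to $L_1$ (or $L_2$); well inside the interval the integral is flat in $k$.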

$$\begin{aligned} P_L(k)&\propto \frac{\Gamma (k-\frac{1}{\beta _l})}{\Gamma (k+1)}\approx \frac{1}{k^{1+\frac{1}{\beta _l}}}\sqrt{\frac{k-\frac{1}{\beta _l}-1}{k}} \left( 1-\frac{\frac{1}{\beta _l}+1}{k}\right) ^{k-\frac{1}{\beta _l}-1} \mathrm {e}^{\frac{1}{\beta _l}+1} \approx \frac{1}{k^{1+\frac{1}{\beta _l}}}. \end{aligned} \tag{4}$$

Stirling’s formula is used in the first approximation, and the second approximation holds for $k\gg 0$. Hence $P_L(k)$ is approximately a power-law distribution with exponent $1+ {1}/{\beta _l}$, and the tail of the in-degree distribution of the network generated in Step 2.b is a mixture of power-law distributions with exponents $1+ {1}/{\beta _1}$ and $1+ {1}/{\beta _2}$ respectively. Note that in Eq. (1) the condition $k\gg 0$ does not hold, so the power-law does not emerge in the head of the distribution.
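The power-law limit in Eq. (4) is easy to check numerically. A small sketch, with hypothetical function names and an arbitrarily chosen $\beta _l = 0.5$, compares the Gamma-function ratio with $k^{-(1+1/\beta _l)}$ using log-Gamma to avoid overflow:

```python
import math

def gamma_ratio(k, beta):
    """Gamma(k - 1/beta) / Gamma(k + 1), computed via log-Gamma for stability."""
    return math.exp(math.lgamma(k - 1.0 / beta) - math.lgamma(k + 1.0))

def power_law(k, beta):
    """k^-(1 + 1/beta), the limiting form in Eq. (4)."""
    return k ** (-(1.0 + 1.0 / beta))

beta = 0.5  # assumed value for illustration; gives exponent 1 + 1/beta = 3
for k in (10, 100, 1000, 10000):
    print(k, gamma_ratio(k, beta) / power_law(k, beta))
```

The printed ratio approaches 1 as $k$ grows, while for small $k$ it deviates noticeably, consistent with the power-law appearing only in the tail.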

Flexibility of the model

The proposed model is flexible enough to fit empirical data from different sciences. We have shown that the model can capture specific features of the empirical data PNAS 2007–2015, the papers of which mainly belong to the biological sciences. Here we consider data from the physical sciences: the papers of Physical Review E published during 2007–2016 (PRE 2007–2016). The data are gathered from the Web of Science. Authors are identified by their names on papers.

Synthetic data are generated by the proposed model to capture specific features of PRE 2007–2016. The parameters of the synthetic data are listed in Table 8. Comparisons of statistical indicators and distributions are shown in Table 9 and Fig. 10 respectively.

Table 8 The parameters of the synthetic data

Table 9 Typical statistic indicators of the data

TARL model

At the constructive suggestion of a reviewer, the results of the TARL model are compared with those of the proposed model. The pseudocode of the TARL model from Börner et al. (2004) is reproduced as follows.

Initialization
- Generate m “papers” and n “authors” with randomly assigned “topics”;
- Randomly assign l “authors” to the “papers” within the same “topic”.
For time $t=1,2,\ldots ,T$ do:
- Add s new “authors” with randomly assigned “topics”;
- Deactivate the “authors” older than h;
- For each “topic” do:
  - Randomly partition the “authors” within the “topic” into groups with size l;
  - For each group do:
    - Randomly read g “papers” from existing “papers” within the “topic”;
    - “Select a time-slice from (1 to $t-1$) with probability given in aging-function” (Börner et al. 2004);
    - Generate a new “paper” and randomly cite k papers (published or cited in this time-slice) from the read “papers” and their references up to w-th level.

The generated connections are restricted to the “papers” and “authors” within the same “topic”. If no aging-function is given, all “papers” are equally likely to be “read”. We set the number of “topics” to 4 and use no aging-function (hence no time-slices). We let $T=200$, $g=1$, $h=T$, $k=2$, $w=2$, $m=n=l=s=4$. The generated distributions of “collaborators”/“papers” per “author” and of “citations” per “paper”/per “author” are shown in Fig. 10.
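The instantiation above can be sketched in a few lines of Python. This is a simplified illustration, not the code of Börner et al. (2004) or of the authors, and it rests on several assumptions not stated in the text: candidate citations are the read “papers” plus their references followed for w levels; a leftover group may have fewer than l “authors”; “papers” written in a time step become readable only in the next step; each initial “paper” receives at most l same-“topic” “authors”; and deactivation is omitted because $h=T$. All function and variable names are hypothetical.

```python
import random
from collections import Counter

def simulate_tarl(T=200, topics=4, m=4, n=4, l=4, s=4, g=1, k=2, w=2, seed=0):
    """Simplified TARL-style growth without an aging function (assumptions in the lead-in)."""
    rng = random.Random(seed)
    authors_by_topic = [[] for _ in range(topics)]
    papers_by_topic = [[] for _ in range(topics)]
    paper_refs, paper_authors = [], []
    n_authors = 0

    def add_authors(count):
        nonlocal n_authors
        for _ in range(count):
            authors_by_topic[rng.randrange(topics)].append(n_authors)
            n_authors += 1

    def add_paper(topic, authors, refs):
        papers_by_topic[topic].append(len(paper_refs))
        paper_refs.append(list(refs))
        paper_authors.append(list(authors))

    # Initialization: n authors and m papers with random topics.
    add_authors(n)
    for _ in range(m):
        topic = rng.randrange(topics)
        pool = authors_by_topic[topic]
        add_paper(topic, rng.sample(pool, min(l, len(pool))), [])

    for _ in range(T):
        add_authors(s)
        for topic in range(topics):
            members = authors_by_topic[topic][:]
            rng.shuffle(members)  # random partition into groups of size l
            readable = papers_by_topic[topic][:]  # snapshot taken before this topic's new papers
            for i in range(0, len(members), l):
                # Read g papers, then follow their references for w levels.
                frontier = rng.sample(readable, min(g, len(readable)))
                candidates = set(frontier)
                for _ in range(w):
                    frontier = [r for q in frontier for r in paper_refs[q]]
                    candidates.update(frontier)
                refs = rng.sample(sorted(candidates), min(k, len(candidates)))
                add_paper(topic, members[i:i + l], refs)
    return paper_refs, paper_authors

def citation_counts(paper_refs):
    """Number of citations received by each paper."""
    counts = Counter(r for refs in paper_refs for r in refs)
    return [counts.get(i, 0) for i in range(len(paper_refs))]

refs, authors = simulate_tarl()
cites = citation_counts(refs)
print("papers:", len(refs), "max citations:", max(cites))
```

Following references recursively is what lets already-cited “papers” be reached through several paths, which is the recursive-linking mechanism the text credits for the heavy-tailed “citation” distribution. In this sketch every active “author” joins exactly one group per step, so “papers” per “author” depends only on the author's age.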

The TARL model can generate a “coauthorship” network and a “citation” network that grow simultaneously. The “citation” network is scale-free (caused by recursive linking) and has a positive clustering coefficient (caused by citing “papers” within the same “topic”). Our model expresses the citation factors considered in the TARL model (i.e. topics, aging and recursive follow-up of citation references) in a unified way, through the connection mechanism induced by the influential zones of “papers”. The aging of papers is expressed by decreasing the sizes of influential zones over t. In the TARL model, “papers” and “authors” are assigned specific “topics” directly. Our model works in a continuous way, expressing a node’s “topic” by its spatial coordinate, so the circles can be regarded as “topic spaces”. Note that this is not a real topic space, which would be a high-dimensional space representing the textual contents of papers. In our model, a “paper” can incompletely “copy” the references of the “papers” it cites, which is induced by the overlapping of influential zones.

The TARL model considers neither the Matthew effect on authors’ numbers of collaborators nor that on their numbers of papers. In addition, the above instantiation assumes that the number of “papers” per “author” is a constant. Hence, the generated distributions of “papers” per “author” and of “collaborators” per “author” (Fig. 10) have no power-law tails, whereas such tails emerge in the corresponding distributions from real data (Figs. 4, 9). Our model expresses those Matthew effects geometrically: older leaders have larger influential zones and so obtain more “collaborators” and “papers”. Therefore, our model can reproduce those power-law tails.

About this article

Cite this article

Xie, Z., Xie, Z., Li, M. et al. Modeling the coevolution between citations and coauthorship of scientific papers. Scientometrics 112, 483–507 (2017). https://doi.org/10.1007/s11192-017-2359-1

Received: 15 December 2016
Published: 27 March 2017
Issue Date: July 2017
DOI: https://doi.org/10.1007/s11192-017-2359-1
