The distribution of the number of academic publications against citation count for papers published in the same year is remarkably similar from year to year. We characterise the shape of such distributions by a ‘width’, \(\sigma^2\), obtained by fitting a log-normal to each distribution, and find the width to be approximately constant across publication years. This similarity is not surprising: after all, why would papers published in one year be cited more than those published in another? Nevertheless, we show that simple citation models fail to capture this behaviour. We then provide a simple three-parameter citation network model which can reproduce the correct width over time. We use the citation network of papers from the hep-th section of arXiv to test our model. Our final model reproduces the observed ‘width’ of the data when around 20% of the citations in the model are made to recently published papers drawn from the entire network (‘global information’). The remaining 80% of citations are made using the references in the bibliographies of these papers (‘local searches’). We note that this is consistent with other studies, though our motivation, reproducing the stability of the distribution over time, is very different. Finally, we find that, in the citation network model, varying the number of papers referenced by a new publication is important because it changes the fitted values of the model parameters. This is not addressed in current models and needs further work.
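The global/local mechanism summarised above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fixed reference-list length `refs_per_paper`, the uniform choice among the last `recent_window` papers, and the moment-based width estimate (the variance of \(\ln(c/\langle c\rangle)\) over cited papers, used here in place of a proper log-normal fit) are all assumptions made for illustration.

```python
import math
import random
from statistics import pvariance


def grow_citation_network(n_papers, refs_per_paper=10, global_fraction=0.2,
                          recent_window=50, seed=0):
    """Grow a toy citation network; returns one bibliography per paper.

    Hypothetical parameters: about `global_fraction` of each reference list is
    drawn uniformly from the `recent_window` most recent papers ("global
    information"); the rest is copied from the bibliographies of those
    papers ("local search").
    """
    rng = random.Random(seed)
    bibliographies = [[]]  # paper 0 has nothing to cite
    for new in range(1, n_papers):
        n_refs = min(refs_per_paper, new)
        n_global = max(1, round(global_fraction * n_refs))
        recent = range(max(0, new - recent_window), new)
        cited = set(rng.sample(recent, min(n_global, len(recent))))
        # Local search: candidates are the references of the globally chosen
        # papers. Duplicates are kept, so papers cited by several of them
        # are more likely to be picked first.
        pool = [ref for paper in sorted(cited) for ref in bibliographies[paper]]
        rng.shuffle(pool)
        for ref in pool:
            if len(cited) >= n_refs:
                break
            cited.add(ref)
        # If the pool is too small, the paper simply has fewer references.
        bibliographies.append(sorted(cited))
    return bibliographies


def citation_counts(bibliographies):
    """Count how many times each paper is cited (its in-degree)."""
    counts = [0] * len(bibliographies)
    for refs in bibliographies:
        for ref in refs:
            counts[ref] += 1
    return counts


def log_width(counts):
    """Moment estimate of the width: variance of ln(c / <c>) for c > 0."""
    positive = [c for c in counts if c > 0]
    if len(positive) < 2:
        return float("nan")
    mean = sum(positive) / len(positive)
    return pvariance([math.log(c / mean) for c in positive])


if __name__ == "__main__":
    network = grow_citation_network(5000)
    counts = citation_counts(network)
    # Treat consecutive blocks of 1000 papers as publication "years".
    for start in range(0, 5000, 1000):
        print(start, log_width(counts[start:start + 1000]))
```

Comparing the printed widths across blocks gives a rough sense of whether the width is stable over time in this toy setting; the most recent block is expected to differ because its papers have had little time to accumulate citations.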
Keywords: Complex networks, Directed acyclic graphs, Bibliometrics, Citation networks
Mathematics Subject Classification: 91D30
We would like to thank James Gollings and James Clough for allowing us to use their transitive reduction code from which we created our own declustering code. We would like to thank Tamar Loach for sharing her results on related projects and M. V. Simkin for discussions about his work.
- Clough, J. R., & Evans, T. S. (2014). What is the dimension of citation space? arXiv:1408.1274.
- Clough, J. R., Gollings, J., Loach, T. V., & Evans, T. S. (2014). Transitive reduction of citation networks. Journal of Complex Networks. arXiv:1310.8224.
- Goldberg, S. R. (2013). Modelling citation networks. figshare. doi: 10.6084/m9.figshare.1134542.
- Goldberg, S. R., & Evans, T. S. (2012). Universality of performance indicators based on citation and reference counts. figshare. doi: 10.6084/m9.figshare.1134544. Retrieved 12 Aug 2014.
- KDD Cup. (2003). Network mining and usage log analysis. http://www.cs.cornell.edu/projects/kddcup/datasets.html. Accessed 1 Oct 2012.
- Stringer, M. J., Sales-Pardo, M., & Amaral, L. A. N. (2010). Statistical validation of a global model for the distribution of the ultimate number of citations accrued by papers published in a scientific journal. Journal of the American Society for Information Science and Technology, 61, 1377–1385.
- Vázquez, A. (2000). Knowing a network by walking on it: Emergence of scaling. arXiv:cond-mat/0006132.
- Vázquez, A. (2001). Statistics of citation networks. arXiv:cond-mat/0105031.