Volume 91, Issue 3, pp 827–842

Using complex networks concepts to assess approaches for citations in scientific papers

  • D. R. Amancio
  • M. G. V. Nunes
  • O. N. Oliveira Jr.
  • L. da F. Costa


The number of citations received by authors in scientific journals has become a major parameter to assess individual researchers and the journals themselves through the impact factor. A fair assessment therefore requires that the criteria for selecting references in a given manuscript be unbiased with regard to the authors or journals cited. In this paper, we assess approaches for citations considering two recommendations for authors to follow while preparing a manuscript: (i) consider similarity of contents with the topics investigated, lest related work should be reproduced or ignored; (ii) perform a systematic search over the network of citations including seminal or very related papers. We use formalisms of complex networks for two datasets of papers from the arXiv and the Web of Science repositories to show that neither of these two criteria is fulfilled in practice. By representing the texts as complex networks we estimated a similarity index between pieces of text and found that the list of references did not contain the most similar papers in the dataset. This was quantified by calculating a consistency index, whose maximum value is one if the references in a given paper are the most similar in the dataset. For the areas of “complex networks” and “graphenes”, the consistency index was only 0.11–0.23 and 0.10–0.25, respectively. To simulate a systematic search in the citation network, we employed a traditional random walk search (i.e. diffusion) and a random walk whose transition probabilities are proportional to the number of incoming edges of the neighbours. The frequency of visits to the nodes (papers) in the network had a very small correlation with either the actual list of references in the papers or the number of downloads from the arXiv repository. Therefore, apparently the authors and users of the repository did not follow the criterion related to a systematic search over the network of citations. Based on these results, we propose an approach that we believe is fairer for evaluating and complementing citations of a given author, effectively leading to a virtual scientometry.
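
The two measurements described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the formula for the consistency index, so the sketch assumes a top-k overlap: the fraction of a paper's k references that are among its k most similar papers, which equals one only when the references are the most similar in the dataset. The walk direction (from citing paper to cited paper), the restart rule at papers that cite nothing, and the names `consistency_index`, `in_degrees` and `visit_frequencies` are likewise assumptions made for illustration.

```python
import random
from collections import Counter


def consistency_index(references, similarity):
    """Fraction of a paper's references found among its k most similar papers.

    `references` holds the cited paper ids, `similarity` maps every candidate
    paper id to a similarity score, and k = len(references).
    ASSUMPTION: this top-k overlap is an illustrative definition only.
    """
    k = len(references)
    if k == 0:
        return 0.0
    ranked = sorted(similarity, key=similarity.get, reverse=True)
    return len(set(ranked[:k]) & set(references)) / k


def in_degrees(graph):
    """Count incoming edges in a {paper: [cited papers]} adjacency mapping."""
    degree = Counter({node: 0 for node in graph})
    for targets in graph.values():
        for target in targets:
            degree[target] += 1
    return degree


def visit_frequencies(graph, start, steps, preferential=False, seed=0):
    """Run a random walk of `steps` moves and return relative visit counts.

    preferential=False: every cited neighbour is equally likely (diffusion).
    preferential=True: a neighbour is chosen with probability proportional
    to its number of incoming edges.
    ASSUMPTION: the walker restarts at `start` when a paper cites nothing.
    """
    rng = random.Random(seed)
    indeg = in_degrees(graph)
    visits = Counter()
    node = start
    for _ in range(steps):
        neighbours = graph.get(node, [])
        if not neighbours:
            node = start  # dead end: restart without counting a visit
            continue
        if preferential:
            weights = [indeg[n] for n in neighbours]
            node = rng.choices(neighbours, weights=weights, k=1)[0]
        else:
            node = rng.choice(neighbours)
        visits[node] += 1
    total = sum(visits.values())
    return {n: c / total for n, c in visits.items()} if total else {}


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    similarity = {"a": 0.9, "b": 0.8, "c": 0.1}
    print(consistency_index({"a", "c"}, similarity))
    citations = {"p": ["q", "r"], "q": ["r"], "r": []}
    print(visit_frequencies(citations, "p", 1000, preferential=True))
```

In a fuller analysis, the visit frequencies returned by `visit_frequencies` would be correlated with the actual reference lists or with download counts; that step is omitted here.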


Complex networks, Virtual scientometry, Similarity network



The authors are grateful to FAPESP (2010/00927-9) and CNPq (Brazil) for the financial support.


Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2012

Authors and Affiliations

  • D. R. Amancio (1)
  • M. G. V. Nunes (2)
  • O. N. Oliveira Jr. (1)
  • L. da F. Costa (1)

  1. Institute of Physics of São Carlos, University of São Paulo, São Carlos, Brazil
  2. Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, Brazil
