Using complex networks concepts to assess approaches for citations in scientific papers

Abstract

The number of citations received by authors in scientific journals has become a major parameter for assessing individual researchers and, through the impact factor, the journals themselves. A fair assessment therefore requires that the criteria for selecting references in a given manuscript be unbiased with regard to the authors or journals cited. In this paper, we assess approaches for citations considering two recommendations for authors to follow while preparing a manuscript: (i) consider similarity of contents with the topics investigated, lest related work should be reproduced or ignored; (ii) perform a systematic search over the network of citations including seminal or very related papers. We apply formalisms of complex networks to two datasets of papers from the arXiv and the Web of Science repositories to show that neither of these two criteria is fulfilled in practice. By representing the texts as complex networks, we estimated a similarity index between pieces of text and found that the list of references did not contain the most similar papers in the dataset. This was quantified by calculating a consistency index, whose maximum value is one if the references in a given paper are the most similar in the dataset. For the areas of “complex networks” and “graphenes”, the consistency index was only 0.11–0.23 and 0.10–0.25, respectively. To simulate a systematic search in the citation network, we employed a traditional random walk search (i.e. diffusion) and a random walk whose transition probabilities are proportional to the number of incoming edges of the neighbours. The frequency of visits to the nodes (papers) in the network had a very small correlation with either the actual list of references in the papers or the number of downloads from the arXiv repository. It therefore appears that the authors and users of the repository did not follow the criterion related to a systematic search over the network of citations.
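The abstract states only that the consistency index reaches one when a paper's references are the most similar papers in the dataset; it does not give the formula. The following is a minimal sketch under an assumed definition: the fraction of a paper's k references that appear among the k papers most similar to it. The function name `consistency_index` and the precomputed `similarity` mapping are hypothetical, and the similarity scores stand in for the network-based text similarity used in the paper.

```python
def consistency_index(similarity, references):
    """Assumed definition: share of the references found among the
    k most similar papers in the dataset, with k = len(references).

    similarity: dict mapping candidate paper id -> similarity to the citing paper
    references: set of paper ids actually cited (restricted to the dataset)
    """
    if not references:
        raise ValueError("paper has no references inside the dataset")
    k = len(references)
    # Rank candidates by decreasing similarity; ties broken by id for determinism.
    ranked = sorted(similarity, key=lambda pid: (-similarity[pid], pid))
    top_k = set(ranked[:k])
    return len(top_k & set(references)) / k


# Toy data (invented for illustration, not taken from the paper).
toy_similarity = {"A": 0.9, "B": 0.8, "C": 0.4, "D": 0.1}
print(consistency_index(toy_similarity, {"A", "B"}))  # references are the two most similar papers
print(consistency_index(toy_similarity, {"A", "D"}))  # only one reference is in the top two
```

Under this assumed definition, a value of 0.11–0.23 would mean that roughly one to two in ten cited papers are among the most similar candidates.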
Based on these results, we propose an approach that we believe is fairer for evaluating and complementing citations of a given author, effectively leading to a virtual scientometry.
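The two search strategies named in the abstract, a traditional random walk and a walk whose transition probabilities are proportional to the incoming edges of the neighbours, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the walker is assumed to follow citation links from a citing paper to a cited paper, and to restart at a uniformly chosen paper when it reaches one with no outgoing citations. The function name `visit_frequencies` and the toy graph are hypothetical.

```python
import random
from collections import Counter


def visit_frequencies(citations, steps, biased=False, seed=0):
    """Estimate how often each paper is visited by a random walk.

    citations: dict mapping paper id -> list of cited paper ids
    biased:    False -> uniform choice among cited papers (diffusion)
               True  -> choice proportional to the in-degree of each cited paper
    """
    rng = random.Random(seed)
    # In-degree = number of times each paper is cited within the dataset.
    indeg = Counter(t for targets in citations.values() for t in targets)
    nodes = sorted(set(citations) | set(indeg))
    visits = Counter()
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        neighbours = list(citations.get(current, ()))
        if not neighbours:
            # Assumption: restart uniformly at random from a dead end.
            current = rng.choice(nodes)
        elif biased:
            # Every neighbour is cited at least once, so all weights are positive.
            weights = [indeg[n] for n in neighbours]
            current = rng.choices(neighbours, weights=weights, k=1)[0]
        else:
            current = rng.choice(neighbours)
    return {node: visits[node] / steps for node in nodes}


# Toy citation graph (invented for illustration).
toy_graph = {"p1": ["p2", "p3"], "p2": ["p3"], "p3": []}
uniform = visit_frequencies(toy_graph, steps=10_000)
preferential = visit_frequencies(toy_graph, steps=10_000, biased=True)
```

The resulting visit frequencies are the quantity that the abstract compares against the actual reference lists and the arXiv download counts.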

Notes

  1. http://arxiv.org

  2. http://apps.isiknowledge.com/

  3. http://www.cytoscape.org

  4. http://www.citebase.org

Acknowledgments

The authors are grateful to FAPESP (2010/00927-9) and CNPq (Brazil) for financial support.

Author information

Corresponding author

Correspondence to D. R. Amancio.

About this article

Cite this article

Amancio, D.R., Nunes, M.G.V., Oliveira, O.N. et al. Using complex networks concepts to assess approaches for citations in scientific papers. Scientometrics 91, 827–842 (2012). https://doi.org/10.1007/s11192-012-0630-z

Keywords

  • Complex networks
  • Virtual scientometry
  • Similarity network