The number of citations received by authors in scientific journals has become a major parameter for assessing individual researchers and, through the impact factor, the journals themselves. A fair assessment therefore requires that the criteria for selecting references in a given manuscript be unbiased with regard to the authors or journals cited. In this paper, we assess approaches for citations considering two recommendations for authors to follow while preparing a manuscript: (i) consider similarity of contents with the topics investigated, lest related work should be reproduced or ignored; (ii) perform a systematic search over the network of citations, including seminal or closely related papers. We use formalisms of complex networks for two datasets of papers from the arXiv and the Web of Science repositories to show that neither of these two criteria is fulfilled in practice. By representing the texts as complex networks, we estimated a similarity index between pieces of text and found that the list of references did not contain the most similar papers in the dataset. This was quantified by calculating a consistency index, whose maximum value is one if the references in a given paper are the most similar in the dataset. For the areas of “complex networks” and “graphenes”, the consistency index was only 0.11–0.23 and 0.10–0.25, respectively. To simulate a systematic search in the citation network, we employed a traditional random walk search (i.e. diffusion) and a random walk whose transition probabilities are proportional to the number of ingoing edges of the neighbours. The frequency of visits to the nodes (papers) in the network had a very small correlation with either the actual list of references in the papers or the number of downloads from the arXiv repository. Apparently, therefore, the authors and users of the repository did not follow the criterion of a systematic search over the network of citations.
Based on these results, we propose an approach that we believe is fairer for evaluating and complementing citations of a given author, effectively leading to a virtual scientometry.
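The two measurements described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the formula for the consistency index, so the sketch assumes it is the fraction of a paper's k references that appear among the k papers most similar to it, which reaches one only when the references are exactly the most similar papers. The function names, the toy citation graph, the similarity scores, and the rule of restarting the walk at a paper with no references are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def consistency_index(similarity, references):
    """Assumed definition: share of the k references that are among the
    k most similar papers (k = number of references)."""
    k = len(references)
    if k == 0:
        return 0.0
    # Rank candidate papers from most to least similar and keep the top k.
    top_k = sorted(similarity, key=similarity.get, reverse=True)[:k]
    return len(set(top_k) & set(references)) / k


def walk_visit_frequency(cites, start, steps, by_indegree=False, seed=0):
    """Relative visit frequency of each paper in a random walk that
    follows citation links (paper -> cited paper)."""
    rng = random.Random(seed)
    # Number of ingoing edges (citations received) of every paper.
    indegree = Counter(c for targets in cites.values() for c in targets)
    visits = Counter()
    node = start
    for _ in range(steps):
        visits[node] += 1
        neighbours = cites.get(node, [])
        if not neighbours:
            # Assumption: restart at the starting paper on a dead end.
            node = start
        elif by_indegree:
            # Transition probability proportional to neighbour in-degree.
            weights = [indegree[n] for n in neighbours]
            node = rng.choices(neighbours, weights=weights)[0]
        else:
            # Traditional random walk: neighbours are equally likely.
            node = rng.choice(neighbours)
    return {paper: count / steps for paper, count in visits.items()}


# Hypothetical toy data: paper "A" cites "B" and "C".
cites = {"A": ["B", "C"], "B": ["C", "D"], "C": ["D"], "D": [], "E": ["C"]}
similarity_to_a = {"B": 0.9, "C": 0.2, "D": 0.7, "E": 0.4}

ci = consistency_index(similarity_to_a, cites["A"])
uniform = walk_visit_frequency(cites, "A", steps=10_000)
biased = walk_visit_frequency(cites, "A", steps=10_000, by_indegree=True)
print(ci)  # 0.5: only "B" is both a reference and in the top two
print(uniform)
print(biased)
```

In this toy case the two papers most similar to "A" are "B" and "D", while the references are "B" and "C", so the assumed index is 0.5. The visit frequencies returned by either walk could then be correlated with the actual reference lists or with download counts, which is the kind of comparison the abstract reports.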
The authors are grateful to FAPESP (2010/00927-9) and CNPq (Brazil) for financial support.
Cite this article
Amancio, D.R., Nunes, M.G.V., Oliveira, O.N. et al. Using complex networks concepts to assess approaches for citations in scientific papers. Scientometrics 91, 827–842 (2012). https://doi.org/10.1007/s11192-012-0630-z
- Complex networks
- Virtual scientometry
- Similarity network