Successful fish go with the flow: citation impact prediction based on centrality measures for term–document networks

Klimek, Peter; Jovanovic, Aleksandar S.; Egloff, Rainer; Schneider, Reto

doi:10.1007/s11192-016-1926-1

Published: 12 April 2016

Scientometrics, Volume 107, pages 1265–1282 (2016)

Peter Klimek^1,2,
Aleksandar S. Jovanovic^2,3,
Rainer Egloff^4 &
Reto Schneider^4

Abstract

In this work we address the challenge of how to identify those documents from a given set of texts that are most likely to have substantial impact in the future. To this end we develop a purely content-based methodology in order to rank a given set of documents, for example abstracts of scientific publications, according to their potential to generate impact as measured by the numbers of citations that the articles will receive in the future. We construct a bipartite network consisting of documents that are linked to keywords and terms that they contain. We study recursive centrality measures for such networks that quantify how many different terms a document contains and how these terms are related to each other. From this we derive a novel indicator—document centrality—that is shown to be highly predictive of citation impact in six different case studies. We compare these results to findings from a multivariable regression model and from conventional network-based centrality measures to show that document centrality indeed offers a comparably high performance in identifying those articles that contain a large number of high-impact keywords. Our findings suggest that articles which conform to the mainstream within a given research field tend to receive higher numbers of citations than highly original and innovative articles.
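The abstract does not state the formula for document centrality, so the following is only a minimal sketch of what a recursive centrality on a term–document network could look like. It assumes an averaging recursion in the spirit of the method of reflections (Hidalgo and Hausmann, 2009): at order 0 a document's score is the number of distinct terms it contains and a term's score is the number of documents containing it; each further order replaces a node's score with the mean score of its neighbours. The function names, the whitespace tokenizer, and the toy documents are all hypothetical.

```python
from collections import defaultdict


def build_bipartite(docs):
    """Link each document to the distinct terms it contains, and vice versa.

    Assumption: terms are lower-cased whitespace tokens; a real pipeline
    would add stemming and keyword extraction.
    """
    doc_terms = {d: set(text.lower().split()) for d, text in docs.items()}
    term_docs = defaultdict(set)
    for d, terms in doc_terms.items():
        for t in terms:
            term_docs[t].add(d)
    return doc_terms, dict(term_docs)


def reflections(doc_terms, term_docs, n_iter=1):
    """Iterate neighbour-averaged scores on the bipartite network.

    Illustrative recursion only, not the indicator defined in the paper.
    """
    # Order 0: plain degrees (terms per document, documents per term).
    k_doc = {d: float(len(ts)) for d, ts in doc_terms.items()}
    k_term = {t: float(len(ds)) for t, ds in term_docs.items()}
    for _ in range(n_iter):
        # Both updates use the scores of the previous order.
        new_doc = {
            d: (sum(k_term[t] for t in ts) / len(ts)) if ts else 0.0
            for d, ts in doc_terms.items()
        }
        new_term = {
            t: sum(k_doc[d] for d in ds) / len(ds)
            for t, ds in term_docs.items()
        }
        k_doc, k_term = new_doc, new_term
    return k_doc, k_term


# Toy corpus (hypothetical): A, B and C share terms, D uses a term nobody else uses.
docs = {
    "A": "network centrality citation",
    "B": "network centrality",
    "C": "citation",
    "D": "zebrafish",
}
doc_terms, term_docs = build_bipartite(docs)
k_doc, k_term = reflections(doc_terms, term_docs, n_iter=1)
ranking = sorted(k_doc, key=k_doc.get, reverse=True)
```

In this toy corpus the three documents that share vocabulary score 2.0 after one iteration, while the document built on a term used nowhere else scores 1.0 and ranks last. That mirrors, in miniature, the abstract's observation that documents close to the mainstream of a field rank higher than those with unusual vocabulary.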

Notes

http://wikibon.org/blog/big-data-statistics/, retrieved 07/29/2015.
http://apps.webofknowledge.com/.

Acknowledgments

PK acknowledges financial support from the European Commission, EU FP7 Project MULTIPLEX, No. 317532. We thank the anonymous referees for providing extremely helpful comments and suggestions.

Author information

Authors and Affiliations

1. Section for Science of Complex Systems, Medical University of Vienna, Spitalgasse 23, 1090, Vienna, Austria
Peter Klimek
2. Steinbeis Advanced Risk Technologies, Willi-Bleicher-Straße 19, 70174, Stuttgart, Germany
Peter Klimek & Aleksandar S. Jovanovic
3. EU-VRi, Willi-Bleicher-Straße 19, 70174, Stuttgart, Germany
Aleksandar S. Jovanovic
4. Swiss Reinsurance Company Ltd, Mythenquai 50/60, 8022, Zurich, Switzerland
Rainer Egloff & Reto Schneider

Corresponding author

Correspondence to Aleksandar S. Jovanovic.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (TXT 2 kb)

About this article

Cite this article

Klimek, P., Jovanovic, A. S., Egloff, R. et al. Successful fish go with the flow: citation impact prediction based on centrality measures for term–document networks. Scientometrics 107, 1265–1282 (2016). https://doi.org/10.1007/s11192-016-1926-1

Received: 01 October 2015
Published: 12 April 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s11192-016-1926-1
