Using content-based and bibliometric features for machine learning models to predict citation counts in the biomedical literature

Fu, Lawrence D.; Aliferis, Constantin F.

doi:10.1007/s11192-010-0160-5

Published: 03 February 2010

Volume 85, pages 257–270, (2010)

Scientometrics

Abstract

The most popular method for judging the impact of biomedical articles is citation count, the number of citations an article receives. The most significant limitation of citation count is that it cannot evaluate articles at the time of publication, because citations accumulate over time. This work presents computer models that accurately predict the citation counts of biomedical publications over a long horizon of 10 years, using only predictive information available at publication time. Our experiments show that it is indeed feasible to accurately predict future citation counts from a mixture of content-based and bibliometric features using machine learning methods. The models pave the way for practical prediction of the long-term impact of publications, and their statistical analysis provides greater insight into citation behavior.
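To make the abstract's core idea concrete, here is a minimal Python sketch, not the authors' implementation: it merges bag-of-words content features with a couple of bibliometric features that are known at publication time, then trains a logistic-regression classifier to flag articles whose later citation count reaches a threshold. The field names (`num_authors`, `num_references`), the threshold of 50, the binary framing, the choice of logistic regression, and the toy records are all illustrative assumptions that do not come from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical toy records: field names and values are invented for illustration.
ARTICLES = [
    {"text": "Randomized controlled trial of statin therapy", "num_authors": 12, "num_references": 40},
    {"text": "Multicenter randomized trial of antihypertensive therapy", "num_authors": 15, "num_references": 35},
    {"text": "Case report of a rare rash", "num_authors": 2, "num_references": 5},
    {"text": "Letter on a rare case", "num_authors": 1, "num_references": 3},
]
# Made-up citation counts observed after the horizon; used only to build labels.
FUTURE_CITATIONS = [120, 95, 2, 0]
THRESHOLD = 50  # assumed cut-off for a "highly cited" article
LABELS = [1 if c >= THRESHOLD else 0 for c in FUTURE_CITATIONS]


def content_features(text):
    """Content-based features: lowercase word counts from title/abstract text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {"w:" + tok: float(n) for tok, n in Counter(tokens).items()}


def bibliometric_features(article):
    """Publication-time bibliometric features (hypothetical fields), log-scaled."""
    return {
        "b:num_authors": math.log1p(article["num_authors"]),
        "b:num_references": math.log1p(article["num_references"]),
    }


def featurize(article):
    """Merge both feature families into one sparse vector plus a bias term."""
    feats = content_features(article["text"])
    feats.update(bibliometric_features(article))
    feats["bias"] = 1.0
    return feats


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, feats):
    """Estimated probability that an article reaches THRESHOLD citations."""
    return sigmoid(sum(weights.get(k, 0.0) * v for k, v in feats.items()))


def train_logistic(examples, labels, epochs=200, lr=0.1):
    """Per-example gradient steps on the logistic log-likelihood over sparse dict features."""
    weights = {}
    for _ in range(epochs):
        for feats, y in zip(examples, labels):
            err = y - predict_proba(weights, feats)
            for k, v in feats.items():
                weights[k] = weights.get(k, 0.0) + lr * err * v
    return weights


if __name__ == "__main__":
    weights = train_logistic([featurize(a) for a in ARTICLES], LABELS)
    # Score a new (hypothetical) article using only publication-time information.
    new_article = {"text": "Randomized trial of statin therapy in older adults",
                   "num_authors": 10, "num_references": 30}
    print(round(predict_proba(weights, featurize(new_article)), 3))
```

The `log1p` scaling keeps the bibliometric counts on roughly the same scale as the word counts, so a single learning rate works for both feature families. A real study would substitute a vetted library classifier and evaluate it on held-out articles rather than on the training records.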

Similar content being viewed by others

Literature reviews as independent studies: guidelines for academic practice

Article Open access 14 October 2022

A Systematic Review on Supervised and Unsupervised Machine Learning Algorithms for Data Science

Predicting academic success in higher education: literature review and best practices

Article Open access 10 February 2020

Author information

Authors and Affiliations

Center for Health Informatics and Bioinformatics, New York University Medical Center, 333 E. 38th St, 6th Floor, New York, NY, 10016, USA
Lawrence D. Fu & Constantin F. Aliferis

Corresponding author

Correspondence to Lawrence D. Fu.

About this article

Cite this article

Fu, L.D., Aliferis, C.F. Using content-based and bibliometric features for machine learning models to predict citation counts in the biomedical literature. Scientometrics 85, 257–270 (2010). https://doi.org/10.1007/s11192-010-0160-5

Received: 20 November 2009
Published: 03 February 2010
Issue Date: October 2010
DOI: https://doi.org/10.1007/s11192-010-0160-5
