Exploring linguistic characteristics of highly browsed and downloaded academic articles

Scientometrics

Abstract

Views and downloads of academic articles have become important supplementary indicators of scholarly impact. Linguistic characteristics are often assumed to influence article views and downloads to some extent. To examine this relationship, this study selected 63,002 full-text articles published from 2014 to 2015 in seven PLoS (Public Library of Science) journals (PLoS Biology, PLoS Computational Biology, PLoS Genetics, PLoS Medicine, PLoS Neglected Tropical Diseases, PLoS One and PLoS Pathogens) and introduced seven indicators (title length, abstract length, full-text length, sentence length, lexical diversity, lexical density and lexical sophistication) to measure the linguistic characteristics of the articles. Articles were grouped into three categories: Top 20% viewed and downloaded (a proxy for highly browsed and downloaded articles), total, and Bottom 20% viewed and downloaded. The results suggest that, in general, most linguistic characteristics played little role in article views and downloads in our data sets, but that some characteristics (e.g. title length and average sentence length) played a certain role in specific PLoS journals and on specific platforms (the PLoS platform or the PubMed Central platform). Journal and platform differences in the linguistic characteristics of highly viewed and downloaded articles were also observed.
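As a rough illustration of the kind of indicators named in the abstract, the following Python sketch computes length measures, a type-token ratio for lexical diversity, and a function-word-based proxy for lexical density, then splits articles into top and bottom 20% groups by a usage count. These definitions are assumptions made for illustration; the abstract does not state the exact formulas the authors used. The function names, the small `FUNCTION_WORDS` list and the `views` field are hypothetical. Lexical sophistication is omitted because it would require a reference word-frequency list.

```python
import re

# Tiny illustrative function-word list (an assumption); a real analysis would
# more likely use part-of-speech tags to identify content words.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "in", "on", "to", "and", "or", "is", "are",
    "was", "were", "for", "with", "by", "that", "this", "it", "as", "at", "be",
}


def tokenize(text):
    """Return lowercase word tokens made of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def linguistic_indicators(title, abstract, body):
    """Compute simple length and lexical indicators for one article."""
    words = tokenize(body)
    sentences = split_sentences(body)
    n = len(words)
    content_words = sum(w not in FUNCTION_WORDS for w in words)
    return {
        "title_length": len(tokenize(title)),
        "abstract_length": len(tokenize(abstract)),
        "full_text_length": n,
        "avg_sentence_length": n / len(sentences) if sentences else 0.0,
        # Type-token ratio: distinct words divided by total words.
        "lexical_diversity": len(set(words)) / n if n else 0.0,
        # Share of tokens that are not in the function-word list.
        "lexical_density": content_words / n if n else 0.0,
    }


def split_by_usage(articles, key="views", share=0.2):
    """Return (top, bottom) groups, each holding `share` of the articles."""
    ranked = sorted(articles, key=lambda a: a[key], reverse=True)
    k = max(1, int(len(ranked) * share))
    return ranked[:k], ranked[-k:]
```

Indicator values for the top and bottom groups could then be compared per journal and per platform, which is the kind of comparison the abstract describes.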

Notes

  1. https://www.plos.org/article-level-metrics.

  2. http://www.lagotto.io/plos/.

  3. http://images.webofknowledge.com/WOKRS519B3/help/WOK/hp_usage_score.html.

Acknowledgements

This paper is supported by the Youth Program of the National Social Science Fund of China (15CTQ035), the Social Public Safety S&T Collaborative Innovation Center of Universities in Jiangsu Province, and the China Scholarship Council (ID: 201906845042).

Author information

Corresponding author

Correspondence to Bikun Chen.

Appendix

See Tables 6, 7, 8, 9, 10 and 11.

About this article

Cite this article

Chen, B., Deng, D., Zhong, Z. et al. Exploring linguistic characteristics of highly browsed and downloaded academic articles. Scientometrics 122, 1769–1790 (2020). https://doi.org/10.1007/s11192-020-03361-4

  • DOI: https://doi.org/10.1007/s11192-020-03361-4
