
Distributional characteristics of Dimensions concepts: An Empirical Analysis using Zipf’s law

Scientometrics

Abstract

The massive growth in scholarly output over the last few decades has led to the creation of several scholarly databases that index it. These databases index publication records and provide various metadata fields for uses ranging from retrieval and research evaluation to different kinds of scientometric analysis. The ‘author keywords’ field is one such important metadata field: many databases provide it, and it is used for various text-based and thematic-structure analyses. The Dimensions database, however, does not provide an ‘author keywords’ field; instead, it provides terms automatically generated from article full texts, called ‘concepts’. It is therefore unclear whether text-based analyses can be carried out with data from Dimensions. This article explores the distributional characteristics of Dimensions concepts. Concept data obtained from Dimensions for a sufficiently large sample of scholarly articles are analysed through rank-frequency distribution plots in log–log space, and the existence of a Zipfian distribution is examined. The results indicate that Dimensions concepts adhere to Zipfian properties, which in turn indicates that they have distributional characteristics similar to those of author keywords and hence may have the same expressive power as author or index keywords for scientometric exercises. The study is novel in being the first to explore the distributional characteristics of Dimensions concepts, particularly with respect to Zipfian properties, which provide a statistical foundation for understanding, modelling, and analysing these concepts.
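The rank-frequency analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each article's Dimensions concepts are already available as a list of strings (that input format and the function names `rank_frequency` and `zipf_exponent` are hypothetical). It counts concept occurrences across articles, ranks concepts by frequency, and estimates the exponent α in f(r) ∝ r^(−α) as the negated least-squares slope of log f against log r, since a Zipfian distribution appears as a straight line in the log–log rank-frequency plot.

```python
import math
from collections import Counter


def rank_frequency(concept_lists):
    """Return (rank, frequency) pairs for all concepts, rank 1 = most frequent.

    `concept_lists` is a hypothetical input format: one list of concept
    strings per article.
    """
    counts = Counter(concept for concepts in concept_lists for concept in concepts)
    frequencies = sorted(counts.values(), reverse=True)
    return list(enumerate(frequencies, start=1))


def zipf_exponent(pairs):
    """Estimate alpha in f(r) ~ C * r**(-alpha) by an ordinary least-squares
    fit of log(frequency) on log(rank); returns the negated slope."""
    if len(pairs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Hypothetical toy data standing in for per-article Dimensions concepts.
    articles = [
        ["citation analysis", "zipf's law", "power law"],
        ["zipf's law", "keywords"],
        ["zipf's law", "power law", "bibliometrics"],
    ]
    pairs = rank_frequency(articles)
    print(pairs)
    print(f"estimated Zipf exponent: {zipf_exponent(pairs):.3f}")
```

The least-squares slope on log-transformed data is only a quick diagnostic: it is known to give biased exponent estimates, and maximum-likelihood fitting combined with a goodness-of-fit test is the more rigorous way to confirm that a distribution follows a power law.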


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5



Funding

This work is partly supported by extramural research Grant No.: MTR/2020/000625 from Science and Engineering Research Board (SERB), India, and by HPE Aruba Centre for Research in Information Systems at BHU (Grant No.: M-22-69 of BHU), to the second author.

Author information

Corresponding author

Correspondence to Vivek Kumar Singh.

Ethics declarations

Conflicts of interest

The authors declare that the manuscript complies with the ethical standards of the journal and that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gupta, S., Singh, V.K. Distributional characteristics of Dimensions concepts: An Empirical Analysis using Zipf’s law. Scientometrics 129, 1037–1053 (2024). https://doi.org/10.1007/s11192-023-04899-9

DOI: https://doi.org/10.1007/s11192-023-04899-9
