Volume 118, Issue 2, pp 653–671

Large enough sample size to rank two groups of data reliably according to their means

  • Zhesi Shen
  • Liying Yang
  • Zengru Di
  • Jinshan Wu


We often need to compare two sets of data, say X and Y, and we often do so by comparing their means \(\mu _{X}\) and \(\mu _{Y}\). However, when the two sets overlap heavily (for example, when \(\sqrt{\sigma ^{2}_{X}+\sigma ^{2}_{Y}}\gg \left| \mu _{X}-\mu _{Y}\right|\)), ranking them according to their means might not be reliable. Our starting point is the following observation. In a one-by-one comparison, we take one sample from each set at a time and compare the two samples. In a \(K_{X}\)-by-\(K_{Y}\) comparison, we instead take \(K_{X}\) samples \(\left\{ x_{1}, x_{2}, \ldots , x_{K_{X}}\right\}\) from one set and \(K_{Y}\) samples \(\left\{ y_{1}, y_{2},\ldots , y_{K_{Y}}\right\}\) from the other set at a time and compare the averages \(\frac{\sum _{j=1}^{K_{X}}x_{j}}{K_{X}}\) and \(\frac{\sum _{j=1}^{K_{Y}}y_{j}}{K_{Y}}\). Replacing the former comparison with the latter reduces the overlap and thus improves the reliability. Based on this observation, we propose a definition of the minimum representative size \(\kappa\) of each set for comparing sets by requiring, roughly speaking, \(\sqrt{\sigma ^{2}_{K_X}+\sigma ^{2}_{K_Y}}\ll \left| \mu _{X}-\mu _{Y}\right|\). Applied to journal comparison, this minimum representative size \(\kappa\) might be used as a complementary index to the journal impact factor (JIF), indicating how reliable it is to compare two journals using their JIFs. More generally, the idea of a minimum representative size can be used whenever two sets of data with overlapping distributions are compared.
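The \(K_{X}\)-by-\(K_{Y}\) comparison can be illustrated with a small bootstrap experiment. The sketch below is only an illustration under stated assumptions, not the paper's exact definition of \(\kappa\): the function names `win_probability` and `smallest_reliable_k`, the 0.9 reliability threshold, the use of the same K for both sets, and the linear search over K are all hypothetical choices made here for demonstration.

```python
import numpy as np


def win_probability(x, y, k_x, k_y, n_trials=2000, rng=None):
    """Bootstrap estimate of P(mean of k_x draws from x > mean of k_y draws from y)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Each row is one K-by-K comparison: draw with replacement, then average.
    mean_x = rng.choice(x, size=(n_trials, k_x), replace=True).mean(axis=1)
    mean_y = rng.choice(y, size=(n_trials, k_y), replace=True).mean(axis=1)
    # Count ties as half a win so identical sets give about 0.5.
    return float(np.mean(mean_x > mean_y) + 0.5 * np.mean(mean_x == mean_y))


def smallest_reliable_k(x, y, threshold=0.9, k_max=512, n_trials=2000, seed=0):
    """Smallest K (assumed equal for both sets) whose win probability reaches
    the threshold. Returns None if no K up to k_max qualifies.

    The threshold and the equal-K restriction are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Orient the comparison so that x is the set with the larger mean.
    if x.mean() < y.mean():
        x, y = y, x
    rng = np.random.default_rng(seed)
    for k in range(1, k_max + 1):
        if win_probability(x, y, k, k, n_trials, rng) >= threshold:
            return k
    return None


if __name__ == "__main__":
    data_rng = np.random.default_rng(42)
    # Two heavily overlapping synthetic sets (made-up data, not journal data).
    set_x = data_rng.normal(0.5, 1.0, 2000)
    set_y = data_rng.normal(0.0, 1.0, 2000)
    print("one-by-one win probability:", win_probability(set_x, set_y, 1, 1, rng=1))
    print("smallest K reaching 0.9:", smallest_reliable_k(set_x, set_y))
```

For heavily overlapping sets the one-by-one win probability stays close to 0.5, while averaging over K samples shrinks the spread of each average and pushes the probability toward 1, which is the effect the abstract describes.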


Keywords: Journal impact factor, Minimum representative size, Bootstrap sampling



We thank Jianlin Zhou, Per Ahlgren, Ludo Waltman and Lawrence Smolinsky for valuable discussions since this paper’s early version (Shen et al. 2017). We would also like to thank the anonymous referees for their suggestions and criticisms, which have greatly improved the paper’s presentation. This work was supported by the NSFC under Grant No. 61374175, the China Postdoctoral Science Foundation under Grant No. 2017M620944, and the Fundamental Research Funds for the Central Universities.


  1. Anonymous. (2011). Dissecting our impact factor. Nature Materials, 10, 645.
  2. Bar-Ilan, J. (2008). Informetrics at the beginning of the 21st century–a review. Journal of Informetrics, 2, 1–52.
  3. Bornmann, L., Leydesdorff, L., & Mutz, R. (2013). The use of percentiles and percentile rank classes in the analysis of bibliometric data: Opportunities and limits. Journal of Informetrics, 7, 158–165.
  4. Bornmann, L., Marx, W., Gasparyan, A. Y., & Kitas, G. D. (2012). Diversity, value and limitations of the journal impact factor and alternative metrics. Rheumatology International, 32, 1861–1867.
  5. Bornmann, L., & Mutz, R. (2011). Further steps towards an ideal method of measuring citation performance: The avoidance of citation (ratio) averages in field-normalization. Journal of Informetrics, 5, 228–230.
  6. Bornmann, L., Stefaner, M., de Moya Anegón, F., & Mutz, R. (2014). Ranking and mapping of universities and research-focused institutions worldwide based on highly-cited papers. Online Information Review, 38, 43–58.
  7. Callaway, E. (2016). Beat it, impact factor! Publishing elite turns against controversial metric. Nature, 535, 210–211.
  8. Church, J. D., & Harris, B. (1970). The estimation of reliability from stress-strength relationships. Technometrics, 12, 49–54.
  9. DORA (2013). San Francisco declaration on research assessment. Accessed 20 December 2016.
  10. Downton, F. (1973). The estimation of Pr\((Y < X)\) in the normal case. Technometrics, 15, 551–558.
  11. Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Monographs on statistics and applied probability (Vol. 57). Boca Raton: Chapman & Hall/CRC.
  12. Garfield, E. (1999). Journal impact factor: A brief review. Canadian Medical Association Journal, 161, 979–980.
  13. Glänzel, W. (2010). On reliability and robustness of scientometrics indicators based on stochastic models. An evidence-based opinion paper. Journal of Informetrics, 4, 313–319.
  14. Glänzel, W., & Moed, H. F. (2002). Journal impact measures in bibliometric research. Scientometrics, 53, 171–193.
  15. Glänzel, W., & Moed, H. F. (2013). Opinion paper: Thoughts and facts on bibliometric indicators. Scientometrics, 96, 381–394.
  16. Herrnstein, R. J., Loveland, D. H., & Cable, C. (1976). Natural concepts in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 2, 285–302.
  17. Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). The Leiden manifesto for research metrics. Nature, 520, 429–431.
  18. Kurmis, A. P. (2003). Understanding the limitations of the journal impact factor. Journal of Bone and Joint Surgery American, 85–A, 2449–2454.
  19. Larivière, V., Kiermer, V., MacCallum, C. J., McNutt, M., Patterson, M., Pulverer, B., Swaminathan, S., Taylor, S., & Curry, S. (2016). A simple proposal for the publication of journal citation distributions. bioRxiv.
  20. Leydesdorff, L., & Bornmann, L. (2011a). How fractional counting of citations affects the impact factor: Normalization in terms of differences in citation potentials among fields of science. Journal of the Association for Information Science and Technology, 62, 217–229.
  21. Leydesdorff, L., & Bornmann, L. (2011b). Integrated impact indicators compared with impact factors: An alternative research design with policy implications. Journal of the Association for Information Science and Technology, 62, 2133–2146.
  22. Leydesdorff, L., & Opthof, T. (2010). Normalization at the field level: Fractional counting of citations. Journal of Informetrics, 4, 644–646.
  23. Mann, H., & Whitney, D. (1947). On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18, 50–60.
  24. Milojević, S., Radicchi, F., & Bar-Ilan, J. (2017). Citation success index - An intuitive pair-wise journal comparison metric. Journal of Informetrics, 11, 223–231.
  25. Mingers, J., & Leydesdorff, L. (2015). A review of theory and practice in scientometrics. European Journal of Operational Research, 246, 1–19.
  26. Mingers, J., & Yang, L. (2017). Evaluating journal quality: A review of journal citation indicators and ranking in business and management. European Journal of Operational Research, 257, 323–337.
  27. Mutz, R., & Daniel, H. D. (2012). Skewed citation distributions and bias factors: Solutions to two core problems with the journal impact factor. Journal of Informetrics, 6, 169–176.
  28. NSB (2016). National Science Board science and engineering indicators 2016. Accessed 18 June 2017.
  29. Radicchi, F., Fortunato, S., & Castellano, C. (2008). Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences, 105, 17268–17272.
  30. Reiser, B., & Guttman, I. (1986). Statistical inference for Pr\((Y < X)\): The normal case. Technometrics, 28, 253–257.
  31. Seglen, P. O. (1992). The skewness of science. Journal of the Association for Information Science and Technology, 43, 628–638.
  32. Seglen, P. O. (1997). Why the impact factor of journals should not be used for evaluating research. BMJ Clinical Research, 314, 498–502.
  33. Shen, Z., Yang, L., Di, Z., & Wu, J. (2017). How large is large enough? In Proceedings of ISSI 2017 (pp. 288–299).
  34. Stringer, M. J., Sales-Pardo, M., & Amaral, L. A. N. (2008). Effectiveness of journal ranking schemes as a tool for locating information. PLoS ONE, 3, e1683.
  35. Waltman, L. (2016). A review of the literature on citation impact indicators. Journal of Informetrics, 10, 365–391.
  36. Waltman, L., Calero-Medina, C., Kosten, J., Noyons, E. C., Tijssen, R. J., van Eck, N. J., et al. (2012). The Leiden ranking 2011/2012: Data collection, indicators, and interpretation. Journal of the Association for Information Science & Technology, 63, 2419–2432.
  37. Wasserman, L. (2004). All of statistics. New York: Springer.
  38. Welch, B. L. (1947). The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika, 34, 28–35.
  39. Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1, 80–83.
  40. Zhou, W. (2008). Statistical inference for \(P(X < Y)\). Statistics in Medicine, 27, 257–279.

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2019

Authors and Affiliations

  1. National Science Library, Chinese Academy of Sciences, Beijing, People’s Republic of China
  2. School of Systems Science, Beijing Normal University, Beijing, People’s Republic of China
