Abstract
In this chapter we address the statistical analysis of percentiles: how should the citation impact of institutions be compared? In educational and psychological testing, percentiles are already widely used as a standard for evaluating an individual's test scores (intelligence tests, for example) by comparing them with the scores of a calibrated sample. Percentiles, or percentile rank classes, are also a very suitable method in bibliometrics for normalizing the citations of publications in terms of subject category and publication year and, unlike mean-based indicators (relative citation rates), percentiles are scarcely affected by skewed citation distributions. The percentile of a given publication provides information about the citation impact it has achieved in comparison with similar publications in the same subject category and publication year. Analyses of percentiles, however, have not always been presented in the most effective and meaningful way. The current APA guidelines (American Psychological Association, 2010) suggest a lesser emphasis on significance tests and a greater emphasis on the substantive and practical significance of findings. Drawing on work by Cumming (2012), we show how examinations of effect sizes (e.g., Cohen's d) and confidence intervals can lead to a clear understanding of citation impact differences.
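The normalization described above can be sketched in a few lines. The snippet below is an illustrative Python sketch (the chapter's own analyses use Stata, and the tuple layout is an assumption for this example): each publication receives a percentile rank computed within its (subject category, publication year) reference set. Note that several conventions for handling ties exist (Hyndman & Fan, 1996; Cox, 2005), and that the chapter's `perc` variable is oriented so that lower values mean higher impact.

```python
from collections import defaultdict

def percentile_ranks(pubs):
    """pubs: list of (subject, year, citations) tuples.

    Returns a parallel list of percentile ranks computed within each
    (subject, year) reference set as
        100 * (# publications in the set with fewer citations) / N,
    so here a HIGHER rank means more citations. Other tie-handling
    conventions exist (Hyndman & Fan, 1996), and the chapter's perc
    variable runs the other way (top10 = perc <= 10).
    """
    groups = defaultdict(list)
    for subj, year, cites in pubs:
        groups[(subj, year)].append(cites)
    ranks = []
    for subj, year, cites in pubs:
        ref = groups[(subj, year)]
        below = sum(1 for c in ref if c < cites)
        ranks.append(100.0 * below / len(ref))
    return ranks
```

For example, in a reference set of ten publications with distinct citation counts, the most-cited publication receives rank 90 and the least-cited rank 0 under this convention.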
Notes
- 1.
Cumming (2012) refers to the CI obtained from an analysis as “One from the dance.” What he means is that it is NOT correct to say that there is a 95 % chance that the true value of the mean lies within the confidence interval. Either the true value falls within the interval or it doesn’t. It is correct to say that, if this process were repeated an infinite number of times, then 95 % of the time the CI would include the true value of the mean while 5 % of the time it would not. Whether it does in the specific data we are analyzing, we don’t know.
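This long-run reading of the confidence interval is easy to verify by simulation. The sketch below (Python, illustrative only; not part of the chapter's Stata appendix, with population parameters chosen arbitrarily) draws many samples from a known population and counts how often the nominal 95 % t-based interval covers the true mean: the long-run coverage is close to 0.95, yet any single interval either contains the true mean or it does not.

```python
import math
import random

def ci_coverage(true_mean=50.0, sd=20.0, n=30, reps=2000, seed=1):
    """Fraction of replications in which a nominal 95% t-interval
    for the mean covers the true population mean. The critical
    value 2.045 is the 97.5th percentile of t with n-1 = 29 df."""
    rng = random.Random(seed)
    tstar = 2.045
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = sum(x) / n
        s = math.sqrt(sum((xi - m) ** 2 for xi in x) / (n - 1))
        half = tstar * s / math.sqrt(n)
        hits += (m - half <= true_mean <= m + half)
    return hits / reps
```

Each replication is one interval "from the dance"; only the ensemble of replications exhibits the 95 % property.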
- 2.
Cumming (2012) notes various cautions about using Cohen’s d (p. 283). For example, while it is common to use sample standard deviations as we do here, other “standardizers” are possible, e.g., you might use the standard deviation for a reference population, such as elite institutions. Researchers should be clear exactly how Cohen’s d was computed.
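As a concrete illustration of the standardizer issue, here is the pooled-standard-deviation version of Cohen's d used for the chapter's two-group comparisons (a Python sketch; the chapter itself obtains d from Stata's esize command).

```python
import math

def cohens_d(x, y):
    """Cohen's d for two independent samples, standardized by the
    pooled sample standard deviation. Replacing sp with the standard
    deviation of some reference population (e.g., elite institutions)
    would give a different 'Cohen's d' -- hence the advice to report
    exactly which standardizer was used."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((xi - m1) ** 2 for xi in x) / (n1 - 1)
    v2 = sum((yi - m2) ** 2 for yi in y) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / sp
```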
- 3.
With independent samples, two different types of t-test can be conducted. The first type, used here, assumes that the variances of the two groups are equal. The second (Welch's t-test, the unequal option of Stata's ttest command) allows the variances of the two groups to differ. In our examples it makes little difference which approach is used because, as Table 12.2 shows, the standard deviations of the three groups are similar. In cases where the variances clearly differ, the second approach should be used. Most, perhaps all, statistical software packages can compute either type of t-test easily.
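The two statistics differ only in the denominator, as this Python sketch shows (illustrative only; the second, unequal-variance form is usually attributed to Welch). When the sample variances and sizes are similar, as in the chapter's examples, the two statistics are close; with equal group sizes and equal sample variances they coincide exactly.

```python
import math

def t_statistics(x, y):
    """Return (t_pooled, t_welch) for two independent samples.
    t_pooled assumes equal population variances (pooled variance);
    t_welch does not, using v1/n1 + v2/n2 in the denominator."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((xi - m1) ** 2 for xi in x) / (n1 - 1)
    v2 = sum((yi - m2) ** 2 for yi in y) / (n2 - 1)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t_welch = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    return t_pooled, t_welch
```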
- 4.
Nonetheless, as we found for other measures in our analysis, Cohen’s d seems robust to violations of its assumptions. When we estimated Cohen’s d using binary dependent variables, we got almost exactly the same numbers as we did for Cohen’s h.
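Cohen's h, used for the proportion comparisons in Tables 12.4 and 12.5, is the difference between arcsine-square-root-transformed proportions; the Stata appendix computes it inline, and the Python sketch below restates the same formula.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h effect size for two proportions: the difference of
    the arcsine-square-root ('phi') transforms. For example, an
    institution's observed top-10% share p1 can be compared with the
    expected baseline of 0.10."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))
```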
References
Acock, A. (2010). A gentle introduction to Stata (3rd ed.). College Station, TX: Stata Press.
American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: American Psychological Association (APA).
Bornmann, L., & Leydesdorff, L. (2013). Statistical tests and research assessments: A comment on Schneider (2012). Journal of the American Society for Information Science and Technology, 64(6), 1306–1308. doi:10.1002/asi.22860.
Bornmann, L., Leydesdorff, L., & Mutz, R. (2013). The use of percentiles and percentile rank classes in the analysis of bibliometric data: opportunities and limits. Journal of Informetrics, 7(1), 158–165.
Bornmann, L., & Mutz, R. (2013). The advantage of the use of samples in evaluative bibliometric studies. Journal of Informetrics, 7(1), 89–90. doi:10.1016/j.joi.2012.08.002.
Bornmann, L., & Williams, R. (2013). How to calculate the practical significance of citation impact differences? An empirical example from evaluative institutional bibliometrics using adjusted predictions and marginal effects. Journal of Informetrics, 7(2), 562–574. doi:10.1016/j.joi.2013.02.005.
Bornmann, L., de Moya Anegon, F., & Leydesdorff, L. (2012). The new Excellence Indicator in the World Report of the SCImago Institutions Rankings 2011. Journal of Informetrics, 6(2), 333–335. doi:10.1016/j.joi.2011.11.006.
Cameron, A. C., & Trivedi, P. K. (2010). Microeconometrics using Stata (Revised ed.). College Station, TX: Stata Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
Cox, N. J. (2005). Calculating percentile ranks or plotting positions. Retrieved May 30, from http://www.stata.com/support/faqs/stat/pcrank.html
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. London: Routledge.
Glänzel, W., Thijs, B., Schubert, A., & Debackere, K. (2009). Subfield-specific normalized relative indicators and a new generation of relational charts: methodological foundations illustrated on the assessment of institutional research performance. Scientometrics, 78(1), 165–188.
Huber, C. (2013). Measures of effect size in Stata 13. The Stata Blog. Retrieved December 6, 2013, from http://blog.stata.com/2013/09/05/measures-of-effect-size-in-stata-13.
Hyndman, R. J., & Fan, Y. N. (1996). Sample quantiles in statistical packages. American Statistician, 50(4), 361–365.
International Committee of Medical Journal Editors. (2010). Uniform requirements for manuscripts submitted to biomedical journals: Writing and editing for biomedical publication. Journal of Pharmacology and Pharmacotherapeutics, 1(1), 42–58. Retrieved April 10, 2014 from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3142758/.
Leydesdorff, L. (2012). Accounting for the uncertainty in the evaluation of percentile ranks. Journal of the American Society for Information Science and Technology, 63(11), 2349–2350.
Leydesdorff, L., & Bornmann, L. (2011). Integrated impact indicators (I3) compared with impact factors (IFs): An alternative research design with policy implications. Journal of the American Society for Information Science and Technology, 62(11), 2133–2146.
Leydesdorff, L., & Bornmann, L. (2012). Percentile ranks and the integrated impact indicator (I3). Journal of the American Society for Information Science and Technology, 63(9), 1901–1902. doi:10.1002/asi.22641.
Long, J. S., & Freese, J. (2006). Regression models for categorical dependent variables using Stata (2nd ed.). College Station, TX: Stata Press.
Lundberg, J. (2007). Lifting the crown - citation z-score. Journal of Informetrics, 1(2), 145–154.
Moed, H. F., De Bruin, R. E., & Van Leeuwen, T. N. (1995). New bibliometric tools for the assessment of national research performance - database description, overview of indicators and first applications. Scientometrics, 33(3), 381–422.
Opthof, T., & Leydesdorff, L. (2010). Caveats for the journal and field normalizations in the CWTS (“Leiden”) evaluations of research performance. Journal of Informetrics, 4(3), 423–430.
Pudovkin, A. I., & Garfield, E. (2009). Percentile rank and author superiority indexes for evaluating individual journal articles and the author’s overall citation performance. Paper presented at the Fifth International Conference on Webometrics, Informetrics & Scientometrics (WIS).
Schneider, J. W. (2012). Testing university rankings statistically: Why this is not such a good idea after all. Some reflections on statistical power, effect sizes, random sampling and imaginary populations. In E. Archambault, Y. Gingras, & V. Lariviere (Eds.), The 17th International Conference on Science and Technology Indicators (pp. 719–732). Montreal, Canada: Repro-UQAM.
Schreiber, M. (2012). Inconsistencies of recently proposed citation impact indicators and how to avoid them. Journal of the American Society for Information Science and Technology, 63(10), 2062–2073. doi:10.1002/asi.22703.
Schreiber, M. (2013). Uncertainties and ambiguities in percentiles and how to avoid them. Journal of the American Society for Information Science and Technology, 64(3), 640–643. doi:10.1002/asi.22752.
Schubert, A., & Braun, T. (1986). Relative indicators and relational charts for comparative assessment of publication output and citation impact. Scientometrics, 9(5–6), 281–291.
StataCorp. (2013). Stata statistical software: Release 13. College Station, TX: Stata Corporation.
Tressoldi, P. E., Giofre, D., Sella, F., & Cumming, G. (2013). High impact = high statistical standards? Not necessarily so. PLoS ONE, 8(2). doi:10.1371/journal.pone.0056180.
van Raan, A. F. J., van Leeuwen, T. N., Visser, M. S., van Eck, N. J., & Waltman, L. (2010). Rivals for the crown: Reply to Opthof and Leydesdorff. Journal of Informetrics, 4, 431–435.
Waltman, L., Calero-Medina, C., Kosten, J., Noyons, E. C. M., Tijssen, R. J. W., van Eck, N. J., et al. (2012). The Leiden Ranking 2011/2012: Data collection, indicators, and interpretation. Journal of the American Society for Information Science and Technology, 63(12), 2419–2432.
Waltman, L., & Schreiber, M. (2013). On the calculation of percentile-based bibliometric indicators. Journal of the American Society for Information Science and Technology, 64(2), 372–379.
Williams, R. (2012). Using the margins command to estimate and interpret adjusted predictions and marginal effects. The Stata Journal, 12(2), 308–331.
Zhou, P., & Zhong, Y. (2012). The citation-based indicator and combined impact indicator—new options for measuring impact. Journal of Informetrics, 6(4), 631–638. doi:10.1016/j.joi.2012.05.004.
Appendix: Stata Code Used for These Analyses
* Stata code for Williams & Bornmann book chapter on effect sizes.
* Be careful when running this code -- make sure it doesn't
* overwrite existing files or graphs that use the same names.
version 13.1
use "http://www3.nd.edu/~rwilliam/statafiles/rwlbes", clear
gen inst12 = inst if inst!=3
gen inst13 = inst if inst!=2
gen inst23 = inst if inst!=1
gen top10 = perc <= 10
* Limit to 2001 & 2002; this can be changed
keep if py <=2002
* Table 12.2
* Single group designs - pages 286-287 of Cumming
* For each institution, test whether percentile mu = 50
* Note that negative differences mean better than average performance
forval instnum = 1/3 {
display
display "Institution `instnum'"
ttest perc = 50 if inst==`instnum'
display
display "Cohen's d = " r(t) / sqrt(r(N_1))
* DOUBLE CHECK: Compares above CIs and t-tests with bootstrap
* Results from the test command should be similar to the t-test
* significance level
bootstrap, reps(100): reg perc if inst==`instnum'
test _cons = 50
}
* Table 12.3
* Two group designs - Test whether two institutions
* differ from each other on mean percentile rating.
* Starts around p. 155
* Get both the t-tests and the ES stats, e.g. Cohen's d
* Note: you should flip the signs for the 3 vs 2 comparison
foreach iv of varlist inst12 inst13 inst23 {
display "perc is dependent, `iv'"
ttest perc, by(`iv')
scalar n1 = r(N_1)
scalar n2 = r(N_2)
scalar s1 = r(sd_1)
scalar s2 = r(sd_2)
display
display "Pooled sd is " ///
sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2 ) / (n1 + n2 - 2))
display
esize two perc, by(`iv') all
display
* DOUBLE CHECKS: Compare Mann-Whitney & bootstrap results with above
* Mann-Whitney test
ranksum perc, by(`iv')
* Bootstrap
bootstrap, reps(100): reg perc i.`iv'
}
* Table 12.4
* Proportions in Top 10, pp. 399-402
* Single institution tests
* Numbers in table are multiplied by 100
forval instnum = 1/3 {
display
display "Institution `instnum'"
prtest top10 = .10 if inst==`instnum'
display
display
scalar phi1 = 2 * asin(sqrt(r(P_1)))
scalar phi2 = 2 * asin(sqrt(.10))
di "h effect size = " phi1 - phi2
display
}
* Table 12.5
* Proportions in Top 10 - pairwise comparisons of institutions
* Numbers in table are multiplied by 100
foreach instpair of varlist inst12 inst13 inst23 {
display
display "`instpair'"
prtest top10, by(`instpair')
display
scalar phi1 = 2 * asin(sqrt(r(P_1)))
scalar phi2 = 2 * asin(sqrt(r(P_2)))
di "h effect size = " phi1 - phi2
display
* NOTE: Cohen's d provides very similar results to Cohen's h
esize two top10, by(`instpair') all
display
}
* Do graphs with Stata
* NOTE: Additional editing was done with the Stata Graph Editor
* Use ciplot for Univariate graphs
* Figure 12.1 - Average percentile score by inst with CI
ciplot perc, by(inst) name(fig1, replace)
* Figure 12.3
* Was edited to multiply by 100
ciplot top10, bin by(inst) name(fig3, replace)
*** Save figures before running figure 12.2 code
* Figure 12.2 - Differences in mean percentile rankings
* Use statsby and serrbar for tests of group differences
* Note: Data in memory is overwritten
gen inst32 = inst23 * -1 + 4
tab2 inst32 inst23
statsby _b _se, saving(xb12, replace) : reg perc i.inst12
statsby _b _se, saving(xb13, replace) : reg perc i.inst13
statsby _b _se, saving(xb32, replace) : reg perc i.inst32
clear all
append using xb12 xb13 xb32, gen(pairing)
label define pairing 1 "1 vs 2" 2 "1 vs 3" 3 "3 vs 2"
label values pairing pairing
serrbar _stat_2 _stat_5 pairing, scale(1.96) name(fig2, replace)
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this chapter
Williams, R., Bornmann, L. (2014). The Substantive and Practical Significance of Citation Impact Differences Between Institutions: Guidelines for the Analysis of Percentiles Using Effect Sizes and Confidence Intervals. In: Ding, Y., Rousseau, R., Wolfram, D. (eds) Measuring Scholarly Impact. Springer, Cham. https://doi.org/10.1007/978-3-319-10377-8_12
DOI: https://doi.org/10.1007/978-3-319-10377-8_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10376-1
Online ISBN: 978-3-319-10377-8