Abstract
In this chapter we address the statistical analysis of percentiles: how should the citation impact of institutions be compared? In educational and psychological testing, percentiles are already widely used as a standard for evaluating an individual's test scores (intelligence tests, for example) by comparing them with the scores of a calibrated sample. Percentiles, or percentile rank classes, are also a very suitable method in bibliometrics for normalizing the citations of publications in terms of subject category and publication year and, unlike mean-based indicators (relative citation rates), percentiles are scarcely affected by skewed citation distributions. The percentile of a given publication provides information about the citation impact it has achieved in comparison with similar publications in the same subject category and publication year. Analyses of percentiles, however, have not always been presented in the most effective and meaningful way. The current APA guidelines (American Psychological Association, 2010) suggest a lesser emphasis on significance tests and a greater emphasis on the substantive and practical significance of findings. Drawing on work by Cumming (2012), we show how examinations of effect sizes (e.g., Cohen's d) and confidence intervals can lead to a clear understanding of citation impact differences.
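The normalization described above can be sketched in a few lines. The snippet below is an illustrative Python sketch (the chapter's own analyses use Stata, and the tuple layout is an assumption for this example): each publication receives a percentile rank computed within its (subject category, publication year) reference set. Note that several conventions for handling ties exist (Hyndman & Fan, 1996; Cox, 2005), and that the chapter's `perc` variable is oriented so that lower values mean higher impact.

```python
from collections import defaultdict

def percentile_ranks(pubs):
    """pubs: list of (subject, year, citations) tuples.

    Returns a parallel list of percentile ranks computed within each
    (subject, year) reference set as
        100 * (# publications in the set with fewer citations) / N,
    so here a HIGHER rank means more citations. Other tie-handling
    conventions exist (Hyndman & Fan, 1996), and the chapter's perc
    variable runs the other way (top10 = perc <= 10).
    """
    groups = defaultdict(list)
    for subj, year, cites in pubs:
        groups[(subj, year)].append(cites)
    ranks = []
    for subj, year, cites in pubs:
        ref = groups[(subj, year)]
        below = sum(1 for c in ref if c < cites)
        ranks.append(100.0 * below / len(ref))
    return ranks
```

For example, in a reference set of ten publications with distinct citation counts, the most-cited publication receives rank 90 and the least-cited rank 0 under this convention.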
Notes
- 1.
Cumming (2012) refers to the CI obtained from an analysis as “One from the dance.” What he means is that it is NOT correct to say that there is a 95 % chance that the true value of the mean lies within the confidence interval. Either the true value falls within the interval or it doesn’t. It is correct to say that, if this process were repeated an infinite number of times, then 95 % of the time the CI would include the true value of the mean while 5 % of the time it would not. Whether it does in the specific data we are analyzing, we don’t know.
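This long-run reading of the confidence interval is easy to verify by simulation. The sketch below (Python, illustrative only; not part of the chapter's Stata appendix, with population parameters chosen arbitrarily) draws many samples from a known population and counts how often the nominal 95 % t-based interval covers the true mean: the long-run coverage is close to 0.95, yet any single interval either contains the true mean or it does not.

```python
import math
import random

def ci_coverage(true_mean=50.0, sd=20.0, n=30, reps=2000, seed=1):
    """Fraction of replications in which a nominal 95% t-interval
    for the mean covers the true population mean. The critical
    value 2.045 is the 97.5th percentile of t with n-1 = 29 df."""
    rng = random.Random(seed)
    tstar = 2.045
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = sum(x) / n
        s = math.sqrt(sum((xi - m) ** 2 for xi in x) / (n - 1))
        half = tstar * s / math.sqrt(n)
        hits += (m - half <= true_mean <= m + half)
    return hits / reps
```

Each replication is one interval "from the dance"; only the ensemble of replications exhibits the 95 % property.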
- 2.
Cumming (2012) notes various cautions about using Cohen’s d (p. 283). For example, while it is common to use sample standard deviations as we do here, other “standardizers” are possible, e.g., you might use the standard deviation for a reference population, such as elite institutions. Researchers should be clear exactly how Cohen’s d was computed.
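As a concrete illustration of the standardizer issue, here is the pooled-standard-deviation version of Cohen's d used for the chapter's two-group comparisons (a Python sketch; the chapter itself obtains d from Stata's esize command).

```python
import math

def cohens_d(x, y):
    """Cohen's d for two independent samples, standardized by the
    pooled sample standard deviation. Replacing sp with the standard
    deviation of some reference population (e.g., elite institutions)
    would give a different 'Cohen's d' -- hence the advice to report
    exactly which standardizer was used."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((xi - m1) ** 2 for xi in x) / (n1 - 1)
    v2 = sum((yi - m2) ** 2 for yi in y) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / sp
```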
- 3.
With independent samples, two different types of t-test can be conducted. The first type, used here, assumes that the variances of the two groups are equal. The second (Welch's t-test, the unequal option of Stata's ttest command) allows the variances of the two groups to differ. In our examples it makes little difference which approach is used because, as Table 12.2 shows, the standard deviations of the three groups are similar. In cases where the variances clearly differ, the second approach should be used. Most, perhaps all, statistical software packages can compute either type of t-test easily.
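The two statistics differ only in the denominator, as this Python sketch shows (illustrative only; the second, unequal-variance form is usually attributed to Welch). When the sample variances and sizes are similar, as in the chapter's examples, the two statistics are close; with equal group sizes and equal sample variances they coincide exactly.

```python
import math

def t_statistics(x, y):
    """Return (t_pooled, t_welch) for two independent samples.
    t_pooled assumes equal population variances (pooled variance);
    t_welch does not, using v1/n1 + v2/n2 in the denominator."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((xi - m1) ** 2 for xi in x) / (n1 - 1)
    v2 = sum((yi - m2) ** 2 for yi in y) / (n2 - 1)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t_welch = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    return t_pooled, t_welch
```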
- 4.
Nonetheless, as we found for other measures in our analysis, Cohen’s d seems robust to violations of its assumptions. When we estimated Cohen’s d using binary dependent variables, we got almost exactly the same numbers as we did for Cohen’s h.
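Cohen's h, used for the proportion comparisons in Tables 12.4 and 12.5, is the difference between arcsine-square-root-transformed proportions; the Stata appendix computes it inline, and the Python sketch below restates the same formula.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h effect size for two proportions: the difference of
    the arcsine-square-root ('phi') transforms. For example, an
    institution's observed top-10% share p1 can be compared with the
    expected baseline of 0.10."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))
```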
References
Acock, A. (2010). A gentle introduction to Stata (3rd ed.). College Station, TX: Stata Press.
American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: American Psychological Association (APA).
Bornmann, L., & Leydesdorff, L. (2013). Statistical tests and research assessments: A comment on Schneider (2012). Journal of the American Society for Information Science and Technology, 64(6), 1306–1308. doi:10.1002/asi.22860.
Bornmann, L., Leydesdorff, L., & Mutz, R. (2013). The use of percentiles and percentile rank classes in the analysis of bibliometric data: opportunities and limits. Journal of Informetrics, 7(1), 158–165.
Bornmann, L., & Mutz, R. (2013). The advantage of the use of samples in evaluative bibliometric studies. Journal of Informetrics, 7(1), 89–90. doi:10.1016/j.joi.2012.08.002.
Bornmann, L., & Williams, R. (2013). How to calculate the practical significance of citation impact differences? An empirical example from evaluative institutional bibliometrics using adjusted predictions and marginal effects. Journal of Informetrics, 7(2), 562–574. doi:10.1016/j.joi.2013.02.005.
Bornmann, L., de Moya Anegon, F., & Leydesdorff, L. (2012). The new Excellence Indicator in the World Report of the SCImago Institutions Rankings 2011. Journal of Informetrics, 6(2), 333–335. doi:10.1016/j.joi.2011.11.006.
Cameron, A. C., & Trivedi, P. K. (2010). Microeconometrics using Stata (Revised ed.). College Station, TX: Stata Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
Cox, N. J. (2005). Calculating percentile ranks or plotting positions. Retrieved May 30, from http://www.stata.com/support/faqs/stat/pcrank.html
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. London: Routledge.
Glänzel, W., Thijs, B., Schubert, A., & Debackere, K. (2009). Subfield-specific normalized relative indicators and a new generation of relational charts: methodological foundations illustrated on the assessment of institutional research performance. Scientometrics, 78(1), 165–188.
Huber, C. (2013). Measures of effect size in Stata 13. The Stata Blog. Retrieved December 6, 2013, from http://blog.stata.com/2013/09/05/measures-of-effect-size-in-stata-13.
Hyndman, R. J., & Fan, Y. N. (1996). Sample quantiles in statistical packages. American Statistician, 50(4), 361–365.
International Committee of Medical Journal Editors. (2010). Uniform requirements for manuscripts submitted to biomedical journals: Writing and editing for biomedical publication. Journal of Pharmacology and Pharmacotherapeutics, 1(1), 42–58. Retrieved April 10, 2014 from http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3142758/.
Leydesdorff, L. (2012). Accounting for the uncertainty in the evaluation of percentile ranks. Journal of the American Society for Information Science and Technology, 63(11), 2349–2350.
Leydesdorff, L., & Bornmann, L. (2011). Integrated impact indicators (I3) compared with impact factors (IFs): An alternative research design with policy implications. Journal of the American Society for Information Science and Technology, 62(11), 2133–2146.
Leydesdorff, L., & Bornmann, L. (2012). Percentile ranks and the integrated impact indicator (I3). Journal of the American Society for Information Science and Technology, 63(9), 1901–1902. doi:10.1002/asi.22641.
Long, J. S., & Freese, J. (2006). Regression models for categorical dependent variables using Stata (2nd ed.). College Station, TX: Stata Press.
Lundberg, J. (2007). Lifting the crown - citation z-score. Journal of Informetrics, 1(2), 145–154.
Moed, H. F., De Bruin, R. E., & Van Leeuwen, T. N. (1995). New bibliometric tools for the assessment of national research performance - database description, overview of indicators and first applications. Scientometrics, 33(3), 381–422.
Opthof, T., & Leydesdorff, L. (2010). Caveats for the journal and field normalizations in the CWTS (“Leiden”) evaluations of research performance. Journal of Informetrics, 4(3), 423–430.
Pudovkin, A. I., & Garfield, E. (2009). Percentile rank and author superiority indexes for evaluating individual journal articles and the author’s overall citation performance. Paper presented at the Fifth International Conference on Webometrics, Informetrics & Scientometrics (WIS).
Schneider, J. W. (2012). Testing university rankings statistically: Why this is not such a good idea after all. Some reflections on statistical power, effect sizes, random sampling and imaginary populations. In E. Archambault, Y. Gingras, & V. Lariviere (Eds.), The 17th International Conference on Science and Technology Indicators (pp. 719–732). Montreal, Canada: Repro-UQAM.
Schreiber, M. (2012). Inconsistencies of recently proposed citation impact indicators and how to avoid them. Journal of the American Society for Information Science and Technology, 63(10), 2062–2073. doi:10.1002/asi.22703.
Schreiber, M. (2013). Uncertainties and ambiguities in percentiles and how to avoid them. Journal of the American Society for Information Science and Technology, 64(3), 640–643. doi:10.1002/asi.22752.
Schubert, A., & Braun, T. (1986). Relative indicators and relational charts for comparative assessment of publication output and citation impact. Scientometrics, 9(5–6), 281–291.
StataCorp. (2013). Stata statistical software: Release 13. College Station, TX: Stata Corporation.
Tressoldi, P. E., Giofre, D., Sella, F., & Cumming, G. (2013). High impact = high statistical standards? Not necessarily so. PLoS ONE, 8(2). doi:10.1371/journal.pone.0056180.
van Raan, A. F. J., van Leeuwen, T. N., Visser, M. S., van Eck, N. J., & Waltman, L. (2010). Rivals for the crown: Reply to Opthof and Leydesdorff. Journal of Informetrics, 4, 431–435.
Waltman, L., Calero-Medina, C., Kosten, J., Noyons, E. C. M., Tijssen, R. J. W., van Eck, N. J., et al. (2012). The Leiden Ranking 2011/2012: Data collection, indicators, and interpretation. Journal of the American Society for Information Science and Technology, 63(12), 2419–2432.
Waltman, L., & Schreiber, M. (2013). On the calculation of percentile-based bibliometric indicators. Journal of the American Society for Information Science and Technology, 64(2), 372–379.
Williams, R. (2012). Using the margins command to estimate and interpret adjusted predictions and marginal effects. The Stata Journal, 12(2), 308–331.
Zhou, P., & Zhong, Y. (2012). The citation-based indicator and combined impact indicator—new options for measuring impact. Journal of Informetrics, 6(4), 631–638. doi:10.1016/j.joi.2012.05.004.
Appendix: Stata Code Used for These Analyses
* Stata code for Williams & Bornmann book chapter on effect sizes.
* Be careful when running this code -- make sure it doesn't
* overwrite existing files or graphs that use the same names.
version 13.1
use "http://www3.nd.edu/~rwilliam/statafiles/rwlbes", clear
gen inst12 = inst if inst!=3
gen inst13 = inst if inst!=2
gen inst23 = inst if inst!=1
gen top10 = perc <= 10
* Limit to 2001 & 2002; this can be changed
keep if py <=2002
* Table 12.2
* Single group designs - pages 286-287 of Cumming
* For each institution, test whether percentile mu = 50
* Note that negative differences mean better than average performance
forval instnum = 1/3 {
display
display "Institution `instnum'"
ttest perc = 50 if inst==`instnum'
display
display "Cohen's d = " r(t) / sqrt(r(N_1))
* DOUBLE CHECK: Compares above CIs and t-tests with bootstrap
* Results from the test command should be similar to the t-test
* significance level
bootstrap, reps(100): reg perc if inst==`instnum'
test _cons = 50
}
* Table 12.3
* Two group designs - Test whether two institutions
* differ from each other on mean percentile rating.
* Starts around p. 155
* Get both the t-tests and the ES stats, e.g. Cohen's d
* Note: you should flip the signs for the 3 vs 2 comparison
foreach iv of varlist inst12 inst13 inst23 {
display "perc is dependent, `iv'"
ttest perc, by(`iv')
scalar n1 = r(N_1)
scalar n2 = r(N_2)
scalar s1 = r(sd_1)
scalar s2 = r(sd_2)
display
display "Pooled sd is " ///
sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2 ) / (n1 + n2 - 2))
display
esize two perc, by(`iv') all
display
* DOUBLE CHECKS: Compare Mann-Whitney & bootstrap results with above
* Mann-Whitney test
ranksum perc, by(`iv')
* Bootstrap
bootstrap, reps(100): reg perc i.`iv'
}
* Table 12.4
* Proportions in Top 10, pp. 399-402
* Single institution tests
* Numbers in table are multiplied by 100
forval instnum = 1/3 {
display
display "Institution `instnum'"
prtest top10 = .10 if inst==`instnum'
display
display
scalar phi1 = 2 * asin(sqrt(r(P_1)))
scalar phi2 = 2 * asin(sqrt(.10))
di "h effect size = " phi1 - phi2
display
}
* Table 12.5
* Proportions in Top 10 - pairwise comparisons of institutions
* Numbers in table are multiplied by 100
foreach instpair of varlist inst12 inst13 inst23 {
display
display "`instpair'"
prtest top10, by(`instpair')
display
scalar phi1 = 2 * asin(sqrt(r(P_1)))
scalar phi2 = 2 * asin(sqrt(r(P_2)))
di "h effect size = " phi1 - phi2
display
* NOTE: Cohen's d provides very similar results to Cohen's h
esize two top10, by(`instpair') all
display
}
* Do graphs with Stata
* NOTE: Additional editing was done with the Stata Graph Editor
* Use ciplot for Univariate graphs
* Figure 12.1 - Average percentile score by inst with CI
ciplot perc, by(inst) name(fig1, replace)
* Figure 12.3
* Was edited to multiply by 100
ciplot top10, bin by(inst) name(fig3, replace)
*** Save figures before running figure 12.2 code
* Figure 12.2 - Differences in mean percentile rankings
* Use statsby and serrbar for tests of group differences
* Note: Data in memory is overwritten
gen inst32 = inst23 * -1 + 4
tab2 inst32 inst23
statsby _b _se, saving(xb12, replace) : reg perc i.inst12
statsby _b _se, saving(xb13, replace) : reg perc i.inst13
statsby _b _se, saving(xb32, replace) : reg perc i.inst32
clear all
append using xb12 xb13 xb32, gen(pairing)
label define pairing 1 "1 vs 2" 2 "1 vs 3" 3 "3 vs 2"
label values pairing pairing
serrbar _stat_2 _stat_5 pairing, scale(1.96) name(fig2, replace)
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this chapter
Williams, R., Bornmann, L. (2014). The Substantive and Practical Significance of Citation Impact Differences Between Institutions: Guidelines for the Analysis of Percentiles Using Effect Sizes and Confidence Intervals. In: Ding, Y., Rousseau, R., Wolfram, D. (eds) Measuring Scholarly Impact. Springer, Cham. https://doi.org/10.1007/978-3-319-10377-8_12
DOI: https://doi.org/10.1007/978-3-319-10377-8_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10376-1
Online ISBN: 978-3-319-10377-8