Statistical Methods For Assessing Measurement Error (Reliability) in Variables Relevant to Sports Medicine

Atkinson, Greg; Nevill, Alan M.

doi:10.2165/00007256-199826040-00002

Review Article
Published: 23 September 2012

Sports Medicine, Volume 26, pages 217–238 (1998)

Greg Atkinson¹ &
Alan M. Nevill¹

Abstract

Minimal measurement error (reliability) during the collection of interval- and ratio-type data is critically important to sports medicine research. The main components of measurement error are systematic bias (e.g. general learning or fatigue effects on the tests) and random error due to biological or mechanical variation. Both error components should be meaningfully quantified for the sports physician to relate the described error to judgements regarding ‘analytical goals’ (the requirements of the measurement tool for effective practical use) rather than the statistical significance of any reliability indicators.

Methods based on correlation coefficients and regression provide an indication of ‘relative reliability’. Since these methods are highly influenced by the range of measured values, researchers should be cautious in: (i) concluding acceptable relative reliability even if a correlation is above 0.9; (ii) extrapolating the results of a test-retest correlation to a new sample of individuals involved in an experiment; and (iii) comparing test-retest correlations between different reliability studies.
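The dependence of a test-retest correlation on the range of measured values can be illustrated with a small simulation. The sketch below is not taken from the article: the sample size, means and standard deviations are arbitrary assumptions, and `pearson_r` and `simulate_retest` are hypothetical helper names. Both simulated samples carry the same random error; only the between-subject spread differs.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_retest(n, between_sd, error_sd, seed):
    """Simulate test and retest scores.

    Assumed model (illustrative only): observed score = true score
    plus independent, normally distributed random error on each trial.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(100.0, between_sd) for _ in range(n)]
    test = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    retest = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return test, retest


# Same random error (SD = 5 units) in both samples;
# only the spread of true scores between individuals differs.
wide_test, wide_retest = simulate_retest(n=500, between_sd=20.0, error_sd=5.0, seed=1)
narrow_test, narrow_retest = simulate_retest(n=500, between_sd=5.0, error_sd=5.0, seed=2)

r_wide = pearson_r(wide_test, wide_retest)
r_narrow = pearson_r(narrow_test, narrow_retest)
print(f"heterogeneous sample: r = {r_wide:.2f}")
print(f"homogeneous sample:   r = {r_narrow:.2f}")
```

Under these assumptions the heterogeneous sample yields the higher correlation even though the measurement error is identical, which is why a correlation from one sample should not be carried over to a sample with a different spread of values.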

Methods used to describe ‘absolute reliability’ include the standard error of measurement (SEM), coefficient of variation (CV) and limits of agreement (LOA). These statistics are more appropriate for comparing reliability between different measurement tools in different studies. They can be derived from ANOVA procedures in multiple-retest studies, help predict the magnitude of a ‘real’ change in individual athletes, and be used to estimate statistical power for a repeated-measures experiment.
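As a rough illustration of these absolute reliability statistics, the sketch below computes them for a two-trial design. It is only one possible formulation and not necessarily the one used in the article: it assumes SEM is taken as the standard deviation of the test-retest differences divided by √2, CV as SEM relative to the grand mean, and LOA as bias ± 1.96 × the standard deviation of the differences. The function name `absolute_reliability` and the data are hypothetical.

```python
import math
import statistics


def absolute_reliability(test, retest):
    """Return bias, SEM, CV (%) and 95% limits of agreement for two trials."""
    diffs = [b - a for a, b in zip(test, retest)]
    bias = statistics.mean(diffs)          # systematic bias (retest minus test)
    sd_diff = statistics.stdev(diffs)      # SD of differences (n - 1 denominator)
    sem = sd_diff / math.sqrt(2)           # assumed SEM definition for two trials
    grand_mean = statistics.mean(list(test) + list(retest))
    cv_percent = 100.0 * sem / grand_mean  # one of several possible CV definitions
    loa = (bias - 1.96 * sd_diff, bias + 1.96 * sd_diff)
    return {"bias": bias, "sem": sem, "cv_percent": cv_percent, "loa": loa}


# Hypothetical test-retest scores for four individuals (arbitrary units).
test_scores = [50.0, 60.0, 70.0, 80.0]
retest_scores = [52.0, 59.0, 73.0, 80.0]

result = absolute_reliability(test_scores, retest_scores)
print(f"bias = {result['bias']:.2f}")
print(f"SEM  = {result['sem']:.2f}")
print(f"CV   = {result['cv_percent']:.2f}%")
print(f"LOA  = {result['loa'][0]:.2f} to {result['loa'][1]:.2f}")
```

All values except the CV are expressed in the units of the measurement, which is what allows them to be compared directly against an analytical goal stated in those units.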

These methods vary considerably in the way they are calculated and their use also assumes the presence (CV) or absence (SEM) of heteroscedasticity. Most methods of calculating SEM and CV represent approximately 68% of the error that is actually present in the repeated measurements for the ‘average’ individual in the sample. LOA represent the test-retest differences for 95% of a population. The associated Bland-Altman plot shows the measurement error schematically and helps to identify the presence of heteroscedasticity. If there is evidence of heteroscedasticity or non-normality, one should logarithmically transform the data and quote the bias and random error as ratios. This allows simple comparisons of reliability across different measurement tools.
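The heteroscedasticity check and the ratio form of the limits of agreement can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' procedure: heteroscedasticity is examined here through the correlation between absolute differences and individual means, natural logarithms are used for the transformation, and the function names and data are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def heteroscedasticity_r(test, retest):
    """Correlate absolute test-retest differences with individual means.

    A clearly positive value suggests that error grows with the size
    of the measured value.
    """
    abs_diffs = [abs(b - a) for a, b in zip(test, retest)]
    means = [(a + b) / 2 for a, b in zip(test, retest)]
    return pearson_r(means, abs_diffs)


def ratio_limits_of_agreement(test, retest):
    """Return (bias_ratio, error_ratio) from log-transformed positive data.

    The 95% ratio limits are bias_ratio divided by, and multiplied by,
    error_ratio.
    """
    log_diffs = [math.log(b) - math.log(a) for a, b in zip(test, retest)]
    bias_ratio = math.exp(statistics.mean(log_diffs))
    error_ratio = math.exp(1.96 * statistics.stdev(log_diffs))
    return bias_ratio, error_ratio


# Hypothetical data in which the differences grow with the measured value.
test_scores = [100.0, 200.0, 300.0, 400.0, 500.0]
retest_scores = [105.0, 190.0, 320.0, 380.0, 530.0]

r_hetero = heteroscedasticity_r(test_scores, retest_scores)
bias_ratio, error_ratio = ratio_limits_of_agreement(test_scores, retest_scores)
lower, upper = bias_ratio / error_ratio, bias_ratio * error_ratio
print(f"heteroscedasticity r = {r_hetero:.2f}")
print(f"ratio LOA: {bias_ratio:.3f} with limits {lower:.3f} to {upper:.3f}")
```

Because both ratios are dimensionless, they can be set side by side for tools that measure in different units.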

It is recommended that sports clinicians and researchers should cite and interpret a number of statistical methods for assessing reliability. We encourage the inclusion of the LOA method, especially the exploration of heteroscedasticity that is inherent in this analysis. We also stress the importance of relating the results of any reliability statistic to ‘analytical goals’ in sports medicine.

Author information

Authors and Affiliations

Research Institute for Sport and Exercise Sciences, Liverpool John Moores University, Trueman Building, Webster Street, Liverpool, L3 2ET, England
Greg Atkinson & Alan M. Nevill

Corresponding author

Correspondence to Greg Atkinson.

About this article

Cite this article

Atkinson, G., Nevill, A.M. Statistical Methods For Assessing Measurement Error (Reliability) in Variables Relevant to Sports Medicine. Sports Med 26, 217–238 (1998). https://doi.org/10.2165/00007256-199826040-00002

Published: 23 September 2012
Issue Date: October 1998
DOI: https://doi.org/10.2165/00007256-199826040-00002
