Statistics in Brief: The Importance of Sample Size in the Planning and Interpretation of Medical Research

Biau, David Jean; Kernéis, Solen; Porcher, Raphaël

doi:10.1007/s11999-008-0346-9

Residents' Corner
Published: 20 June 2008

Volume 466, pages 2282–2288, (2008)

Clinical Orthopaedics and Related Research

David Jean Biau MD¹,
Solen Kernéis MD¹ &
Raphaël Porcher PhD¹

Abstract

The increasing volume of research by the medical community often leads to increasing numbers of contradictory findings and conclusions. Although the differences observed may represent true differences, the results also may differ because of sampling variability as all studies are performed on a limited number of specimens or patients. When planning a study reporting differences among groups of patients or describing some variable in a single group, sample size should be considered because it allows the researcher to control for the risk of reporting a false-negative finding (Type II error) or to estimate the precision his or her experiment will yield. Equally important, readers of medical journals should understand sample size because such understanding is essential to interpret the relevance of a finding with regard to their own patients. At the time of planning, the investigator must establish (1) a justifiable level of statistical significance, (2) the chances of detecting a difference of given magnitude between the groups compared, ie, the power, (3) this targeted difference (ie, effect size), and (4) the variability of the data (for quantitative data). We believe correct planning of experiments is an ethical issue of concern to the entire community.

References

1. Altman DG. Practical Statistics for Medical Research. London, UK: Chapman & Hall; 1991.
2. Bacchetti P, Wolf LE, Segal MR, McCulloch CE. Ethics and sample size. Am J Epidemiol. 2005;161:105–110.
3. Bailey CS, Fisher CG, Dvorak MF. Type II error in the spine surgical literature. Spine. 2004;29:1146–1149.
4. Bauer P, Brannath W. The advantages and disadvantages of adaptive designs for clinical trials. Drug Discov Today. 2004;9:351–357.
5. Bijur PE, Latimer CT, Gallagher EJ. Validation of a verbally administered numerical rating scale of acute pain for use in the emergency department. Acad Emerg Med. 2003;10:390–392.
6. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
7. Ellenberg JH. Selection bias in observational and experimental studies. Stat Med. 1994;13:557–567.
8. Freedman KB, Back S, Bernstein J. Sample size and statistical power of randomised, controlled trials in orthopaedics. J Bone Joint Surg Br. 2001;83:397–402.
9. Goodman SN, Berlin JA. The use of predicted confidence intervals when planning experiments and the misuse of power when interpreting results. Ann Intern Med. 1994;121:200–206.
10. Halpern SD, Karlawish JH, Berlin JA. The continuing unethical conduct of underpowered clinical trials. JAMA. 2002;288:358–362.
11. Handl M, Drzik M, Cerulli G, Povysil C, Chlpik J, Varga F, Amler E, Trc T. Reconstruction of the anterior cruciate ligament: dynamic strain evaluation of the graft. Knee Surg Sports Traumatol Arthrosc. 2007;15:233–241.
12. Harris WH. Traumatic arthritis of the hip after dislocation and acetabular fractures: treatment by mold arthroplasty: an end-result study using a new method of result evaluation. J Bone Joint Surg Am. 1969;51:737–755.
13. Hoenig JM, Heisey DM. The abuse of power: the pervasive fallacy of power calculations for data analysis. The American Statistician. 2001;55:19–24.
14. Hsieh FY, Bloch DA, Larsen MD. A simple method of sample size calculation for linear and logistic regression. Stat Med. 1998;17:1623–1634.
15. Jaeschke R, Singer J, Guyatt GH. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10:407–415.
16. Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Boca Raton, FL: Chapman & Hall/CRC; 2000.
17. Kapoor B, Clement DJ, Kirkley A, Maffulli N. Current practice in the management of anterior cruciate ligament injuries in the United Kingdom. Br J Sports Med. 2004;38:542–544.
18. Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med. 2001;135:982–989.
19. Machin D, Campbell MJ. Statistical Tables for the Design of Clinical Trials. Oxford, UK: Blackwell Scientific Publications; 1987.
20. Marx RG, Jones EC, Angel M, Wickiewicz TL, Warren RF. Beliefs and attitudes of members of the American Academy of Orthopaedic Surgeons regarding the treatment of anterior cruciate ligament injury. Arthroscopy. 2003;19:762–770.
21. Michener LA, McClure PW, Sennett BJ. American Shoulder and Elbow Surgeons Standardized Shoulder Assessment Form, patient self-report section: reliability, validity, and responsiveness. J Shoulder Elbow Surg. 2002;11:587–594.
22. Mirza F, Mai DD, Kirkley A, Fowler PJ, Amendola A. Management of injuries to the anterior cruciate ligament: results of a survey of orthopaedic surgeons in Canada. Clin J Sport Med. 2000;10:85–88.
23. Moher D, Pham B, Jones A, Cook DJ, Jadad AR, Moher M, Tugwell P, Klassen TP. Does quality of reports of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet. 1998;352:609–613.
24. Pearson ES, Neyman J. On the use and interpretation of certain test criteria for purposes of statistical inference: Part I. Biometrika. 1928;20A:175–240.
25. Pearson ES, Neyman J. On the use and interpretation of certain test criteria for purposes of statistical inference: Part II. Biometrika. 1928;20A:263–294.
26. Rubinstein LV, Korn EL, Freidlin B, Hunsberger S, Ivy SP, Smith MA. Design issues of randomized phase II trials and a proposal for phase II screening trials. J Clin Oncol. 2005;23:7199–7206.
27. Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias: dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995;273:408–412.
28. Stenning SP, Parmar MK. Designing randomised trials: both large and small trials are needed. Ann Oncol. 2002;13(suppl 4):131–138.
29. Sterne JA, Davey Smith G. Sifting the evidence: what’s wrong with significance tests? BMJ. 2001;322:226–231.
30. Vaeth M, Skovlund E. A simple approach to power and sample size calculations in logistic regression and Cox regression models. Stat Med. 2004;23:1781–1792.

Acknowledgments

We thank the editor, whose thorough readings of, and accurate comments on, drafts of the manuscript helped clarify it.

Author information

Authors and Affiliations

Département de Biostatistique et Informatique Médicale, INSERM – UMR-S 717, AP-HP, Université Paris 7, Hôpital Saint Louis, 1, avenue Claude-Vellefaux, Paris Cedex 10, 75475, France
David Jean Biau MD, Solen Kernéis MD & Raphaël Porcher PhD

Corresponding author

Correspondence to David Jean Biau MD.

Additional information

Each author certifies that he or she has no commercial associations (eg, consultancies, stock ownership, equity interest, patent/licensing arrangements, etc) that might pose a conflict of interest in connection with the submitted article.

Appendices

Appendix 1

The sample size (n) per group for comparing two means with a two-sided two-sample t test is

$$ n = \frac{2 \times (z_{1-\alpha/2} + z_{1-\beta})^{2}}{d_{t}^{2}} + 0.25 \times z_{1-\alpha/2}^{2} $$

where z_1−α/2 and z_1−β are standard normal deviates for the probability of 1 − α/2 and 1 − β, respectively, and d_t = (μ₀ − μ₁)/σ is the targeted standardized difference between the two means.

The following values correspond to the example:

α = 0.05 (statistical significance level)
β = 0.10 (power of 90%)
|μ₀ − μ₁| = 10 (difference in the mean score between the two groups)
σ = 20 (standard deviation of the score in each group)
z_1−α/2 = 1.96
z_1−β = 1.28

Therefore:

$$ n = \frac{2 \times (1.96 + 1.28)^{2}}{(10/20)^{2}} + 0.25 \times 1.96^{2} \approx 85. $$
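The calculation above can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `n_per_group` and its arguments are illustrative assumptions, and only the Python standard library is used. It evaluates the Appendix 1 formula once with the rounded deviates quoted in the example (1.96 and 1.28) and once with unrounded deviates obtained from the standard normal distribution.

```python
from statistics import NormalDist


def n_per_group(delta, sigma, z_alpha, z_beta):
    """Sample size per group from the Appendix 1 formula (hypothetical helper).

    delta   -- targeted difference between the two means, |mu0 - mu1|
    sigma   -- standard deviation of the score in each group
    z_alpha -- standard normal deviate for probability 1 - alpha/2
    z_beta  -- standard normal deviate for probability 1 - beta
    """
    d_t = abs(delta) / sigma  # targeted standardized difference
    return 2 * (z_alpha + z_beta) ** 2 / d_t ** 2 + 0.25 * z_alpha ** 2


# Rounded deviates as quoted in the worked example.
n_text = n_per_group(delta=10, sigma=20, z_alpha=1.96, z_beta=1.28)

# Unrounded deviates for alpha = 0.05 (two-sided) and beta = 0.10.
std_normal = NormalDist()
n_exact = n_per_group(
    delta=10,
    sigma=20,
    z_alpha=std_normal.inv_cdf(1 - 0.05 / 2),
    z_beta=std_normal.inv_cdf(1 - 0.10),
)

print(f"rounded deviates:   n = {n_text:.2f}")
print(f"unrounded deviates: n = {n_exact:.2f}")
```

With the rounded deviates the formula gives a value just under 85, in agreement with the example. With unrounded deviates the result falls just above 85, so an investigator who always rounds up would enroll 86 patients per group; the difference is a reminder that the formula is an approximation.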

Two-sided tests, which do not assume the direction of the difference (ie, that the mean value in one group would always be greater than that in the other), are generally preferred. The null hypothesis assumes that there is no difference between the treatments compared, so a difference in either direction must be considered.

Appendix 2 Computation of Confidence Interval

To determine the precision of the estimation of a parameter, or alternatively its confidence interval, we use the distribution of the parameter estimate in repeated samples of the same size. For instance, consider a parameter with observed mean, m, and standard deviation, sd, in a given sample. If we assume that the distribution of the parameter in the sample is close to a normal distribution, the means, x_n, of several repeated samples of the same size have true mean, μ (the population mean), and estimated standard deviation,

$$ \text{se} = \text{sd} / \sqrt{n}, \tag{1} $$

also known as standard error of the mean, and

$$ (x_{n} - \mu) / \text{se} \tag{2} $$

follows a t distribution. For a large sample, the t distribution becomes close to the normal distribution; however, for a smaller sample size the difference is not negligible and the t distribution is preferred. The precision of the estimation is

$$ 2 \times t_{(1 - \alpha/2),\; n - 1} \times \text{se}, \tag{3} $$

and the confidence interval for μ is the range of values extending either side of the sample mean m by

$$ t_{(1 - \alpha/2),\; n - 1} \times \text{se}. \tag{4} $$

For example, Handl et al. [11] in a biomechanical study of 21 fresh-frozen cadavers reported a mean ultimate load failure of 4-strand hamstring tendon constructs of 4546 N under dynamic loading with standard deviation of 1500 N. If we were to plan an experiment, the anticipated precision of the estimation at the 95% level would be

$$ 2 \times \left( 2.78 \times 1500 / \sqrt{5} \right) = 3725 \tag{5} $$

for five specimens,

$$ 2 \times \left( 2.26 \times 1500 / \sqrt{10} \right) = 2146 \quad \text{for 10 specimens}, \tag{6} $$

$$ 2 \times \left( 2.06 \times 1500 / \sqrt{25} \right) = 1238 \quad \text{for 25 specimens}, \tag{7} $$

$$ 2 \times \left( 2.01 \times 1500 / \sqrt{50} \right) = 853 \quad \text{for 50 specimens}, \tag{8} $$

$$ \text{and} \quad 2 \times \left( 1.98 \times 1500 / \sqrt{100} \right) = 595 \quad \text{for 100 specimens}. \tag{9} $$

The values 2.78, 2.26, 2.06, 2.01, and 1.98 correspond to the t distribution deviates for the probability of 1 − α/2, with 4, 9, 24, 49, and 99 (n − 1) degrees of freedom; the well-known corresponding standard normal deviate is 1.96. Given an estimated mean of 4546 N, the corresponding 95% confidence intervals are 2683 N to 6408 N for five specimens, 3473 N to 5619 N for 10 specimens, 3927 N to 5165 N for 25 specimens, 4120 N to 4972 N for 50 specimens, and 4248 N to 4844 N for 100 specimens (Fig. 2).
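The precision and confidence limits above can be recomputed with a few lines of code. This is a minimal sketch under stated assumptions: the helper `ci_for_mean` and the table `T_DEVIATES` are illustrative names, and the t deviates are the rounded values quoted in the text rather than values computed from the t distribution.

```python
from math import sqrt

# Rounded t deviates for 1 - alpha/2 = 0.975, keyed by sample size n
# (degrees of freedom n - 1), copied from the text above.
T_DEVIATES = {5: 2.78, 10: 2.26, 25: 2.06, 50: 2.01, 100: 1.98}


def ci_for_mean(mean, sd, n, t_deviate):
    """Return (lower, upper, precision) for a mean (hypothetical helper).

    The half-width is t * se with se = sd / sqrt(n); the precision is
    the full width of the interval, ie, twice the half-width.
    """
    se = sd / sqrt(n)  # standard error of the mean
    half_width = t_deviate * se
    return mean - half_width, mean + half_width, 2 * half_width


results = {}
for n, t in T_DEVIATES.items():
    lower, upper, precision = ci_for_mean(mean=4546, sd=1500, n=n, t_deviate=t)
    results[n] = (lower, upper, precision)
    print(f"n = {n:3d}: precision {precision:6.0f} N, "
          f"95% CI {lower:5.0f} N to {upper:5.0f} N")
```

Because the deviates are rounded to two decimals, the printed widths differ by a few newtons from the values in the text, which appear to have been obtained with unrounded deviates. The pattern is unchanged: the interval narrows roughly in proportion to the square root of the number of specimens.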

Similarly, for a proportion p in a given sample with sufficient sample size to assume a nearly normal distribution, the confidence interval extends either side of the proportion p by

$$ z_{(1 - \alpha/2)} \times \text{se} \quad \text{with} \quad \text{se} = \sqrt{p(1 - p)/n}. \tag{10} $$

For a small sample size, an exact confidence interval for the proportion should be used.
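To illustrate why the normal approximation can fail in small samples, the sketch below compares it with one common exact method, the Clopper-Pearson interval. The choice of Clopper-Pearson is an assumption, since the text does not name a specific exact method; the function names and the data (2 events among 10 patients) are hypothetical. The exact limits are found by bisection on the binomial distribution so that only the standard library is needed.

```python
from math import comb, sqrt
from statistics import NormalDist


def normal_approx_ci(k, n, alpha=0.05):
    """Interval p +/- z * sqrt(p(1 - p)/n), as in equation (10)."""
    p = k / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iterations=60):
    """Root of a monotone function f on [lo, hi] with a sign change."""
    sign_lo = f(lo) > 0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_ci(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) interval for k events among n subjects."""
    # Lower limit: the p for which P(X >= k) equals alpha/2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper limit: the p for which P(X <= k) equals alpha/2.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


# Hypothetical data: 2 events observed among 10 patients.
approx = normal_approx_ci(2, 10)
exact = clopper_pearson_ci(2, 10)
print(f"normal approximation: {approx[0]:.3f} to {approx[1]:.3f}")
print(f"exact interval:       {exact[0]:.3f} to {exact[1]:.3f}")
```

For these hypothetical data the normal approximation yields a negative lower limit, which is impossible for a proportion, whereas the exact interval stays within 0 and 1 and is asymmetric around the observed proportion.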

About this article

Cite this article

Biau, D.J., Kernéis, S. & Porcher, R. Statistics in Brief: The Importance of Sample Size in the Planning and Interpretation of Medical Research. Clin Orthop Relat Res 466, 2282–2288 (2008). https://doi.org/10.1007/s11999-008-0346-9

Received: 01 November 2007
Accepted: 22 May 2008
Published: 20 June 2008
Issue Date: September 2008
DOI: https://doi.org/10.1007/s11999-008-0346-9
