Underestimation of Pearson’s product moment correlation statistic
Abstract
Pearson’s product moment correlation coefficient (more commonly known as Pearson’s r) tends to underestimate correlations that exist in the underlying population. This phenomenon is generally unappreciated in ecology, although the statistical literature suggests a range of corrections. Pearson’s r is widely used as the classical measure of correlation in ecology, where manipulative experiments are often impractical at the large spatial scales involved; it is therefore vital that ecologists use this measure as effectively as possible. Here, our literature review suggests that corrections for underestimation in Pearson’s r should not be adopted if either the data deviate from bivariate normality or sample size is greater than around 30. We then use simulations to offer advice to researchers in ecology for situations in which both variables can be described as normally distributed but sample size is below around 30. We found that none of the methods currently offered in the literature to correct the underestimation bias offers consistently reliable performance, and so we do not recommend that they be implemented when making inferences about a population from a sample. We also suggest that, when considering the importance of the bias towards underestimation in Pearson’s r for biological conclusions, the likely extent of the bias should be discussed. Unless sample size is very small, this bias is unlikely to call for substantial modification of study conclusions.
Keywords
Association, Bias, Correlation, Pearson’s r, Sampling
Introduction
The essence of much of the statistical treatment of data is making inferences about an underlying population from a sample. For example, to explore the foraging behaviour of bumblebees we might collect a sample of 25 Bombus terrestris and examine the relationship between foraging distance from the nest and body mass in these 25 individuals, expecting heavier individuals to forage more widely. A natural way to quantify such a relationship is the Pearson’s product moment correlation coefficient (hereafter Pearson’s r). Advice on the effective use of this measure was recently summarised by Puth et al. (2014), who also presented a survey of published papers suggesting that it is commonly used across biology. We found 26 papers published in Oecologia in the last 12 months in which a primary outcome of the study involved calculation of this statistic (see Supplementary Information). In the hypothetical bumblebee example, interest lies not in the association between foraging range and body mass in this particular sample of 25 individuals, but in the underlying population. That is, we want to use the sample to make inferences about the association between these two traits in the population of all individuals of this species that could theoretically have been included in the sample. This matters because Pearson’s r is unusual among commonly used statistical measures in that the sample value is not an unbiased estimator of the population value. Specifically, the correlation measured on the sample tends to underestimate the correlation that exists in the whole population. This phenomenon is well known in the statistics literature (see below), but is generally not mentioned in statistics texts aimed at biologists. Consequently, it generally goes unacknowledged and unappreciated in the biology literature [but see the brief mention on p. 566 of Sokal and Rohlf (1981), and the fuller treatment in DeGhett (2014), for exceptions].
Because the large spatial scales at which ecologists work often make manipulative experiments impractical, correlative studies are more common in ecology than in fields such as animal behaviour. It is therefore vital that ecologists use the classical measure of correlation (Pearson’s r) as effectively as they can. Our aim here is to summarise existing evidence, supplemented by our own investigations, to offer researchers in ecology clear advice on what to do about the bias in Pearson’s r.
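As a minimal sketch of the downward bias described above (not the simulation code used in this study), the following Python draws repeated samples from a bivariate normal population with a chosen ρ, averages the sample r, and compares that average with the standard first-order approximation E(r) ≈ ρ[1 − (1 − ρ²)/(2N)]. The value ρ = 0.5, the sample sizes, the number of replicates and all function names are illustrative assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bivariate_normal_sample(n, rho, rng):
    """Draw n (x, y) pairs from a standard bivariate normal population with correlation rho."""
    x, y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x.append(z1)
        # y shares the component z1 with x, which gives corr(x, y) = rho
        y.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y


def mean_sample_r(n, rho, reps=10000, seed=1):
    """Monte Carlo estimate of E(r), the average sample r over many samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += pearson_r(*bivariate_normal_sample(n, rho, rng))
    return total / reps


def approx_expected_r(n, rho):
    """First-order approximation E(r) ~ rho * (1 - (1 - rho^2) / (2n))."""
    return rho * (1.0 - (1.0 - rho ** 2) / (2.0 * n))


rho = 0.5
for n in (10, 25):
    print(f"N = {n}: mean sample r = {mean_sample_r(n, rho):.3f}, "
          f"approximation = {approx_expected_r(n, rho):.3f}, population rho = {rho}")
```

Because the approximate bias is proportional to ρ, r is strictly biased towards zero: for a negative ρ the sample value tends to be less negative. The approximation also shows why the bias shrinks quickly with N: its relative size, (1 − ρ²)/(2N), is at most 1/60 (about 1.7%) at N = 30.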
Materials and methods
Review of the existing literature
On the basis of previous literature, it is already possible to offer clear advice to the researcher in many situations. First, correction for underestimation should not be adopted if either or both of the underlying distributions deviate from normality. In that situation, violation of the assumption of normality is a greater concern than underestimation, and alternative measures of association may be appropriate; Bishara and Hittner (2012) and Puth et al. (2014) provide clear advice on how to deal with this. Second, if sample size is greater than around 30, the underestimation is trivial, so there is no benefit in complicating the analysis by applying a correction. In the next section, we focus on closing the remaining gap in the literature by offering advice on correction for the situation where both distributions are well approximated by the normal distribution and sample size is low (N < 30). In our survey of 26 recent Oecologia papers, sample size was 30 or less in 6 cases and could not be determined from the paper in 12.
Plan of our simulation studies
Table 1 Coverage of 95% confidence intervals for the population value of Pearson’s correlation (ρ) estimated by three methods (BCa bootstrapping, F statistic and Fisher Z), using the uncorrected (r) and corrected (r*) sample correlation
N | Method | ρ = 0.0, r | ρ = 0.0, r* | ρ = 0.1, r | ρ = 0.1, r* | ρ = 0.3, r | ρ = 0.3, r* | ρ = 0.5, r | ρ = 0.5, r* | ρ = 0.7, r | ρ = 0.7, r* | ρ = 0.9, r | ρ = 0.9, r* |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | BCa | 0.954 | 0.940 | 0.958 | 0.942 | 0.955 | 0.937 | 0.946 | 0.959 | 0.932 | 0.932 | 0.920 | 0.925 |
10 | F | 0.965 | 0.976 | 0.959 | 0.970 | 0.964 | 0.965 | 0.955 | 0.980 | 0.949 | 0.968 | 0.95 | 0.957 |
10 | Fisher Z | 0.964 | 0.962 | 0.959 | 0.975 | 0.963 | 0.968 | 0.952 | 0.965 | 0.948 | 0.973 | 0.949 | 0.957 |
20 | BCa | 0.933 | 0.929 | 0.921 | 0.933 | 0.923 | 0.926 | 0.933 | 0.914 | 0.923 | 0.912 | 0.919 | 0.899 |
20 | F | 0.953 | 0.976 | 0.932 | 0.977 | 0.951 | 0.973 | 0.947 | 0.965 | 0.943 | 0.951 | 0.944 | 0.928 |
20 | Fisher Z | 0.952 | 0.959 | 0.932 | 0.955 | 0.95 | 0.962 | 0.944 | 0.961 | 0.943 | 0.956 | 0.943 | 0.958 |
30 | BCa | 0.931 | 0.929 | 0.939 | 0.937 | 0.949 | 0.912 | 0.928 | 0.915 | 0.940 | 0.881 | 0.937 | 0.875 |
30 | F | 0.943 | 0.980 | 0.954 | 0.985 | 0.962 | 0.970 | 0.956 | 0.973 | 0.950 | 0.939 | 0.953 | 0.903 |
30 | Fisher Z | 0.943 | 0.963 | 0.954 | 0.965 | 0.962 | 0.949 | 0.956 | 0.964 | 0.950 | 0.953 | 0.952 | 0.946 |
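Table 2 Testing the null hypothesis ρ = 0 (significance level α = 0.05) for N = 10, 20 and 30 and population correlations ρ = 0, 0.1, 0.3, 0.5, 0.7 and 0.9, using the uncorrected (r) and corrected (r*) sample correlation. Entries are rejection rates: the type I error rate when ρ = 0 and power otherwise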
N | Method | ρ = 0.0, r | ρ = 0.0, r* | ρ = 0.1, r | ρ = 0.1, r* | ρ = 0.3, r | ρ = 0.3, r* | ρ = 0.5, r | ρ = 0.5, r* | ρ = 0.7, r | ρ = 0.7, r* | ρ = 0.9, r | ρ = 0.9, r* |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | t* | 0.023 | 0.032 | 0.039 | 0.069 | 0.116 | 0.181 | 0.323 | 0.361 | 0.665 | 0.738 | 0.978 | 0.994 |
10 | Permutation | 0.059 | 0.046 | 0.061 | 0.056 | 0.123 | 0.158 | 0.315 | 0.321 | 0.652 | 0.678 | 0.982 | 0.979 |
10 | Fisher Z | 0.032 | 0.013 | 0.047 | 0.026 | 0.126 | 0.080 | 0.339 | 0.236 | 0.692 | 0.582 | 0.99 | 0.974 |
20 | t* | 0.019 | 0.031 | 0.061 | 0.075 | 0.242 | 0.281 | 0.652 | 0.662 | 0.952 | 0.957 | 1 | 1 |
20 | Permutation | 0.045 | 0.049 | 0.066 | 0.094 | 0.264 | 0.247 | 0.646 | 0.642 | 0.96 | 0.962 | 1 | 1 |
20 | Fisher Z | 0.027 | 0.024 | 0.061 | 0.053 | 0.251 | 0.238 | 0.631 | 0.587 | 0.969 | 0.938 | 1 | 1 |
30 | t* | 0.030 | 0.025 | 0.072 | 0.069 | 0.365 | 0.397 | 0.828 | 0.809 | 0.994 | 0.995 | 1 | 1 |
30 | Permutation | 0.053 | 0.041 | 0.072 | 0.084 | 0.375 | 0.372 | 0.827 | 0.838 | 0.993 | 0.991 | 1 | 1 |
30 | Fisher Z | 0.030 | 0.017 | 0.068 | 0.072 | 0.387 | 0.333 | 0.836 | 0.813 | 0.991 | 0.993 | 1 | 1 |
Results
Table 1 gives no evidence to support adoption of the OPA correction when calculating confidence intervals. Whichever method is used, correction does not generally bring coverage closer to the nominal value of 0.95. Correction perhaps tends to produce intervals that are too wide (hence with coverage above 0.95), but not consistently: for the F method at N = 30 and ρ = 0.1, correction raises coverage from 0.954 to 0.985, whereas for BCa at N = 30 and ρ = 0.9 it lowers coverage from 0.937 to 0.875.
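The coverage values in Table 1 are proportions of simulated intervals that contain ρ. A minimal sketch of that calculation for the Fisher Z interval alone is shown below. It uses the textbook interval tanh(atanh(r) ± q/√(N − 3)), where q is the 0.975 standard normal quantile, applied to the uncorrected r; it does not attempt the BCa or F intervals or the corrected variant, whose exact implementations in this study are not described here. Function names, the seed and the number of replicates are illustrative assumptions, and the helper functions are repeated so the block runs on its own.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bivariate_normal_sample(n, rho, rng):
    """Draw n (x, y) pairs from a standard bivariate normal population with correlation rho."""
    x, y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x.append(z1)
        y.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y


def fisher_z_ci(r, n, level=0.95):
    """CI for rho from Fisher's z = atanh(r), treated as normal with SE 1/sqrt(n - 3)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    # Back-transform the interval endpoints from the z scale to the correlation scale
    return math.tanh(z - half_width), math.tanh(z + half_width)


def fisher_z_coverage(n, rho, reps=5000, level=0.95, seed=2):
    """Proportion of simulated Fisher Z intervals that contain the population value rho."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        r = pearson_r(*bivariate_normal_sample(n, rho, rng))
        lower, upper = fisher_z_ci(r, n, level)
        hits += lower <= rho <= upper
    return hits / reps


for n in (10, 20, 30):
    print(f"N = {n}: Fisher Z coverage at rho = 0.5 is {fisher_z_coverage(n, 0.5):.3f}")
```

Applied to a single study, `fisher_z_ci(r, n)` gives the kind of interval that can be reported alongside r to convey imprecision.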
We now turn to Table 2, testing the null hypothesis that ρ = 0. Considering type I error rate first, the methods are predominantly conservative, with type I error rates mostly below 0.05, and correction does not substantially change this. Turning to power (ρ = 0.1, 0.3, 0.5, 0.7 and 0.9), power for all methods, corrected and uncorrected, unsurprisingly increases with sample size and with the population value of ρ. Puth et al. (2014) did not find a strong difference in power between the three uncorrected methods, and our results agree with this; the same is true when comparing the powers of the three corrected versions. Most importantly, for no specific method does correction offer a conspicuous and consistent improvement in power. Hence, we find no strong evidence in support of correcting calculated correlation coefficients as part of null hypothesis testing.
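For readers who want to explore entries like those in Table 2, a minimal sketch of two of the tests is given below: a Fisher Z test, assuming the usual normal approximation in which atanh(r)·√(N − 3) is standard normal under ρ = 0, and a permutation test that shuffles y to break any association with x. The rejection-rate function estimates the type I error rate (ρ = 0) or power (ρ > 0) of the Fisher Z test. These are illustrative implementations with hypothetical names, not the study's code, and the t* test in Table 2 is not reproduced.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bivariate_normal_sample(n, rho, rng):
    """Draw n (x, y) pairs from a standard bivariate normal population with correlation rho."""
    x, y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x.append(z1)
        y.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y


def fisher_z_test(r, n):
    """Two-sided p-value for H0: rho = 0, treating atanh(r) * sqrt(n - 3) as standard normal."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def permutation_test(x, y, n_perm=999, seed=3):
    """Two-sided permutation p-value for H0: no association between x and y."""
    rng = random.Random(seed)
    r_obs = abs(pearson_r(x, y))
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        # Count shuffled data sets at least as extreme as the observed one
        if abs(pearson_r(x, y_perm)) >= r_obs - 1e-12:
            extreme += 1
    # Count the observed arrangement too, so the p-value is never zero
    return (extreme + 1) / (n_perm + 1)


def fisher_z_rejection_rate(n, rho, reps=4000, alpha=0.05, seed=4):
    """Proportion of simulated samples in which the Fisher Z test rejects H0: rho = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        r = pearson_r(*bivariate_normal_sample(n, rho, rng))
        rejections += fisher_z_test(r, n) < alpha
    return rejections / reps


rng = random.Random(5)
x, y = bivariate_normal_sample(10, 0.5, rng)
r = pearson_r(x, y)
print(f"r = {r:.3f}, Fisher Z p = {fisher_z_test(r, 10):.3f}, "
      f"permutation p = {permutation_test(x, y):.3f}")
print("Fisher Z type I error rate, N = 20:", fisher_z_rejection_rate(20, 0.0))
```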
Figure 1a shows that, irrespective of the size of r, there is very little difference between r and OPA(r) when sample size exceeds 15; Fig. 1b shows a similar trend for the correction to z. Figure 2 shows, first, that the small samples simulated for it can produce a broad range of r values across our 1000 samples. Second, the mean r of the 1000 samples is lower than the population value of 0.25 (i.e. it is downwardly biased, as expected), whereas the mean of OPA(r) is slightly closer to 0.25 (so the correction slightly reduced bias on average). Finally, both the standard deviation and the mean squared error of the OPA-corrected values are larger than those of the r values. This suggests that the reduction in bias achieved by the OPA correction comes at a cost in precision, and that imprecision matters more than bias in this example situation.
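The trade-off between bias and precision described for Fig. 2 can be sketched as follows. We assume that OPA denotes the approximate Olkin and Pratt (1958) estimator, OPA(r) = r[1 + (1 − r²)/(2(N − 3))]; the exact form used in this study is not stated in this section. The sample size N = 10 is also an assumption, since the sample size behind Fig. 2 is not given here, and we use 40,000 replicates rather than 1000 to reduce Monte Carlo noise. Function names are hypothetical.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bivariate_normal_sample(n, rho, rng):
    """Draw n (x, y) pairs from a standard bivariate normal population with correlation rho."""
    x, y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x.append(z1)
        y.append(rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y


def opa(r, n):
    """Assumed approximate Olkin-Pratt correction: r * (1 + (1 - r^2) / (2(n - 3)))."""
    return r * (1.0 + (1.0 - r * r) / (2.0 * (n - 3)))


def summarise(values, rho):
    """Mean, standard deviation and mean squared error of a set of estimates of rho."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    mse = statistics.fmean([(v - rho) ** 2 for v in values])
    return mean, sd, mse


def compare_r_and_opa(n=10, rho=0.25, reps=40000, seed=6):
    """Summaries of r and OPA(r) computed on the same simulated samples."""
    rng = random.Random(seed)
    rs = [pearson_r(*bivariate_normal_sample(n, rho, rng)) for _ in range(reps)]
    corrected = [opa(r, n) for r in rs]
    return summarise(rs, rho), summarise(corrected, rho)


raw, corr = compare_r_and_opa()
for label, (mean, sd, mse) in (("r", raw), ("OPA(r)", corr)):
    print(f"{label:7s} mean = {mean:.4f}  SD = {sd:.4f}  MSE = {mse:.4f}")
```

Under these assumptions, the mean of OPA(r) lies closer to 0.25 than the mean of r, while its standard deviation and mean squared error are larger, which is the pattern described for Fig. 2.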
Discussion
On the basis of our survey of the literature and our own simulations, we can offer clear advice to the many researchers in ecology who use Pearson’s r in the statistical treatment of their data.
Firstly, they should be aware that the value of r measured on their sample is more likely to underestimate than to overestimate the true value in the underlying population they are interested in. None of the papers in our survey discussed this possible bias.
Further, they should be aware that testing the null hypothesis of no association is typically conservative, rejecting a true null hypothesis at a rate lower than the nominal α. This hypothesis was tested in 21 of the 26 papers in our survey, but none of these discussed the conservatism of the test.
Next, they should not apply any of the methods offered in the literature for correcting the bias, because no method yet developed offers consistently reliable performance. Additionally, the standard deviation of OPA-corrected values (Fig. 2b) was greater than that of the r values (Fig. 2a), illustrating that any reduction in bias through correction can come at the cost of increased imprecision.
Finally, when discussing the importance of this bias towards underestimation for the biological conclusions to be drawn from their study, they should quantify its likely extent. Fig. 1a shows that, regardless of the size of the actual correlation ρ, as long as N > 15 the difference between r and OPA(r) is less than 5% of r. Sample size was less than 15 in 3 of the 26 papers in our survey. Thus, unless sample size is very small, this bias is unlikely to call for substantial modification of biological conclusions. For such small sample sizes, statistical power is likely to be very low (see Tables 1, 2), so imprecision may often be a greater concern than bias even in this situation. In our survey of 26 papers, only one provided a confidence interval, and none of the others discussed precision in any way. We have demonstrated here three simple and general ways of calculating such a confidence interval, which is a very useful aid to discussing imprecision of estimation.
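The 5% figure can be checked algebraically under one assumption: that OPA(r) is the approximate Olkin and Pratt estimator r[1 + (1 − r²)/(2(N − 3))], whose exact form is not restated in this section. The relative change (OPA(r) − r)/r then equals (1 − r²)/(2(N − 3)), which is largest as r approaches 0, where it tends to 1/(2(N − 3)). The short sketch below (hypothetical function names) finds the smallest N for which that worst case falls below 5%.

```python
def opa(r, n):
    """Assumed approximate Olkin-Pratt correction: r * (1 + (1 - r^2) / (2(n - 3)))."""
    return r * (1.0 + (1.0 - r * r) / (2.0 * (n - 3)))


def relative_change(r, n):
    """(OPA(r) - r) / r for r != 0, which simplifies to (1 - r^2) / (2(n - 3))."""
    return (opa(r, n) - r) / r


def worst_case_relative_change(n):
    """Limit of the relative change as r -> 0, its largest value for a given n."""
    return 1.0 / (2.0 * (n - 3))


def smallest_n_below(threshold, n_start=4):
    """Smallest sample size n whose worst-case relative change is below threshold."""
    n = n_start
    while worst_case_relative_change(n) >= threshold:
        n += 1
    return n


for n in (10, 13, 14, 16, 20, 30):
    print(f"N = {n}: worst-case relative change = {worst_case_relative_change(n):.4f}")
print("Smallest N with change below 5%:", smallest_n_below(0.05))
```

Under this assumed form, the change is below 5% of r for every r once N ≥ 14, consistent with the more conservative N > 15 rule read from Fig. 1a.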
Notes
Acknowledgements
We thank the two reviewers and a handling editor for valuable comments.
Author contribution statement
RKH and GDR conducted the literature review, ran the simulation studies and wrote the manuscript. MTP and MN provided essential statistical knowledge and R code for running the simulations. MTP and MN also both provided editorial advice, and MN suggested additional simulations that increased the usefulness of an earlier version.
Funding
This study received no funding.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
Statement of human and animal rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Supplementary material
References
- Bishara AJ, Hittner JB (2012) Testing the significance of a correlation with nonnormal data: comparison of Pearson, Spearman, transformation, and resampling approaches. Psychol Methods 17:399–417. https://doi.org/10.1037/a0028087
- Bishara AJ, Hittner JB (2015) Reducing bias and error in the correlation coefficient due to nonnormality. Educ Psychol Meas 75:785–804. https://doi.org/10.1177/0013164414557639
- DeGhett VJ (2014) Effective use of Pearson’s product-moment correlation coefficient: an additional point. Anim Behav 98:e1–e2. https://doi.org/10.1016/j.anbehav.2014.10.006
- Gorsuch RL, Lehmann CS (2010) Correlation coefficients: mean bias and confidence interval distortions. J Methods Meas Soc Sci 1:52–65. https://doi.org/10.2458/jmm.v1i2.114
- Hotelling H (1953) New light on the correlation coefficient and its transforms. J R Stat Soc B 15:193–232. http://www.jstor.org/stable/2983768. Accessed 5 May 2018
- Jeyaratnam S (1992) Confidence intervals for the correlation coefficient. Stat Probabil Lett 15:389–393. https://doi.org/10.1016/0167-7152(92)90172-2
- Muddapur MV (1988) A simple test for correlation coefficient in a bivariate normal distribution. Sankhyā Ser B 50:60–68. http://www.jstor.org/stable/25052522. Accessed 5 May 2018
- Olkin I, Pratt JW (1958) Unbiased estimation of certain correlation coefficients. Ann Math Stat 29:201–211. http://www.jstor.org/stable/2237306. Accessed 5 May 2018
- Puth MT, Neuhäuser M, Ruxton GD (2014) Effective use of Pearson’s product–moment correlation coefficient. Anim Behav 93:183–189. https://doi.org/10.1016/j.anbehav.2014.05.003
- Shieh G (2010) Estimation of the simple correlation coefficient. Behav Res Methods 42:906–917. https://doi.org/10.3758/BRM.42.4.906
- Sinsomboonthong J, Chantapoon Y, Ratanaphadit K, Palakas S, Chelong IA, Sdoodee S, Termkietpisan W, Bowichean R, Thanachit S, Anusontpornperm S, Kheoruenromne I (2013) Bias correction in estimation of the population correlation coefficient. Kasetsart J (Nat Sci) 47:453–459. http://www.thaiscience.info/journals/Article/TKJN/10898081.pdf. Accessed 5 May 2018
- Sokal RR, Rohlf FJ (1981) Biometry, 2nd edn. WH Freeman, New York
- Zimmerman DW, Zumbo BD, Williams RH (2003) Bias in estimation and hypothesis testing of correlation. Psicológica 24:133–158. http://www.redalyc.org/html/169/16924109/. Accessed 5 May 2018
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.