In recent years, researchers have used Monte Carlo simulation methods to study the robustness and power of various analytic techniques. By means of simulation it is possible to generate not only normally distributed data but also data that reflect what is commonly found in real-world settings (Blanca, Arnau, López-Montiel, Bono, & Bendayan, 2013; Micceri, 1989). Thus, various Monte Carlo simulation studies have analyzed the fixed effects associated with time (the repeated measures variable) using normally or nonnormally distributed data (Arnau, Bono, Blanca, & Bendayan, 2012; Arnau, Bono, & Vallejo, 2009; Kowalchuk, Keselman, Algina, & Wolfinger, 2004; Vallejo & Ato, 2006, among others). Most of these studies analyzed mixed models by generating data from an unstructured (UN) population covariance matrix with sphericity values of .57 and .75. In simulation studies of this kind, one would ideally know whether the sphericity estimated from the simulated data matches the sphericity that was fixed initially. However, to our knowledge no published studies have addressed this issue.

In order to generate normally distributed data, most simulation studies of repeated measures designs use the Cholesky decomposition of the correlation matrix (Lix, Algina, & Keselman, 2003). Among the various methods developed to generate nonnormal data (Fleishman, 1978; Headrick, 2002, 2004; L’Ecuyer, 1990; Marsaglia, 2003; Ramberg, Tadikamalla, Dudewicz, & Mykytka, 1979; Tadikamalla, 1980; Vale & Maurelli, 1983, among others), the method of Vale and Maurelli (1983) is one of the most widely used in simulation studies in the social sciences. According to Olvera Astivia and Zumbo (2015), this method has been cited more than 130 times in the ISI Web of Knowledge and over 230 times in Google Scholar. The procedures used to generate data can alter the sphericity of the fixed covariance matrix, since data simulation involves two steps: the generation of a population covariance matrix from sphericity values, and the generation of normal or nonnormal data using this covariance matrix. The Cholesky decomposition or the method of Vale and Maurelli is applied in this second step.

The aim of the present study was to examine how the type of distribution, the sample size, the number of repeated measures, and the sphericity value of the population covariance matrix affect the sphericity estimation of simulated repeated measures data. In other words, we sought to determine the extent to which the fixed sphericity (population sphericity) differs from the estimated sphericity (sample sphericity). To this end, data were generated for both the normal distribution and nonnormal distributions commonly used in simulation studies. For each distribution we analyzed sphericity estimation bias in relation to different sample sizes, different numbers of repeated measures, and different sphericity values of the kind frequently found in simulation studies.

Vale–Maurelli method

The method of Vale and Maurelli (1983) is a multivariate extension of the method proposed by Fleishman (1978). The Fleishman method uses the polynomial transformation of normal variables:

$$ Y=a+bX+c{X}^2+d{X}^3, $$
(1)

where a, b, c, and d are the polynomial coefficients that control the first four moments of the random variable Y, and X is a standard normal random variable (mean zero, variance 1). The constant a is equal to −c, which gives Y a mean of zero.

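When the coefficients also satisfy Fleishman's unit-variance constraint, b² + 6bd + 2c² + 15d² = 1, the skewness (γ1) and kurtosis (γ2, expressed as excess kurtosis, so that γ2 = 0 for the normal distribution) of Y are given by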

$$ {\gamma}_1=2c\left({b}^2+24bd+105{d}^2+2\right) $$
(2)

and

$$ {\gamma}_2=24\left(bd+{c}^2\left[1+{b}^2+28bd\right]+{d}^2\left[12+48bd+141{c}^2+225{d}^2\right]\right). $$
(3)
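As a numerical check on Eqs. 2 and 3, the sketch below computes the variance, skewness, and excess kurtosis of Y directly, by expanding powers of the polynomial and using the standard normal moments E[X^n], and compares them with the closed-form expressions. It assumes the unit-variance constraint stated above (Eqs. 2 and 3 hold only under it). The values of c and d are hypothetical illustrations, not the Table 2 coefficients, and b is obtained from the constraint.

```python
import math

def normal_moment(n):
    """E[X**n] for a standard normal X: 0 for odd n, (n - 1)!! for even n."""
    if n % 2:
        return 0.0
    return float(math.prod(range(n - 1, 0, -2)))

def poly_mul(p, q):
    """Multiply two polynomials in X given as coefficient lists (constant first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def expectation(poly):
    """E[poly(X)] for a standard normal X."""
    return sum(coef * normal_moment(k) for k, coef in enumerate(poly))

def direct_moments(b, c, d):
    """Variance, skewness, excess kurtosis of Y = -c + bX + cX^2 + dX^3
    by brute-force expansion (E[Y] = 0 because a = -c)."""
    y = [-c, b, c, d]
    y2 = poly_mul(y, y)
    y3 = poly_mul(y2, y)
    y4 = poly_mul(y3, y)
    var = expectation(y2)
    return var, expectation(y3) / var ** 1.5, expectation(y4) / var ** 2 - 3.0

def fleishman_moments(b, c, d):
    """Unit-variance constraint value, then Eq. 2 (skewness) and Eq. 3 (kurtosis)."""
    var = b**2 + 6*b*d + 2*c**2 + 15*d**2
    skew = 2*c*(b**2 + 24*b*d + 105*d**2 + 2)
    kurt = 24*(b*d + c**2*(1 + b**2 + 28*b*d)
               + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2))
    return var, skew, kurt

# Hypothetical c and d; b solves b^2 + 6bd + 2c^2 + 15d^2 = 1.
c, d = 0.1, 0.05
b = -3*d + math.sqrt(1 - 2*c**2 - 6*d**2)
print(direct_moments(b, c, d))
print(fleishman_moments(b, c, d))
```

In practice the direction is reversed: a root finder is applied to the three equations (constraint, Eq. 2, Eq. 3) to recover b, c, and d for a target pair (γ1, γ2).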

Vale and Maurelli (1983) extended this method to the generation of multivariate nonnormal distributions. To this end, they defined the vectors x and w, and the variable Y as:

$$ {\mathbf{x}}^{\mathrm{T}}=\left[1,X,{X}^2,{X}^3\right], $$
(4)
$$ {\mathbf{w}}^{\mathrm{T}}=\left[a,b,c,d\right], $$
(5)
$$ Y={\mathbf{w}}^{\mathrm{T}}\mathbf{x}, $$
(6)

where X is defined as in Eq. 1, and wᵀ is the vector of polynomial weights that control the first four moments of the new nonnormal distribution Y.

Equation 7 gives the correlation between two nonnormal variables Y1 and Y2 generated from two normal variables X1 and X2 (because each Y has mean 0 and variance 1, the expectation E(Y1Y2) is their correlation):

$$ {r}_{Y1Y2}=E\left({Y}_1{Y}_2\right)=E\left({\mathbf{w}}_1^T{\mathbf{x}}_1{\mathbf{x}}_2^T{\mathbf{w}}_2\right)={\mathbf{w}}_1^T{\mathbf{Rw}}_2, $$
(7)

where R = E(x1x2ᵀ).

Expressed in terms of the weights, the correlation between Y1 and Y2 is

$$ {r}_{Y1Y2}={\rho}_{X1X2}\left({b}_1{b}_2+3{b}_1{d}_2+3{d}_1{b}_2+9{d}_1{d}_2\right)+{\rho}_{X1X2}^2\left(2{c}_1{c}_2\right)+{\rho}_{X1X2}^3\left(6{d}_1{d}_2\right), $$
(8)

where ρX1X2 is the correlation between the normal variables X1 and X2.

By solving Eq. 8 for ρX1X2 for every pair of variables, it is possible to find the intermediate correlation matrix and thus to specify all the elements needed to generate the data. In summary, the solution proposed by Vale and Maurelli (1983) computes an intermediate correlation matrix that is used, in place of the target population correlation matrix, to generate multivariate normal data; once the Fleishman transformation is applied to each marginal distribution, this intermediate matrix is transformed into the desired correlation matrix (Olvera Astivia & Zumbo, 2015).
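Equation 8 is a cubic in ρX1X2, so the intermediate correlation for one pair of variables can be found with any one-dimensional root finder. The sketch below uses bisection on [−1, 1], chosen here only for transparency; it is not necessarily the procedure used by Vale and Maurelli or in the SAS/IML code of this study. The function names and the illustrative marginal (hypothetical c and d, with b taken from the unit-variance constraint) are assumptions.

```python
import math

def vm_correlation(rho, w1, w2):
    """Eq. 8: correlation of Y1 and Y2 implied by the normal correlation rho.

    w1 and w2 are Fleishman weight vectors (a, b, c, d) for the two marginals.
    """
    _, b1, c1, d1 = w1
    _, b2, c2, d2 = w2
    return (rho * (b1*b2 + 3*b1*d2 + 3*d1*b2 + 9*d1*d2)
            + rho**2 * (2*c1*c2)
            + rho**3 * (6*d1*d2))

def intermediate_rho(target_r, w1, w2, tol=1e-12, max_iter=200):
    """Solve Eq. 8 for rho in [-1, 1] by bisection (requires a sign change)."""
    lo, hi = -1.0, 1.0
    f_lo = vm_correlation(lo, w1, w2) - target_r
    f_hi = vm_correlation(hi, w1, w2) - target_r
    if f_lo * f_hi > 0:
        raise ValueError("target correlation is not attainable with these marginals")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = vm_correlation(mid, w1, w2) - target_r
        if abs(f_mid) < tol or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid   # root lies in [mid, hi]
    return 0.5 * (lo + hi)

# Hypothetical marginal shared by both variables.
c, d = 0.1, 0.05
w = (-c, -3*d + math.sqrt(1 - 2*c**2 - 6*d**2), c, d)
rho = intermediate_rho(0.5, w, w)
print(rho, vm_correlation(rho, w, w))
```

For this marginal the intermediate ρ comes out slightly larger than the target of .5, because the nonlinear transformation attenuates the correlation. In the full method, this step is repeated for every pair (j, k) of repeated measures to fill the intermediate matrix, which is then factored for data generation.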

A Monte Carlo study

Data were generated using SAS/IML (version 9.4), since this software is one of the most suitable for simulating data (Kashyap, Butt, & Bhattacharjee, 2009) and is also one of the most popular for implementing the Vale and Maurelli method (Keselman & Lix, 1997; Lix et al., 2003; Vallejo, Arnau, & Ato, 2007; Vallejo & Livacic-Rojas, 2005).

The first step involved generating the UN population covariance matrices from variances and correlations with sphericity values of ε = .57 and .75 for each number of repeated measures, K = 4, 6, 8, and 10 (Table 1). The sphericity of each population covariance matrix was calculated using the Greenhouse–Geisser epsilon (Greenhouse & Geisser, 1959).
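One way to compute the Greenhouse–Geisser epsilon of a K × K covariance matrix is sketched below. It uses the standard identity ε = [tr(A)]² / [(K − 1) tr(A²)], where A is the double-centred covariance matrix (A has the same nonzero eigenvalues as C'ΣC for any orthonormal set of K − 1 contrasts C). This plain-Python function is an illustration, not the SAS/IML code used in the study; the two test matrices are hypothetical.

```python
def gg_epsilon(S):
    """Greenhouse-Geisser epsilon of a K x K covariance matrix S (list of lists).

    Double-centres S (A = P S P with P = I - J/K), then returns
    tr(A)^2 / ((K - 1) * tr(A^2)).  The result lies in [1/(K-1), 1].
    """
    k = len(S)
    row = [sum(r) / k for r in S]
    col = [sum(S[i][j] for i in range(k)) / k for j in range(k)]
    grand = sum(row) / k
    A = [[S[i][j] - row[i] - col[j] + grand for j in range(k)] for i in range(k)]
    tr_a = sum(A[i][i] for i in range(k))
    tr_a2 = sum(A[i][j] * A[j][i] for i in range(k) for j in range(k))
    return tr_a ** 2 / ((k - 1) * tr_a2)

# Compound symmetry satisfies sphericity exactly, so epsilon = 1.
cs = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
# A single nonzero contrast eigenvalue gives the lower bound 1/(K - 1).
rank_one = [[1.0, -1.0, 0.0, 0.0],
            [-1.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]
print(gg_epsilon(cs), gg_epsilon(rank_one))
```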

Table 1 UN population covariance matrices

In the next step, the RANNOR generator in SAS was used to obtain normally distributed multivariate pseudorandom observations by means of the Cholesky decomposition (Lix et al., 2003). The nonnormal data distributions were generated using the method of Vale and Maurelli (1983). For each nonnormal distribution, the weight vector of Eq. 5 was obtained from the Fleishman (1978) coefficients that yield the desired skewness and kurtosis in each marginal distribution. Table 2 shows the Fleishman coefficients a, b, c, and d used to generate the nonnormal data. These coefficients correspond to exponential distributions, with fixed skewness (γ1 = 0.8) and two values of kurtosis (γ2 = 2.4 and 5.4), and to the log-normal distribution (γ1 = 1.75 and γ2 = 5.9).
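The Cholesky step can be sketched as follows: factor the covariance matrix as Σ = LLᵀ, then transform independent standard normal draws z into Lz, which has covariance Σ. Here Python's `random.gauss` stands in for the SAS RANNOR generator (an assumption), and the 3 × 3 matrix is a hypothetical positive definite example rather than one of the Table 1 matrices.

```python
import math
import random

def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    k = len(S)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def mvn_sample(S, n, rng):
    """Draw n rows from N(0, S): each row is L z with z ~ N(0, I)."""
    L = cholesky(S)
    k = len(S)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        rows.append([sum(L[i][m] * z[m] for m in range(i + 1)) for i in range(k)])
    return rows

S = [[4.0, 2.0, 1.0],
     [2.0, 3.0, 0.5],
     [1.0, 0.5, 2.0]]
data = mvn_sample(S, n=10, rng=random.Random(2015))
print(data[0])
```

For nonnormal data, the same factorization would be applied to the intermediate correlation matrix, and each column of the result would then be passed through its Fleishman polynomial.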

Table 2 Values of Fleishman’s (1978) a, b, c, and d coefficients for each value of skewness and kurtosis for the distributions generated in the present study

Finally, the Greenhouse–Geisser epsilon (Greenhouse & Geisser, 1959) of the UN covariance matrix of each simulated data set was obtained through proc glm in SAS, and these sample estimates were averaged across replications. We then calculated the empirical bias as the difference between this average sample sphericity and the initially fixed population sphericity, so that negative values indicate underestimation.
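The empirical-bias computation for the normal case can be sketched as a small Monte Carlo loop: generate data via the Cholesky factor, estimate ε from each sample covariance matrix, average, and subtract the population ε. This is a minimal illustration, not the SAS pipeline of the study; the compound-symmetric matrix (population ε = 1), N = 6, the 200 replications, and the seed are hypothetical choices. Because a sample ε̂ can never exceed 1, the bias here is necessarily negative.

```python
import math
import random

def cholesky(S):
    k = len(S)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(S[i][i] - s) if i == j else (S[i][j] - s) / L[j][j]
    return L

def gg_epsilon(S):
    """Greenhouse-Geisser epsilon via the double-centred matrix (S symmetric)."""
    k = len(S)
    row = [sum(r) / k for r in S]   # column means equal row means for symmetric S
    grand = sum(row) / k
    A = [[S[i][j] - row[i] - row[j] + grand for j in range(k)] for i in range(k)]
    tr_a = sum(A[i][i] for i in range(k))
    tr_a2 = sum(A[i][j] ** 2 for i in range(k) for j in range(k))
    return tr_a ** 2 / ((k - 1) * tr_a2)

def sample_cov(rows):
    """Unbiased sample covariance matrix of an n x k data set."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]

def empirical_bias(S, n, reps, seed=1):
    """Mean sample GG epsilon minus population GG epsilon, normal data."""
    rng = random.Random(seed)
    L = cholesky(S)
    k = len(S)
    total = 0.0
    for _ in range(reps):
        rows = []
        for _ in range(n):
            z = [rng.gauss(0.0, 1.0) for _ in range(k)]
            rows.append([sum(L[i][m] * z[m] for m in range(i + 1)) for i in range(k)])
        total += gg_epsilon(sample_cov(rows))
    return total / reps - gg_epsilon(S)

K = 4
sigma = [[1.0 if i == j else 0.5 for j in range(K)] for i in range(K)]
bias = empirical_bias(sigma, n=6, reps=200, seed=2015)
print(f"population epsilon = {gg_epsilon(sigma):.3f}, empirical bias = {bias:.3f}")
```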

Study variables

Four variables were manipulated in this study.

Sample size

The sample sizes chosen were the same as or similar to the cell sizes most widely used in the simulation studies of repeated measures designs published since 1990 (Arnau, Bendayan, Blanca, & Bono, 2013a, b, 2014; Arnau et al., 2009; Keselman, Carriere, & Lix, 1993; Keselman & Keselman, 1990; Kowalchuk et al., 2004, among many others). On the basis of these studies, we chose to examine both very small (N = 5, 6, 7, and 10) and small (N = 12, 14, 15, 18, and 21) samples. In addition, with the goal of determining the value of N at which the sphericity estimation bias approaches zero, we also included the medium (N = 30, 45, 60, and 75) and large (N = 90, 100, 200, 300, and 500) sample sizes that have been used in other simulation studies of repeated measures designs. The studies by Arnau et al. (2013a, b), Keselman, Algina, Kowalchuk, and Wolfinger (1998), Keselman et al. (1993), and Lix et al. (2003) examined medium group sizes. The study by Olvera Astivia and Zumbo (2015) examined large sample sizes with the aim of determining the properties of data generation algorithms for multivariate nonnormal data. In the study by Oberfeld and Franke (2013), both extremely small and larger sample sizes were examined, with the aim of evaluating the robustness of repeated measures analyses.

Degree of contamination of the distribution

The distributions selected were the normal distribution and a series of nonnormal distributions defined by the values of skewness and kurtosis most commonly found in either simulation or empirical studies. In several simulation studies of repeated measures designs, the distributions were classified as normal or as slightly, moderately, or strongly skewed (Berkovits, Hancock, & Nevitt, 2000; Vallejo et al., 2007). Among the strongly skewed distributions, a number of simulation studies have analyzed the log-normal distribution (Algina & Keselman, 1998; Keselman, Kowalchuk, & Boik, 2000; Kowalchuk et al., 2004, among others).

The distributions used in the present study had positive values of skewness and kurtosis, given that such values are used in simulation studies and are also those most commonly found in distributions of psychological variables (Blanca et al., 2013). Regarding the degree of contamination, the extreme values chosen were γ1 = 1.75 and γ2 = 5.9, which correspond to the log-normal distribution, one of the most widely studied. The other two distributions analyzed had a fixed skewness, γ1 = 0.8, and two values of kurtosis, γ2 = 2.4 and γ2 = 5.4. These values are well within the ranges of skewness and kurtosis observed in real-world settings (Blanca et al., 2013; Lei & Lomax, 2005), and they are also the values used in the study by Arnau et al. (2012).

Sphericity values

The sphericity indices used were ε = .57 and .75. The latter value was taken to be a good approximation to sphericity, whereas the former represented nonsphericity. Both values have been used in the majority of simulation studies of repeated measures designs (Algina & Keselman, 1998; Arnau et al., 2013a, b, 2014; Arnau et al., 2012; Arnau et al., 2009; Berkovits et al., 2000; Keselman & Keselman, 1990, among many others).

Levels of the within-subjects factor

In the present study, we decided to use K = 4, 6, 8, and 10. It should be noted that the level of K = 4 is the most commonly found in simulation studies (Berkovits et al., 2000; Keselman et al., 2000; Kowalchuk et al., 2004; Lix et al., 2003; Tian & Wilcox, 2007; Vallejo et al., 2007). Eight repeated measures were used in the studies by Keselman et al. (2000), Kowalchuk and Keselman (2001), and Vallejo and Ato (2006). The intermediate value of K = 6 was also examined in the studies by Arnau et al. (2009), Arnau et al. (2012), Padilla and Algina (2007), and Wilcox (2006). Finally, we also analyzed an extreme number of repeated measures (K = 10), as was done in the simulation study by Ahmad, Werner, and Brunner (2008).

Each combination of sample size, distribution shape, sphericity, and number of repeated measures was replicated 1,000 times (18 × 4 × 2 × 4 × 1,000 = 576,000 simulations).
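The size of the design, and the recoding of N into four categories used in the analyses below, can be checked with a short enumeration. The constants simply restate the levels listed above; the error-degrees-of-freedom line assumes (our inference, not stated by the source) that each ANOVA was run on one bias value per condition, in which case it reproduces the 224 error df reported in the Results.

```python
from itertools import product

SAMPLE_SIZES = {
    "very small": [5, 6, 7, 10],
    "small": [12, 14, 15, 18, 21],
    "medium": [30, 45, 60, 75],
    "large": [90, 100, 200, 300, 500],
}
DISTRIBUTIONS = [      # (skewness, excess kurtosis)
    (0.0, 0.0),        # normal
    (0.8, 2.4),
    (0.8, 5.4),
    (1.75, 5.9),       # log-normal
]
EPSILONS = [0.57, 0.75]
KS = [4, 6, 8, 10]
REPLICATIONS = 1_000

def size_category(n):
    """Recode a sample size into very small / small / medium / large."""
    for label, sizes in SAMPLE_SIZES.items():
        if n in sizes:
            return label
    raise ValueError(f"unexpected sample size {n}")

ns = [n for sizes in SAMPLE_SIZES.values() for n in sizes]
conditions = list(product(ns, DISTRIBUTIONS, EPSILONS, KS))
total_datasets = len(conditions) * REPLICATIONS

# Per level of epsilon: condition values minus cells of the 4 x 4 x 4 design.
per_eps = sum(1 for cond in conditions if cond[2] == 0.57)
error_df = per_eps - len(SAMPLE_SIZES) * len(DISTRIBUTIONS) * len(KS)
print(len(conditions), total_datasets, error_df)
```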

Data analysis

In order to simplify the statistical analysis, the sample size variable was recategorized into very small, small, medium, and large.

The univariate analyses of variance (ANOVAs) were performed using proc glm from SAS. Specifically, we conducted two separate 4 × 4 × 4 (N × Distribution × K) ANOVAs, one for each level of ε, and eight separate 4 × 4 (N × Distribution) ANOVAs, one for each combination of ε and K. Post-hoc comparisons between sample-size categories and between distributions were performed by means of the Bonferroni test. Polynomial contrasts using proc glm and forward multiple regression analyses using proc reg from SAS were performed for each level of K and sphericity, with the aim of examining the impacts of both distribution type and sample size on the empirical bias in sphericity. Finally, a 4 × 4 × 4 × 2 (N × Distribution × K × ε) ANOVA was conducted.

The empirical bias in the sphericity estimation was taken as the dependent variable, with sample size, distribution, sphericity, and the number of repeated measures as factors. Partial eta-squared (η²p) was calculated as a measure of effect size.
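Partial eta-squared can be recovered from any reported F ratio and its degrees of freedom, since η²p = SS_effect / (SS_effect + SS_error) and F = (SS_effect/df_effect) / (SS_error/df_error). The helper below is a minimal sketch of that identity (the function name is ours); applied to two F tests from the Results, it reproduces the reported effect sizes.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta-squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return (f * df_effect) / (f * df_effect + df_error)

# N x K interaction for epsilon = .57: F(9, 224) = 58.857
print(round(partial_eta_squared(58.857, 9, 224), 3))   # 0.703
# Main effect of sphericity: F(1, 448) = 924.704
print(round(partial_eta_squared(924.704, 1, 448), 3))  # 0.674
```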

Results

In this section, we report the bias observed when estimating sphericity for the different sample sizes, numbers of repeated measures, and types of distribution when the sphericity of the population covariance matrix was .57 (Fig. 1) and .75 (Fig. 2). Bias was considered to be null when the deviation was close to zero, between –.080 and .080. This interval, which was chosen arbitrarily by the authors, is shown shaded in both figures.

Fig. 1 Empirical bias with ε = .57 across the different Ns, distributions, and Ks (SK = skewness and KU = kurtosis)

Fig. 2 Empirical bias with ε = .75 across the different Ns, distributions, and Ks (SK = skewness and KU = kurtosis)

In Fig. 1, which shows the empirical bias when ε = .57, it can be seen that with a small number of repeated measures (K = 4) the sphericity estimation is not biased, regardless of the distribution and sample size. However, as the number of repeated measures increases, the sphericity estimation shows a negative bias with very small and small sample sizes. This bias then approaches the interval between −.080 and .080 as sample size becomes medium or large. Accordingly, the N × K interaction is statistically significant [F(9, 224) = 58.857, p < .001, η²p = .703, observed power = 1]. There are also significant differences between sample sizes [F(3, 224) = 849.246, p < .001, η²p = .910, observed power = 1] and between distributions [F(3, 224) = 21.086, p < .001, η²p = .220, observed power = 1]. The normal distribution is the least biased, followed by the slightly skewed (γ1 = 0.8 and γ2 = 2.4), the moderately skewed (γ1 = 0.8 and γ2 = 5.4), and the severely skewed or log-normal (γ1 = 1.75 and γ2 = 5.9) distributions. Table 3 shows the results of the ANOVAs and the multiple comparisons for each of the plots shown in Fig. 1. The Bonferroni post-hoc tests indicate significant differences between all of the sample sizes considered (p < .001), except for the comparison of medium and large samples when K = 4. Regarding the distributions, significant differences are observed between the normal distribution and the log-normal distribution for every value of K, and also between the normal distribution and the moderately skewed distribution for K = 6 and K = 8. Finally, none of the N × Distribution interactions is statistically significant.

Table 3 F tests and Bonferroni post-hoc tests for ε = .57

Figure 2, which depicts the empirical bias when ε = .75, shows a notable increase in bias in comparison with Fig. 1. With K = 4, the bias is negative for very small samples, irrespective of their distribution. For small samples, bias is observed with nonnormal distributions. As the value of K increases, so does the extent to which sphericity is underestimated, even for medium-sized samples. This is reflected in the N × K interaction [F(9, 224) = 33.711, p < .001, η²p = .575, observed power = 1]. As in Fig. 1, the effects of sample size and type of distribution are statistically significant: F(3, 224) = 1,036.241, p < .001, η²p = .933, observed power = 1; and F(3, 224) = 23.879, p < .001, η²p = .242, observed power = 1, respectively. The multiple comparisons (Table 4) yield results similar to those of the previous analysis (Table 3). For K = 4, however, differences are now also observed between the slightly skewed and the log-normal distributions, and between the normal and the moderately skewed distributions, whereas for K = 6 and K = 8 there are differences between the slightly skewed and the log-normal distributions.

Table 4 F tests and Bonferroni post-hoc tests for ε = .75

The polynomial coefficients for each level of K and ε are shown in Table 5. The linear and quadratic components are significant (p < .001) in all of the models of analysis. The weight of the linear component is greater than that of the quadratic component, and both increase in line with the values of K and ε. The linear contrast estimates increase in the positive direction, whereas the quadratic contrast estimates increase in the negative direction.
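To illustrate how the signs in Table 5 arise, the sketch below applies the standard orthogonal polynomial coefficients for four ordered, equally spaced levels to a set of level means. Treating the four sample-size categories as equally spaced is an assumption of this sketch, and the means are hypothetical values chosen only to mimic a negative bias that shrinks toward zero and levels off; they are not results from this study.

```python
# Orthogonal polynomial coefficients for four ordered, equally spaced levels.
LINEAR = (-3, -1, 1, 3)
QUADRATIC = (1, -1, -1, 1)
CUBIC = (-1, 3, -3, 1)

def contrast_estimate(means, weights):
    """Contrast estimate: the weighted sum of the level means."""
    return sum(w * m for w, m in zip(weights, means))

# Hypothetical mean bias for very small, small, medium, and large N.
means = (-0.30, -0.15, -0.05, -0.01)
print(contrast_estimate(means, LINEAR))     # positive: bias rises toward zero
print(contrast_estimate(means, QUADRATIC))  # negative: the rise flattens out
```

A positive linear estimate captures the bias moving toward zero as N grows, and a negative quadratic estimate captures the curve flattening at larger N.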

Table 5 Polynomial coefficients of the empirical bias for each level of K and ε

Equation 9 specifies the regression model that includes N and distribution as predictors, fitted for the different values of K and ε:

$$ \mathrm{Empirical}\;\mathrm{bias}={b}_0+{b}_1N+{b}_2\mathrm{Distribution}+e, $$
(9)

where b0 is the constant, bi are the unstandardized regression coefficients for the explanatory variables defined above, and e is the error term. Each unstandardized coefficient represents the predicted change in empirical bias for a one-unit change in the corresponding explanatory variable when the other explanatory variable is held constant. The b1 and b2 coefficients estimated using Eq. 9 are shown in Table 6. The results reveal a positive relationship between empirical bias and N, and a negative relationship between empirical bias and distribution. Because sphericity is underestimated (the bias is negative), the positive b1 means that the bias approaches zero as sample size increases, whereas the negative b2 means that the bias grows as the data deviate from the normal distribution. These effects are heightened as the value of K increases and when ε = .75.

Table 6 Forward multiple regression of the empirical bias using Eq. 9

If we compare the different plots shown in Figs. 1 and 2, it can be seen that the profile of the sphericity estimation bias for spherical matrices (ε = .75) and K = 4 is similar to that for nonspherical matrices (ε = .57) with K = 6, and that the profile of spherical matrices with K = 6 is similar to that of nonspherical matrices with K = 10. In other words, the profile of estimation bias for spherical matrices approaches that of nonspherical matrices as the number of repeated measures increases. The K × ε interaction is significant [F(3, 448) = 7.066, p < .001, η²p = .045, observed power = .981]. As the value of K increases, so does the difference in bias between ε = .57 and ε = .75. The N × ε interaction is also significant [F(3, 448) = 121.861, p < .001, η²p = .449, observed power = 1]. With very small and small sample sizes, bias is greater when ε = .75. Finally, the Distribution × ε interaction is not significant [F(3, 448) = 2.494, p = .059, η²p = .016, observed power = .617], whereas the effect of the sphericity variable is statistically significant [F(1, 448) = 924.704, p < .001, η²p = .674, observed power = 1].

In conclusion, the results confirm that the underestimation of sphericity is greater with very small and small sample sizes, as the number of repeated measures increases, and as the distribution deviates from normality. These effects are observed to a greater extent when the covariance matrix is spherical. Note that a negative bias is produced even with normal distributions.

Discussion

In this study, the Cholesky decomposition of the correlation matrix was used to generate normally distributed data, whereas nonnormal data were generated using the method of Vale and Maurelli. It is possible that these methods altered the covariance between the variables and, therefore, the value of sphericity. In addition, the covariances among the variables differed across the different distribution shapes, sample sizes, numbers of repeated measures, and sphericity values.

We determined the range of estimated sphericity values of the covariance matrices that were generated. With nonnormal data, the sample sphericity tended to fall further below the population sphericity as the latter increased, so the generated sphericity could be affected, especially for ε = .75. Thus, with spherical matrices the bias is greater with nonnormal distributions, with smaller sample sizes, and as the value of K increases. This effect is also observed with normal distributions, albeit to a lesser extent. It can be stated, therefore, that with simulated data there will always be some mismatch between the population sphericity and the sample sphericity.

The results of this study suggest that as the sphericity of the population covariance matrix approaches 1, the sphericity calculated from simulated data tends to fall increasingly below it. Furthermore, there is a certain equivalence between the profiles of sphericity estimation bias, since the sphericity estimation of spherical population matrices is similar to that of nonspherical matrices when the number of repeated measures in the latter increases. In other words, less bias is produced with nonspherical matrices, but it increases in line with the value of K, such that these matrices then behave as if they were spherical. These results are in line with what one would expect, because when estimating the error matrix for the calculation of the Greenhouse–Geisser epsilon, bias increases in line with the size of this matrix, and this bias is even greater when the population covariance matrix has a sphericity value close to 1.

The results also indicate that the population covariance matrix is transformed after generating nonnormal data by means of the Vale–Maurelli method. The same occurs, albeit to a lesser extent, when the Cholesky decomposition is used to generate normal data. With both methods, sphericity is underestimated, especially when N is very small or small. This is due to the direct relationship between sample size and the precision of the variance estimates: the smaller the sample, the less precise the estimated covariance matrix from which ε is computed.

In summary, the estimation of sample sphericity is influenced not only by the type of distribution and the population sphericity but also, and notably, by the number of repeated measures and the sample size. An inverse relationship between N and K is clearly observed (Oberfeld & Franke, 2013): when N is very small, an increase in K leads to a greater bias than it does with larger samples. To our knowledge, none of these aspects has been considered before, and the data generation procedures we followed are those typically used in simulation studies. Consequently, researchers should exercise caution when interpreting the results of their simulations, especially when working with small sample sizes. In any event, we believe that these results highlight an interesting point that could be addressed in future studies. In the context of such studies, the profiles of empirical bias presented here (Figs. 1 and 2) could be used by researchers to identify the extent to which the sphericity estimation is biased. The obtained results can be extended to real data, with applications for applied research in which it is necessary to know to what extent population sphericity and sample sphericity match and to ensure the power of the statistical model (e.g., Gracia, García, & Lila, 2008, 2014).

A final point to consider is that the results obtained here are limited to the conditions studied. Furthermore, the study has focused on the generation of unstructured population covariance matrices. In future studies, it would therefore be interesting to determine profiles of sphericity estimation bias for other population matrices, such as the first-order autoregressive covariance matrix, which provides a good fit to repeated measures data (Arnau et al., 2012; Keselman et al., 1998). Another avenue of interest would be to generate data when sample sizes are not equal for each value of K—that is, when there are missing data.