Our previous simulation models investigated the role of the intercept when estimating models with the Gaussian copula approach. While these focused studies help us understand the role of the intercept and sample size, they only use a single nonnormal distribution of the endogenous regressor (i.e., the uniform distribution) and a fixed error term correlation of 0.50. In this study, we broaden our scope and investigate three additional factors that are potentially important for the performance of the Gaussian copula approach. First, the level of the error correlation with the endogenous regressor defines the endogeneity problem’s severity, potentially affecting both the bias and the power of the Gaussian copula approach (for detailed expectations regarding the different assessment criteria, see Web Appendix 4, Table WA.4.1). Second, the approach requires nonnormality of the endogenous regressor, and Study 1 has highlighted that even the uniform distribution might not be sufficiently nonnormal to identify the model in smaller sample sizes. We therefore vary the endogenous regressor’s distribution. Third, we systematically vary the ratio of explained to unexplained variance (as reflected in the R2) in the regression model. This is potentially important because the endogenous regressor’s different distributions imply different variances for this variable. Combined with a fixed error term variance, this would lead to different ratios of explained to unexplained variance, potentially confounding the effect of the distribution with R2 levels. In addition, the ratio of explained to unexplained variance influences the uncertainty in the parameter estimates (i.e., the parameters’ standard errors), potentially influencing the approach’s statistical power. Since most researchers use an intercept to estimate their regression models in practice, we focus only on the performance of models estimated with an intercept in this study.
Finally, we again estimate our models with the control function and the maximum likelihood approach. The simulation’s detailed design, which is very similar to that of the previous studies, can be found in Web Appendix WA.4.
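The exact data-generating process is documented in Web Appendix WA.4 and is not reproduced here. Purely as an illustration, the following Python sketch shows one way to generate an endogenous regressor with a nonnormal marginal distribution whose latent normal score is correlated with the structural error. The function names, the log-normal marginal, the default coefficients, and the R2 convention in `error_sd_for_r2` (which ignores the covariance between regressor and error) are assumptions made for this example and are not taken from the study’s design.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def lognormal_quantile(u):
    """Quantile function of a log-normal(0, 1) variable (illustrative choice)."""
    return math.exp(STD_NORMAL.inv_cdf(u))


def error_sd_for_r2(beta1, var_p, r2):
    """Error s.d. so that beta1^2 * Var(P) / (beta1^2 * Var(P) + sd^2) = r2.

    Assumed convention: the regressor-error covariance is ignored.
    """
    return math.sqrt(beta1 ** 2 * var_p * (1.0 - r2) / r2)


def simulate(n, rho, quantile, beta0=1.0, beta1=1.0, error_sd=1.0, seed=0):
    """Draw n observations (P, Y) with an endogenous regressor P.

    A standard normal pair with correlation rho is drawn first. P takes its
    marginal distribution from `quantile`; the structural error is the second
    normal variate scaled by error_sd.
    """
    rng = random.Random(seed)
    p, y, p_star, e_star = [], [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        latent_e = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        p_i = quantile(STD_NORMAL.cdf(z1))  # nonnormal regressor
        p.append(p_i)
        y.append(beta0 + beta1 * p_i + error_sd * latent_e)
        p_star.append(z1)
        e_star.append(latent_e)
    return p, y, p_star, e_star


p, y, p_star, e_star = simulate(1000, rho=0.5, quantile=lognormal_quantile)
```

Swapping `lognormal_quantile` for another quantile function changes the regressor’s distribution while leaving the latent error correlation untouched, which is the kind of separation between design factors that the paragraph above describes.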
Since the endogenous regressor’s nonnormality is a prerequisite to apply the Gaussian copula approach, in practice, researchers usually test whether the encountered distribution is significantly different from a normal distribution. However, it is currently unknown when the endogenous regressor’s distribution is sufficiently nonnormal to allow the application of the Gaussian copula approach. We therefore also assess different nonnormality tests and simple moment measures, like skewness and kurtosis, to identify situations which support the reliable usage of the Gaussian copula approach. Our literature review reveals that 34 of the 69 (49.3%) studies use the Shapiro–Wilk test, 4 (5.8%) the Kolmogorov–Smirnov test, 4 (5.8%) the Kolmogorov–Smirnov test with Lilliefors correction, 2 (2.9%) the Anderson–Darling test, and 2 (2.9%) Mardia’s coefficient. Moreover, only two studies (2.9%) analyze the skewness. The remaining 21 studies (30.4%) do not test or do not report how they tested nonnormality. To assess which nonnormality test best captures the degree of nonnormality needed to identify the Gaussian copula approach, we include these and additional tests (i.e., Cramer-von Mises, Shapiro-Francia, Jarque–Bera, D’Agostino, and Bonett-Seier) that the literature suggests (e.g., Mbah & Paothong, 2015; Yap & Sim, 2011) in our simulation study.
The results presentation begins with the main effects of the potentially relevant factors, namely the sample size, R2, and endogeneity (error correlations), as well as their different levels, on the Gaussian copula’s performance (i.e., power and bias). Thereafter, we assess the effect of the endogenous regressor’s distribution (nonnormality) on power and bias. Next, we present the results on the suitability of skewness and kurtosis, as well as of different nonnormality tests, for reliably identifying endogeneity with Gaussian copula models. We focus our presentation on the results of the control function approach, which is by far the most common approach. Overall, the maximum likelihood approach yields similar results. The detailed results of the maximum likelihood approach are presented in Web Appendix 4 (Table WA.4.2, Fig. WA.4.1).
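For readers less familiar with the control function variant, the following minimal Python sketch shows its general form: the copula term is the inverse standard normal CDF of the regressor’s empirical CDF, and it enters the regression as an additional covariate. The function names are hypothetical, rescaling the empirical CDF by n + 1 is only one of several ways to keep the largest observation finite, and the bootstrap needed for the copula term’s standard error is omitted.

```python
from bisect import bisect_right
from statistics import NormalDist


def copula_term(p):
    """Return Phi^{-1}(H(p_i)), with H the empirical CDF of p.

    Assumption: H is rescaled by n + 1 so that the largest observation does
    not map to Phi^{-1}(1), which would be infinite.
    """
    ordered = sorted(p)
    n = len(p)
    std_normal = NormalDist()
    return [std_normal.inv_cdf(bisect_right(ordered, v) / (n + 1)) for v in p]


def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * z for x, z in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def copula_control_function(p, y):
    """Regress y on an intercept, p, and the copula term of p."""
    b0, b_p, b_copula = ols([p, copula_term(p)], y)
    return {"intercept": b0, "regressor": b_p, "copula_term": b_copula}
```

In this form, the coefficient on the copula term is the quantity whose significance is used to detect endogeneity, while the coefficient on the regressor is the corrected estimate.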
Main effects of design factors
The results in Table 1 show the main effects of the sample size, R2, and endogeneity levels (error correlations) when averaged across the other simulation factors with regard to the mean and relative bias of the endogenous regressor and statistical power of the copula term and endogenous regressor (at the 5% error level).
We start the analysis by focusing on the statistical power of the copula term and of the endogenous regressor. With respect to the copula term’s power, we confirm that it strongly depends on the sample size and only reaches acceptable levels beyond 2,000 observations. Moreover, the copula term’s power does not depend on the R2 level, but, as expected, depends strongly on the endogeneity level (i.e., the error correlation): the higher the error term correlation (i.e., the more severe the endogeneity problem), the higher the copula term’s power to identify endogeneity. This picture changes somewhat when we examine the endogenous regressor’s results. The endogenous regressor’s power again depends on the sample size, but also, as expected, on the R2 level: the power increases with increasing R2. It should be noted that, in this study, the endogenous regressor’s average power is higher than in the previous studies, because we consider higher R2 levels than in the original replication model (where we have an R2 of only about 10%). In contrast, the endogenous regressor’s power only depends marginally on the error term’s correlation level.
With regard to the endogenous regressor’s bias, we find that it again depends strongly on the sample size. The bias decreases with increasing sample sizes. The bias is on average lower in this simulation than in the previous simulations, because we consider different endogeneity and R2 levels. However, the relative bias (i.e., the copula model’s bias divided by the endogenous model’s bias) follows the same pattern as our other simulation studies, reaching about 50% for sample sizes of 100 observations, which is similar to Studies 1 to 3 when we include the intercept in the estimation. Moreover, the endogenous regressor’s bias decreases with higher R2 levels, but the relative bias does not depend on R2 (i.e., the copula bias decreases with the same magnitude relative to the original regression’s bias). Finally, the bias also depends on the endogeneity level, with increasing bias with increasing error correlations. However, the endogeneity level again does not affect the relative bias, because the bias in the copula model increases proportionally to the bias in the original regression without copula.
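The performance criteria used above can be written down compactly. The following Python sketch is illustrative only: the function names are hypothetical, the critical value of 1.96 assumes a two-sided test at the 5% level based on a normal approximation, and the numbers in the usage lines are made up rather than taken from Table 1.

```python
from statistics import mean


def mean_bias(estimates, true_value):
    """Average deviation of the estimated coefficient from its true value."""
    return mean(estimates) - true_value


def relative_bias(copula_estimates, ols_estimates, true_value):
    """Copula model's bias divided by the bias of the uncorrected regression."""
    return (mean_bias(copula_estimates, true_value)
            / mean_bias(ols_estimates, true_value))


def power(t_statistics, critical_value=1.96):
    """Share of replications in which the term is significant."""
    return mean(1.0 if abs(t) > critical_value else 0.0 for t in t_statistics)


# Made-up replication results for a true coefficient of 1.0
rel = relative_bias([1.2, 1.3], [1.4, 1.6], true_value=1.0)
share_significant = power([2.5, 0.4, -2.1, 1.0])
```

A relative bias of 0.5 in this sense means that the copula model removes half of the bias of the uncorrected regression, which corresponds to the roughly 50% reported for samples of 100 observations.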
We conclude and reconfirm that the Gaussian copula’s performance depends strongly on the sample size, with substantial effects on both power and bias. In contrast, the endogeneity level does not affect the copula model’s ability to correct the endogeneity bias as indicated by the relative bias, but does affect the copula term’s power. The higher the error term correlation (i.e., the more severe the endogeneity problem), the greater the power to identify endogeneity. Finally, we find that the R2 level is not relevant for the copula performance, as it affects neither the copula term’s power nor the relative bias. In the following analyses, we will therefore not further consider R2 variations and only focus on the interplay between the level of endogeneity, the sample size, and the distributional form.
We substantiate these findings by using a (logistic) regression with the copula term’s and the endogenous regressor’s power, as well as the endogenous regressor’s mean and relative bias, as dependent variables and the design parameters as independent variables. The results indicate that the R2 level does not have a significant influence on the copula term’s power, or on the relative bias, while all the other simulation factors have significant effects (see Web Appendix 4, Table WA.4.3).
Endogenous regressor’s distribution
Next, we analyze the power and relative bias of different distributional forms (i.e., different levels of nonnormality) when varying the sample sizes and the endogeneity levels. The results show that complex interactions between the distribution, sample size, and endogeneity level influence a copula term’s power (see Web Appendix 4, Fig. WA.4.2). For weak endogeneity problems, even heavily nonnormal distributions, like the log-normal or gamma distribution, show quite low power unless the sample sizes are very large. However, for larger error correlations, strongly nonnormal distributions also have sufficient power if the sample sizes are smaller.
In contrast, the endogeneity level does not affect the endogenous regressor’s relative bias. Our analysis indicates that only a combination of sample size and distributional form affects the relative bias and that larger sample sizes and the distributions’ higher nonnormality reduce the endogenous regressor’s relative bias (Fig. 6). Interestingly, we also observe a few situations in which heavily nonnormal distributions (i.e., some of the gamma, log-normal, and chi2) over-compensate the endogeneity bias in smaller sample sizes, resulting in a bias in the opposite direction of the original endogeneity bias (e.g., underestimating instead of overestimating the coefficient).
Since endogeneity is not observable a priori, researchers can only assess the distribution’s nonnormality and the sample size to decide whether the Gaussian copula approach could be applied. Accordingly, several Gaussian copula applications in our literature review test the endogenous regressor’s nonnormality by using a nonnormality test, mostly the Shapiro–Wilk test. However, common nonnormality tests’ high sensitivity to small deviations from normality is a problem. In our simulation, for example, the Shapiro–Wilk test reports a significant (at p < 0.05) finding in 96% (94% with p < 0.01) of all the cases (Table 2). Only the D’Agostino and Bonett-Seier tests have sensitivity rates below 90%. In contrast, the copula term is only significant in 67% of the cases. Consequently, nonnormality tests cannot directly help researchers decide whether a distribution is sufficiently nonnormal to apply the Gaussian copula approach. Based on our simulation study, we find that the correspondence between the copula term’s and the nonnormality tests’ significance is relatively low (between 61% and 76%), with no test clearly outperforming the others (for the correspondence analysis, see Web Appendix 4, Table WA.4.4). This outcome is roughly equivalent to the copula term’s power (i.e., 67%).
The analyzed p-values (i.e., 0.05 and 0.01) represent arbitrary cut-off levels that may reduce the correspondence greatly. We therefore also assessed the correlation between the copula term’s bootstrap t-statistic and the nonnormality tests’ test statistic. Table 2 shows that the Anderson–Darling and Cramer-von Mises tests have the highest correlation with the copula term’s bootstrap t-statistic. In addition, the results indicate that kurtosis and skewness alone are not good predictors of the copula term’s t-statistic. Nevertheless, it is interesting that skewness seems to be more important than kurtosis. Finally, we also find that the correlation between the VIF and copula t-statistic is not very pronounced.
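As a reminder of what the two moment measures capture, the following Python sketch computes them from a sample. It uses the simple moment-based definitions and reports kurtosis as excess kurtosis (zero for a normal distribution); whether the study uses these exact variants is not stated here, so both choices are assumptions.

```python
def central_moment(x, order):
    """Sample central moment of the given order (divisor n)."""
    n = len(x)
    m = sum(x) / n
    return sum((v - m) ** order for v in x) / n


def skewness(x):
    """Third standardized moment; zero for symmetric samples."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def excess_kurtosis(x):
    """Fourth standardized moment minus 3, the value for a normal distribution."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2 - 3.0


symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed = [1.0, 1.0, 1.0, 10.0]
print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

A symmetric but flat sample such as `symmetric` has zero skewness and negative excess kurtosis, which is the kind of nonnormality that stems from kurtosis alone.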
Discussion and boundary conditions analysis
Based on Study 4’s simulation results, we find that the amount of explained variance has no noticeable influence on the Gaussian copula’s power. In contrast, and as expected, the endogeneity level has a strong effect (i.e., it is harder to identify a small endogeneity problem). However, even for high levels of endogeneity the Gaussian copula approach still performs poorly when sample sizes are small. We also confirm the sample size’s strong effect on the Gaussian copula’s power and bias, and the importance of the endogenous regressor’s nonnormality to identify the Gaussian copula’s parameter estimates. Consequently, researchers should use the Gaussian copula approach cautiously if they suspect the endogeneity problem is not pronounced (i.e., a small error correlation), the sample size is small, or the nonnormality is insufficient.
While the sample size is observable and the nonnormality can be analyzed, the Gaussian copula approach’s objective is to determine the endogeneity level, which is unknown a priori. However, a failure to identify a significant copula term does not necessarily imply the absence of endogeneity. It could imply a relatively small endogeneity problem (which might be negligible), but it could also imply an insufficient sample size or insufficient nonnormality. A sufficient sample size and the careful assessment of nonnormality are therefore particularly important for the Gaussian copula approach’s application.
Popular nonnormality tests, such as the Shapiro–Wilk test, which, according to our literature review, is the one most often used in Gaussian copula applications, do not identify sufficient nonnormality with common p < 0.05 (or p < 0.01) thresholds. These tests are too sensitive to small deviations from normality: they flag distributions as nonnormal that can still lead to insignificant copula terms, even for substantial endogeneity problems (i.e., large error correlations). In addition, the nonnormality should specifically stem from skewness and not (only) from kurtosis. Our results show that nonnormal distributions with high kurtosis, but small skewness, perform relatively poorly regarding identifying the copula term with small to medium sample sizes. Researchers are therefore also advised to report these more descriptive nonnormality statistics when describing their variables’ nonnormality. Finally, we find that the Cramer-von Mises test and the Anderson–Darling test seem to be the most promising candidates for identifying sufficient nonnormality, because they correlate best with the copula term’s t-statistic. This is not surprising, as both tests build on the empirical cumulative distribution function, which also underlies the Gaussian copula approach. The Cramer-von Mises test statistic is the integral of the squared deviation between the endogenous regressor’s empirical distribution and the theoretical normal distribution. The Anderson–Darling test is an extension of the Cramer-von Mises test that adds a weighting factor to put more weight on the distribution’s tails.
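The two statistics described above have well-known computational forms based on the ordered sample. The following Python sketch implements them for a normal distribution whose mean and standard deviation are estimated from the data; the function names are hypothetical, no small-sample correction or p-value is computed, and statistical packages may differ in such details.

```python
import math
from statistics import NormalDist, mean, stdev


def fitted_normal_cdf(x):
    """CDF values of the ordered sample under a normal fitted to x."""
    fitted = NormalDist(mean(x), stdev(x))
    return [fitted.cdf(v) for v in sorted(x)]


def cramer_von_mises(x):
    """W^2 = 1/(12n) + sum_i ((2i - 1)/(2n) - F(x_(i)))^2."""
    f = fitted_normal_cdf(x)
    n = len(f)
    return 1.0 / (12 * n) + sum(
        ((2 * i - 1) / (2 * n) - f[i - 1]) ** 2 for i in range(1, n + 1))


def anderson_darling(x):
    """A^2 = -n - (1/n) sum_i (2i - 1) [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))].

    The logarithms weight the tails more heavily than W^2 does. Extreme
    outliers whose fitted CDF rounds to 0 or 1 would make the logs undefined.
    """
    f = fitted_normal_cdf(x)
    n = len(f)
    s = sum((2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


# Deterministic illustration: normal quantiles versus their exponentials
normal_like = [NormalDist().inv_cdf((i - 0.5) / 200) for i in range(1, 201)]
skewed = [math.exp(v) for v in normal_like]
print(cramer_von_mises(normal_like), cramer_von_mises(skewed))
print(anderson_darling(normal_like), anderson_darling(skewed))
```

Both statistics are close to zero for the normal-like sample and grow for the skewed one, with the Anderson–Darling statistic reacting more strongly to the long right tail.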
Using our simulation results, we subsequently derive actionable boundary conditions for the required nonnormality and sample size, and provide recommendations that could help researchers identify situations with sufficiently high copula term power in regression models with endogeneity. In general, we find a complex relationship between the sample size, the endogenous regressor’s nonnormality, and the Gaussian copula’s power level. We reveal, for example, that the lower the number of observations, the higher the skewness levels required to obtain power levels of 80% and higher (Web Appendix, Fig. WA.4.3). Similarly, we find that smaller sample sizes require higher levels of the Anderson–Darling and Cramer-von Mises test statistics for a copula power of at least 80%. These two test statistics’ required levels decrease with a higher number of observations. In contrast, we observe no clear pattern for the kurtosis, which is in line with its low correlation with the t-statistic.
To turn these findings into more actionable recommendations, we consider all observable characteristics of our models (e.g., sample size, skewness, kurtosis, R2, and nonnormality test statistics) to derive thresholds that will ensure that the Gaussian copula approach has a high power level. Researchers can use these thresholds as an approximate point of orientation to ensure the method’s effective use in their applications. We do so by employing decision tree analysis, using the C5.0 algorithm (Kuhn et al., 2020). Based on our simulation study’s results (i.e., Study 4 of regression models with intercept), our goal is to identify situations where the Gaussian copula approach has a power of at least 80%. Figure WA.4.4 (Web Appendix) shows a decision tree result in which we consider sample size, skewness, kurtosis, and R² for predicting the copula’s power (the latter two are not relevant and therefore do not appear in the decision tree). The classification error is 6.4%, with 8 false negatives and 6 false positives out of 220 simulation design conditions (i.e., 20 distributions times 11 sample sizes). According to the results, the sample size should be larger than 1,000 observations if the skewness is larger than 0.774. If the skewness is equal to or smaller than this level, more than 2,000 observations are required to obtain an 80% power level. For smaller sample sizes in the range between 400 and 1,000 observations, a skewness level of 1.932 is required to obtain adequate power. None of our distributions achieves a sufficient power level for the copula term for sample sizes of 200 observations or smaller. Please note that these findings are derived from the outcomes of the simulation studies, which are constrained by the parameter space of the simulation design. Therefore, these thresholds are an approximate point of reference to guide decision-making.
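The thresholds in the preceding paragraph can be restated as a simple rule. The Python sketch below is only a transcription of the values as reported in the text, not the fitted C5.0 tree itself. The function name is hypothetical, the handling of sample sizes between 200 and 400 (which the text does not cover) and of the exact boundary at a skewness of 1.932 are assumptions, and the rule inherits the limits of the simulation design.

```python
def power_adequate_by_skewness(n, skewness):
    """Approximate check for a copula term power of at least 80%.

    Thresholds are copied from the reported decision tree results.
    Assumptions: sample sizes below 400 are treated as insufficient
    (the text only states this for 200 or fewer), and a skewness of
    exactly 1.932 counts as sufficient.
    """
    if n > 2000:
        return True              # large samples: no skewness requirement
    if n > 1000:
        return skewness > 0.774  # moderate skewness suffices
    if n >= 400:
        return skewness >= 1.932  # strong skewness required
    return False


for n, s in [(2500, 0.2), (1500, 1.0), (1500, 0.5), (800, 2.0), (200, 3.0)]:
    print(n, s, power_adequate_by_skewness(n, s))
```

Such a rule is best read as a first screening step before the other conditions for applying the approach are checked.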
Moreover, researchers must ensure that their empirical examples meet the other necessary conditions for using the Gaussian copula approach that we investigate in this research (see Fig. 8 for a comprehensive summary).
We ran similar decision tree analyses that considered the Anderson–Darling and the Cramer-von Mises test statistics (see Web Appendix 4, Fig. WA.4.5). For example, if the Anderson–Darling (Cramer-von Mises) test statistic has a value larger than 18.964 (3.488), the Gaussian copula’s power is 80% and higher. With a sample size of more than 1,000 observations, a somewhat lower level of the test statistic, but larger than 15.159 (2.628), can achieve this power level.
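The same kind of transcription works for the test-statistic thresholds. In the Python sketch below, the function name and the dictionary keys are hypothetical, and the rule encodes only the two splits mentioned in the text; the full trees in Fig. WA.4.5 may contain further conditions.

```python
# (general threshold, threshold for n > 1,000), as reported in the text
THRESHOLDS = {
    "anderson_darling": (18.964, 15.159),
    "cramer_von_mises": (3.488, 2.628),
}


def power_adequate_by_statistic(n, statistic, test="anderson_darling"):
    """Approximate check for a copula term power of at least 80%."""
    general, large_sample = THRESHOLDS[test]
    if statistic > general:
        return True
    return n > 1000 and statistic > large_sample


print(power_adequate_by_statistic(500, 20.0))
print(power_adequate_by_statistic(1500, 16.0))
print(power_adequate_by_statistic(500, 3.0, test="cramer_von_mises"))
```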
In summary, the endogenous variable’s nonnormality, as indicated by minimum levels of skewness, and the Anderson–Darling or the Cramer-von Mises test statistics, in combination with a sufficiently large sample size, may ensure that the Gaussian copula approach has adequate power. Our study results suggest that researchers need to ensure that there are relatively high nonnormality levels, which should stem from the endogenous variable’s skewness, and a relatively large sample size, in order to apply the Gaussian copula approach adequately in regression models with intercept.