STANDARDIZED REGRESSION COEFFICIENTS AND NEWLY PROPOSED ESTIMATORS FOR R2 IN MULTIPLY IMPUTED DATA

Whenever statistical analyses are applied to multiply imputed datasets, specific formulas are needed to combine the results into one overall analysis, also called combination rules. In the context of regression analysis, combination rules for the unstandardized regression coefficients, the t-tests of the regression coefficients, and the F-tests for testing R2 for significance have long been established. However, there is still no general agreement on how to combine the point estimators of R2 in multiple regression applied to multiply imputed datasets. Additionally, no combination rules for standardized regression coefficients and their confidence intervals seem to have been developed at all. In the current article, two sets of combination rules for the standardized regression coefficients and their confidence intervals are proposed, and their statistical properties are discussed. Additionally, two improved point estimators of R2 in multiply imputed data are proposed, which use the pooled standardized regression coefficients in their computation. Simulations show that the proposed pooled standardized coefficients produce only small bias and that their 95% confidence intervals produce coverage close to the theoretical 95%. Furthermore, the simulations show that the newly proposed pooled estimates of R2 are less biased than two earlier proposed pooled estimates.


Introduction
Multiple imputation (Rubin 1987) is a widely recommended procedure for dealing with missing data. The complete multiple-imputation procedure works as follows: First, each missing value is filled in M times, based on a stochastic model that accurately describes the data. This results in M plausible complete versions of the incomplete dataset. Next, the statistical analysis of interest is applied to each of the M imputed datasets, resulting in M different outcomes of the same analysis. Finally, the M results are combined into one pooled statistical analysis, using specific formulas that incorporate the between-imputation variance of the parameter estimates, caused by the variance of the imputed values, into the standard errors and p-values. These specific formulas are henceforth denoted combination rules.
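For a single scalar parameter, these combination rules (Rubin 1987) can be sketched as follows; the function name and interface are illustrative, not from the article:

```python
import numpy as np

def pool_scalar(estimates, variances):
    """Pool M completed-data estimates of a scalar parameter with Rubin's rules.

    estimates : length-M sequence of point estimates, one per imputed dataset
    variances : length-M sequence of their squared standard errors
    Returns the pooled point estimate and its total variance.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    M = len(estimates)
    q_bar = estimates.mean()           # pooled point estimate
    w_bar = variances.mean()           # within-imputation variance
    b = estimates.var(ddof=1)          # between-imputation variance
    t = w_bar + (1 + 1 / M) * b        # total variance
    return q_bar, t
```

The total variance adds the between-imputation spread (inflated by 1/M for the finite number of imputations) to the average completed-data variance; its square root is the pooled standard error.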
In the specific context of regression analysis, several combination rules are available for pooling various important statistics. These combination rules involve formulas for pooling the regression coefficients (Rubin 1987), their t-tests (Rubin 1987), the degrees of freedom of these t-tests (Barnard and Rubin 1999), the F-test for testing R² for significance (Van Ginkel 2019; Rubin 1987), the F-test for testing R²-change for significance (Van Ginkel 2019; Rubin 1987), and the degrees of freedom for these F-tests (Reiter 2007). Although combination rules for the above-mentioned statistics in regression have been well established, little has been written on how to combine an important statistic in regression, namely R². A few exceptions are Harel (2009) and Van Ginkel (2019). The small amount of literature on this topic is quite remarkable, as R² is reported in virtually any study that uses linear regression, as an indication of how well the model fits the data.
Additionally, the literature that does exist on this topic does not agree on the best way to pool R². Harel (2009) proposed a pooled estimate of R² that uses a Fisher z transformation. Van Ginkel (2019) criticized this method, arguing that its theoretical justification is incorrect. Van Ginkel proposed two alternatives, namely averaging R² across imputed datasets, and transforming the pooled F-value for testing R² for significance into a pooled estimate of R². Van Ginkel showed that despite the incorrect theoretical justification, Harel's method still produced the best results in terms of bias, followed by averaging R², which produced slightly larger bias. These results are rather unsatisfactory, as the theoretically least sound method (Harel's) produced the smallest bias, while the method with the best theoretical justification (transforming the F-value) produced the largest. Additionally, Van Ginkel pointed out a number of disadvantages of both Harel's method and averaging R², which could be sources of bias additional to the bias that R² already has in itself (e.g., Shieh 2008). In short, to date no pooled measure of R² in multiply imputed data seems to have been proposed that is without theoretical objections.
Besides the lack of general agreement on combination rules for R² in regression, no combination rules for pooling standardized regression coefficients have, to the author's knowledge, been developed at all. However, standardized coefficients are reported in many studies that use linear regression. In fact, in many research articles standardized regression coefficients are reported instead of unstandardized regression coefficients, as standardized coefficients give an indication of the effect size of the specific predictor, independent of the scales of the variables. The fact that standardized regression coefficients are reported regularly, and are occasionally even reported instead of unstandardized regression coefficients, shows that there is a clear need for combination rules for standardized regression coefficients in multiply imputed datasets.
Van Ginkel (2019) stated that, lacking a better alternative, an ad hoc method for combining standardized regression coefficients may be used, such as averaging the coefficients across the M imputed datasets. However, since a standardized regression coefficient can be expressed as a function of the unstandardized regression coefficient, the standard deviation of the predictor, and the standard deviation of the outcome variable, there is another way to pool the betas across multiply imputed datasets: average the regression coefficient across imputed datasets, average the variances of the predictor and of the outcome variable across imputed datasets, and insert these quantities into the formula for the standardized regression coefficient.
Since the above-mentioned approaches are not equivalent, this raises, firstly, the question of which approach (if any) is the correct one and, secondly, the question of how to define "correct" in the context of multiple imputation. Normally, a convenient criterion for determining whether a parameter estimate is "correct" is whether it is unbiased when all the assumptions of the specific analysis have been met. However, since betas are biased even in complete data (Yuan and Chan 2011), showing analytically or by means of simulations whether these approaches give unbiased estimates is not an option here. Instead, a number of other statistical properties of betas in complete data have to be defined. Once they are defined, it must be verified which of the approaches (if any) has all of these statistical properties. If neither has all of them, the next question becomes which approach gives the least bias, and bias closest to that of the same data without missing values.
In the current paper, four conditions are defined that have to be met in complete data for a parameter estimate to be a standardized regression coefficient. It should be noted that these conditions are not based on statistical literature; they either follow directly from the properties of standardized coefficients, or were found empirically in the author's search for combination rules for standardized coefficients, with the mathematical proof derived afterwards. For both proposed sets of combination rules, it is discussed which of these conditions they satisfy. Additionally, combination rules are proposed for the confidence intervals of these standardized regression coefficients, based on the delta method (e.g., Jones and Waller 2013).
Next, these two approaches are used in the calculation of two newly proposed pooled estimates of R² in multiply imputed datasets. These two pooled estimates of R² are intended to overcome the problems of the approaches proposed by Harel (2009) and Van Ginkel (2019). After discussing the proposed combination rules for standardized coefficients and R², a simulation study investigating the bias and coverage of the proposed combination rules is discussed. Finally, a conclusion is drawn about which combination rules for both R² and standardized coefficients are to be preferred.

Pooled Estimates for Standardized Regression Coefficients in Multiply Imputed Data
Define b_j as the unstandardized population regression coefficient of predictor x_j (j = 1, ..., k). Furthermore, define σ_xj and σ_y as the population standard deviations of x_j and outcome variable y, respectively. Lastly, define β as the vector containing the standardized population regression coefficients of variables x_1, ..., x_k, with elements

β_j = (σ_xj / σ_y) b_j.    (1)

In complete data, a sample estimate of β, denoted β̂, can be computed as follows: Suppose that b̂_j is a sample estimate of the unstandardized regression coefficient of predictor variable x_j, s_xj is the standard deviation of x_j, and s_y is the standard deviation of y. The elements of the vector β̂ of standardized regression coefficients in the sample are then

β̂_j = (s_xj / s_y) b̂_j.    (2)

The first proposed set of pooled estimates for standardized regression coefficients in multiply imputed datasets is obtained by computing unbiased pooled estimates of b_j, σ²_xj, and σ²_y, and substituting the estimate of b_j and the square roots of the estimates of σ²_xj and σ²_y into Eq. 1. Because the standardization of b_j takes place after the pooling of b_j, σ²_xj, and σ²_y, the resulting pooled beta is denoted β̄_PS, where the subscript PS stands for Pooling before Standardization. Define b̄_j = (1/M) Σ_{m=1}^M b̂_j,m, s̄²_xj = (1/M) Σ_{m=1}^M s²_xj,m, and s̄²_y = (1/M) Σ_{m=1}^M s²_y,m as the averages of these estimates across the M imputed datasets. Then

β̄_j,PS = (s̄_xj / s̄_y) b̄_j.    (3)

The second set of pooled estimates is obtained by applying Rubin's rules for point estimates to the standardized coefficients directly. In other words, average the β̂_j,m's across the M imputed datasets, and do this for all j. Here, the standardization takes place before the pooling, so the resulting coefficient is denoted β̄_SP, where SP stands for Standardization before Pooling. β̄_SP has elements

β̄_j,SP = (1/M) Σ_{m=1}^M β̂_j,m.    (4)

Next, the four conditions that can be defined for standardized regression coefficients in complete data are given. These four conditions are:

1. A standardized regression coefficient can be computed using an unbiased estimator of b_j and estimators of σ_xj and σ_y whose squared values are unbiased estimators of σ²_xj and σ²_y, inserting these estimates into Eq. 1.
2. Standardized regression coefficients can also be computed by performing a regression analysis on the standardized predictors and outcome variable. The resulting regression coefficients are standardized regression coefficients.
3. A t-test calculated for an unstandardized regression coefficient should be insensitive to standardization of the variables. Hence, the same t-test calculated for the corresponding standardized regression coefficient will have the same value as for the unstandardized regression coefficient.
4. When computed for simple regression, β̂_1 has the same value as the standardized regression coefficient β̂_1,y that would be obtained if the predictor x_1 and the outcome variable y were switched in the analysis. Both β̂_1 and β̂_1,y are equal to the correlation between x_1 and y.
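The two pooled estimators β̄_PS and β̄_SP can be sketched in code. This is an illustrative helper (not from the article) that assumes each imputed dataset is supplied as a predictor matrix X and an outcome vector y, and fits OLS with an intercept:

```python
import numpy as np

def pool_betas(imputed_datasets):
    """Compute the PS and SP pooled standardized regression coefficients.

    imputed_datasets : list of (X, y) pairs, one per imputed dataset;
    X is an N x k predictor matrix, y a length-N outcome vector.
    Returns (beta_ps, beta_sp), each a length-k vector.
    """
    b_list, sx2_list, sy2_list, beta_list = [], [], [], []
    for X, y in imputed_datasets:
        X1 = np.column_stack([np.ones(len(y)), X])     # design matrix with intercept
        b = np.linalg.lstsq(X1, y, rcond=None)[0][1:]  # OLS slopes b_j,m
        sx2 = X.var(axis=0, ddof=1)                    # s^2_xj,m
        sy2 = y.var(ddof=1)                            # s^2_y,m
        b_list.append(b)
        sx2_list.append(sx2)
        sy2_list.append(sy2)
        beta_list.append(b * np.sqrt(sx2 / sy2))       # per-imputation betas (Eq. 2)
    # Pooling before Standardization: pool b and the variances first, then standardize
    beta_ps = np.mean(b_list, axis=0) * np.sqrt(
        np.mean(sx2_list, axis=0) / np.mean(sy2_list))
    # Standardization before Pooling: average the per-imputation betas directly
    beta_sp = np.mean(beta_list, axis=0)
    return beta_ps, beta_sp
```

Note that the PS estimator averages the variances (not the standard deviations) before taking square roots, matching the requirement of unbiased pooled estimates of σ²_xj and σ²_y.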
Regarding condition 3, it is important to note that the use of the standard t-test for regression coefficients would only be justified for standardized coefficients if the ratio s²_x1/s²_y were a fixed quantity (Jones and Waller 2013, p. 437). Since this is not the case, alternative combination rules for t-tests of standardized coefficients are needed, based on the delta method (Jones and Waller 2013, pp. 439-440). Such combination rules are proposed later on. The more general point of condition 3 is that a t-test should be insensitive to linear transformations, regardless of its usefulness for the transformed data. Proofs of the above four conditions in complete data are provided in Appendix I. In Appendix II, it is analytically derived which of the two sets of combination rules for the point estimates of standardized coefficients (β̄_PS and β̄_SP) meet which of the four conditions. Table 1 gives a summary of the analytical results in Appendix II.
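The invariance claimed in condition 3 can be checked numerically. The sketch below (names illustrative) computes the classical OLS t-statistics of the slopes, once for the raw variables and once after standardizing all variables; the two sets of t-values coincide:

```python
import numpy as np

def slope_t_stat(X, y):
    """Classical OLS t-statistics of the slope coefficients (intercept included)."""
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])
    coef = np.linalg.lstsq(X1, y, rcond=None)[0]
    resid = y - X1 @ coef
    sigma2 = resid @ resid / (n - k - 1)        # residual variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)     # covariance matrix of the estimates
    se = np.sqrt(np.diag(cov))[1:]              # standard errors of the slopes
    return coef[1:] / se

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(size=50)

t_raw = slope_t_stat(X, y)
# Standardize all variables (z-scores) and recompute: the t-statistics are identical.
Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yz = (y - y.mean()) / y.std(ddof=1)
t_std = slope_t_stat(Xz, yz)
print(np.allclose(t_raw, t_std))  # True
```

This is precisely the sense in which the standard t-test is insensitive to standardization, even though, as condition 3's discussion notes, that test is not the appropriate one for standardized coefficients.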

Statistical Tests for Standardized Regression Coefficients in Multiply Imputed Data
As noted earlier, the standard t-test for regression coefficients cannot be used for the standardized regression coefficients, because the estimates of the variances of X and y are not fixed quantities. For complete data, Jones and Waller (2013, p. 440; see also Yuan and Chan 2011) discuss a t-test for β̂_j using a standard error based on the delta method. Define ĉ_j as the jth diagonal element of the covariance matrix of the predictors, S_X. From ĉ_j and the sample variances of the variables, the delta-method standard error of β̂_j, denoted SE(β̂_j), is computed (Eq. 5).

Table 1.
Overview of which set of combination rules (β̄_PS and β̄_SP) satisfies which of the four conditions of standardized regression coefficients

Condition                                                                      β̄_PS   β̄_SP
1. β̂_j can be computed using unbiased estimators of b_j, σ²_xj, and σ²_y       Yes    No
2. β̂_j can be computed by performing regression on standardized X and y        Yes    Yes
3. t-test calculated for b̂_j has same value as same t-test calculated for β̂_j  Yes    No
4. In simple regression, β̂_1, β̂_1,y, and r_x1y are the same                    No     Yes

For multiply imputed data, applying Rubin's rules for standard errors and extending the principle of Pooling before Standardization, the proposed within-imputation variance, between-imputation variance, and total variance of β̄_j,PS are

W̄_j,PS = (1/M) Σ_{m=1}^M SE(β̂_j,m)²,
B_j,PS = (1/(M − 1)) Σ_{m=1}^M (β̂_j,m − β̄_j,PS)²,
T_j,PS = W̄_j,PS + (1 + 1/M) B_j,PS.

The corresponding t-value testing β̄_j,PS for significance is

t = β̄_j,PS / √T_j,PS,

where df is approximated using the approximation from Barnard and Rubin (1999). Applying Rubin's rules for standard errors to the standard errors of β̂_j,SP, the within-imputation variance, between-imputation variance, and total variance of β̄_j,SP become

W̄_j,SP = (1/M) Σ_{m=1}^M SE(β̂_j,m)²,
B_j,SP = (1/(M − 1)) Σ_{m=1}^M (β̂_j,m − β̄_j,SP)²,
T_j,SP = W̄_j,SP + (1 + 1/M) B_j,SP.

The corresponding t-value testing β̄_j,SP for significance is

t = β̄_j,SP / √T_j,SP.

Of the two above-described methods, the confidence interval and statistical test for β̄_j,SP have the better theoretical justification, because the combination rules for β̄_j,SP as defined above are simply Rubin's combination rules applied to the β̂_j,m's directly, without any modification. When in complete data the sampling distribution of β̂_j is (approximately) normal with the standard error as in Eq. 5, and in the incomplete-data case the imputation method is Bayesianly proper (Schafer 1997, p. 105), in principle all conditions for applying Rubin's rules (1987, Chap. 3) have been met. In the next subsection, combination rules for R² that use β̄_PS and β̄_SP are derived.

Pooled Estimates for R² in Multiply Imputed Data
Thus far, three different point estimates of R² for multiple regression in multiply imputed data have been proposed. Harel (2009) proposed a pooled estimate based on a Fisher z transformation. This procedure works as follows: First, a Fisher z transformation is applied to the square root of each R²_m (m = 1, ..., M). Next, the Fisher z-transformed values are averaged across the M imputed datasets. Finally, the average Fisher z-transformed square root of R² is transformed back using an inverse Fisher z transformation, and the result is squared. The resulting pooled estimate is denoted R̃².
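Harel's procedure can be sketched as follows (the function name is illustrative; arctanh/tanh are the Fisher z transformation and its inverse):

```python
import numpy as np

def pool_r2_harel(r2_values):
    """Harel's (2009) pooled R^2: Fisher z of sqrt(R^2_m), average, back-transform, square."""
    r = np.sqrt(np.asarray(r2_values, dtype=float))  # sqrt of each R^2_m
    z = np.arctanh(r)                                # Fisher z transformation
    return float(np.tanh(z.mean()) ** 2)             # inverse transform, then square
```

For example, with identical R²_m values the pooled estimate simply reproduces that value; with differing values it lies between the smallest and the largest.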
Van Ginkel (2019) criticized Harel's approach, arguing that the use of a Fisher z transformation is based on an incorrect justification. Instead, he proposed two alternative pooled estimates of R²: the average of the R²_m's across imputed datasets, denoted R̄², and an estimate computed from the pooled F-value for testing R² for significance, denoted R²_F. In a simulation study, Van Ginkel (2019) showed that R²_F was substantially lower than the R² of the same data without missing values and was thus not recommended. R̄² and R̃², on the other hand, were somewhat more positively biased than the R² of the same data without missing values. This overestimation was somewhat larger for R̄² than for R̃².
Although both R̄² and R̃² gave substantially better estimates of R² than R²_F, Van Ginkel (2019) still pointed out a potential problem of both methods. When ρ² is close to zero, both R̄² and R̃² will be higher than the R² of the same data without missing values. This phenomenon can best be demonstrated in the bivariate case with only one predictor. When there is hardly a relationship between x_1 and y, it may happen that in some imputed datasets the correlation between x_1 and y is slightly negative and in other imputed datasets slightly positive. Since in simple regression R² equals the squared correlation between both variables, averaging the R²_m's ignores the sign differences in the correlations across imputed datasets. Consequently, R̄² will be higher than the square of the average correlation across imputed datasets. R̃² has the same problem, because it also ignores sign differences between correlations across imputed datasets.
In the multiple-regression case, something similar may happen when in one imputed dataset the regression coefficient of variable x_j is slightly negative and in another imputed dataset slightly positive. If, in coming to a pooled estimate of R², the averaging takes place at the level of (a transformation of) R² rather than at the level of the individual regression coefficients, predictors with an almost zero relation with the outcome variable will make a larger contribution to the pooled estimate of R² than they are supposed to make.
For simple regression, the problem may be resolved by averaging the correlation across imputed datasets and squaring the average. Unfortunately, this cannot be done in multiple regression, because in multiple regression the only correlation that can be squared to arrive at R² is the correlation between the observed and predicted values of y, which is always positive (in simple regression, this correlation equals the absolute value of the correlation between x_1 and y). However, we can formulate a generalization of squaring the average correlation to multiple regression by making use of the fact that in complete data, R² may also be expressed as

R² = Σ_{j=1}^k r_xjy β̂_j.    (6)

By filling in pooled estimates of r_xjy and β̂_j, we arrive at a pooled estimate of R² in multiply imputed data in which variables with weak relations with the outcome variable do not make a disproportionately large contribution. Using the two sets of pooled estimates for standardized regression coefficients defined previously, two pooled versions of R² are proposed. The first pooled estimate uses the elements of β̄_PS and the pooled standard deviations s̄_xj and s̄_y. As already mentioned, in complete data and simple regression the standardized coefficient and the correlation between the predictor and the outcome are equal. Making use of this fact, pooled estimates of the correlations between the predictors and the outcome are computed by carrying out simple regressions on the M imputed datasets with y as the outcome variable, for each predictor x_j. For each variable, a pooled standardized simple regression coefficient is obtained using the method of β̄_PS. This estimate is denoted r_xjy,PS.
As a pooled estimate of β_j in Eq. 6, we use β̄_j,PS. The resulting pooled estimate of R² is

R̄²_PS = Σ_{j=1}^k r_xjy,PS β̄_j,PS.

Although it is shown in Table 1 (condition 4) that in simple regression β̄_1,PS is not equal to the average correlation between x_1 and y, it was still decided to use r_xjy,PS (which equals β̄_1,PS in a simple regression of y on x_j) as the estimate of the correlation in R̄²_PS. The reason is that if the average correlation across imputed datasets were used, the resulting pooled R² would no longer equal the square of the pooled standardized regression coefficient in the case of simple regression.
The second pooled estimate of R² is based on β̄_SP. Define r̄_xjy as the average correlation between x_j and y across imputed datasets (which, in a simple regression of y on x_j, equals β̄_1,SP). Substituting r̄_xjy for r_xjy in Eq. 6, and β̄_j,SP obtained from the multiple regression of y on X for β_j, we get

R̄²_SP = Σ_{j=1}^k r̄_xjy β̄_j,SP.
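A sketch of R̄²_SP, using the same illustrative data layout as before (a list of (X, y) pairs, one per imputed dataset); the PS variant is analogous, with r_xjy,PS and β̄_j,PS computed from pooled ingredients instead:

```python
import numpy as np

def pooled_r2_sp(imputed_datasets):
    """R^2_SP: sum over predictors of (average correlation) x (pooled beta_SP), Eq. 6."""
    r_list, beta_list = [], []
    for X, y in imputed_datasets:
        X1 = np.column_stack([np.ones(len(y)), X])
        b = np.linalg.lstsq(X1, y, rcond=None)[0][1:]        # OLS slopes
        beta = b * X.std(axis=0, ddof=1) / y.std(ddof=1)     # standardized coefs (Eq. 2)
        r = np.array([np.corrcoef(X[:, j], y)[0, 1]          # r_xjy per imputation
                      for j in range(X.shape[1])])
        beta_list.append(beta)
        r_list.append(r)
    # Average correlations and betas across imputations, then take the inner product
    return float(np.mean(r_list, axis=0) @ np.mean(beta_list, axis=0))
```

Because the averaging happens at the level of the correlations and coefficients, sign differences across imputed datasets cancel rather than inflate the pooled R², which is exactly the property motivating this estimator.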

Evaluating Advantages and Disadvantages of Each Approach
An advantage of R̄²_SP over R̄²_PS is that in simple regression, the square root of R̄²_SP, |r̄_x1y|, and |β̄_1,SP| are all equal, just as in complete data R, |r_x1y|, and |β̂_1| are equal (condition 4, Table 1). The same is not true for R̄²_PS, whose square root in the case of simple regression will not equal |r_x1y,PS|. This also illustrates an important advantage of β̄_SP over β̄_PS, because the elements of β̄_SP are used for calculating R̄²_SP. Thus, when a simple regression is carried out in a multiply imputed dataset, the values of R̄²_SP, r̄_x1y, β̄_1,SP, and even β̄_1,y,SP all relate to one another the way they are supposed to.
A disadvantage of β̄_SP is that it violates condition 3 (Table 1); this is an advantage of β̄_PS over β̄_SP. However, the question is to what extent this violation of condition 3 is problematic. Given that standardized regression coefficients are tested using a different t-test anyway, the violation might as well simply be ignored.
Since all proposed pooled estimates of R² and β have their disadvantages, and none of them meets all of the necessary conditions, an important question becomes how well they perform in terms of bias and coverage. If one method clearly produces a less biased estimate of the population parameter in question, and its confidence interval covers the population parameter better, than the other, then the method producing the least bias and the best coverage is the preferred one. To answer this question, all methods discussed here were investigated in a simulation study, which is discussed next.

Fixed Design Characteristics
The general form of the simulation model in this study was

y_i = b_0 + b_1 x_1i + b_2 x_2i + b_3 x_3i + b_4 x_4i + ε_i,    (7)

a regression model with four predictors. The values of b_0 to b_4 were varied and are discussed in the independent variables section. The means of predictors x_1 to x_4 were μ = (μ_1, μ_2, μ_3, μ_4) = (2, 5, 10, 20). The covariance matrix Σ of the predictors, like μ, was based on earlier studies by Harel (2009) and Van Ginkel (2019). The variance of the error term ε_i was set at σ²_ε = 0.6. Using these means and covariances, in each design cell one thousand (D = 1000) replicated sets of predictors were drawn from a specific population distribution and with a specific sample size (both discussed in the independent variables section). Additionally, for each case in the sample, an ε_i was drawn from N(0, σ²_ε). In each replicated dataset, y was constructed using the model in Eq. 7.
Next, the complete datasets were made incomplete with a specific percentage of missingness, and using a specific missingness mechanism. The resulting incomplete datasets were multiply imputed using fully conditional specification (FCS; Van Buuren 2012, pp. 108-116; Van Buuren et al. 2006), using the mice package (Van Buuren and Groothuis-Oudshoorn 2011) in R (R Development Core Team 2018). The number of imputations was fixed at M = 25.

Independent Variables
Value of ρ². Two values of ρ² were studied; for each, corresponding values of the regression coefficients b_1 to b_4 were chosen, and the corresponding standardized regression coefficients were computed using Eq. 2.
Sample size. Two different sample sizes were studied: N = 100 and N = 500.
Population distribution. To see how robust the proposed combination rules for standardized regression coefficients and R² were to different types of population distributions, the predictors were simulated under a multivariate normal distribution and under a multivariate lognormal distribution with the same means and covariances. Skewness and kurtosis of predictors x_1 to x_4 were (4.75, 1.43, 0.68, 0.48) and (57.60, 3.85, 0.84, 0.41), respectively. Random values from a multivariate normal distribution were generated using the mvrnorm function in the MASS package (Venables and Ripley 2002) in R; random values from a multivariate lognormal distribution were generated using the mvlognormal function in the MethylCapSig package (Ayyala et al. 2016).
Within the mice package, there are two different variants of FCS, namely the regression approach (e.g., Van Buuren 2012, p. 13; Little and Schenker 1995, p. 60) and predictive mean matching (Van Buuren 2012, pp. 68-74; Van Buuren et al. 2006; Rubin 1986). The regression approach imputes each variable using a normal regression model with the other variables as predictors for the missing data. When the data are multivariate normally distributed, this is the preferred method for imputation. However, when the data are not normally distributed (such as under a multivariate lognormal distribution), the regression approach may not preserve the shapes of the distributions of the variables. Predictive mean matching has been shown to preserve distributional shapes better than the regression approach when data are not normally distributed (Marshall et al. 2010a; Marshall et al. 2010b). Thus, for the current simulation study it was decided to impute the data using the regression approach when the data were normally distributed, and to use predictive mean matching when the data were multivariate lognormally distributed.
Percentage of missingness. In each simulated complete dataset, missingness was simulated with three different percentages of missing data: 12.5%, 25%, and 50%. These percentages were based on Van Ginkel (2019).
Missingness mechanism. Missingness was simulated according to two missingness mechanisms, namely missing completely at random (MCAR) and missing at random (MAR; Little and Rubin 2002, p. 10). Under MCAR, variable x_1 was always observed; for the remaining variables x_2 to y, the following strategy was adopted: an N × k matrix A of uniform random numbers ranging from 0 to 1 was generated. For a specific proportion of missingness c, the entries of [x_2, x_3, x_4, y] corresponding to the c × N × k highest entries of A were removed.
Under MAR, besides matrix A, the matrix W = x_1 1′_k − [min(x_1) − 0.01] × 1_N 1′_k was computed, whose entries in row i all equal x_1i − min(x_1) + 0.01. The entries of [x_2, x_3, x_4, y] corresponding to the c × N × k highest entries of the elementwise product W ∗ A were removed. Thus, the higher the value of x_1, the higher the probability of missing data on the other variables.
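The two missingness mechanisms described above can be sketched as follows (an illustrative helper, not the original study's implementation):

```python
import numpy as np

def impose_missingness(data, x1, c, mechanism="MCAR", rng=None):
    """Remove a proportion c of the entries of `data` (the N x k matrix [x2, x3, x4, y]).

    Under MCAR the removed cells are the c*N*k highest entries of a uniform random
    matrix A; under MAR they are the highest entries of W * A, where W repeats
    x1 - min(x1) + 0.01 across columns, so cases with higher x1 are more likely
    to lose values on the other variables (x1 itself stays fully observed).
    """
    rng = np.random.default_rng(rng)
    N, k = data.shape
    A = rng.uniform(size=(N, k))
    if mechanism == "MAR":
        W = np.repeat((x1 - x1.min() + 0.01)[:, None], k, axis=1)
        A = W * A
    n_miss = int(round(c * N * k))
    out = data.astype(float).copy()
    # flat indices of the n_miss highest entries of A
    flat = np.argsort(A, axis=None)[-n_miss:]
    out[np.unravel_index(flat, A.shape)] = np.nan
    return out
```

Removing exactly the c × N × k highest entries fixes the realized missingness proportion at c in every replication, rather than letting it vary randomly.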
Set of combination rules. The influence of two different sets of combination rules on the estimates of the standardized regression coefficients and their confidence intervals was studied, namely β̄_PS and β̄_SP. For estimates of R², the influence of four different combination rules was studied: the two newly proposed pooled estimates, R̄²_PS and R̄²_SP, and two existing methods, namely R̃² (Harel 2009) and R̄² (Van Ginkel 2019). Methods R̃² and R̄² were included to serve as a benchmark against which to compare the new methods. The development of R̄²_PS and R̄²_SP as pooled estimates of R² can be considered successful when they produce less bias than the existing simple method that lacks a theoretical justification (R̄²), and at least as little bias as (but preferably less bias than) the existing method that works better than R̄² but has an incorrect justification (R̃²).

Dependent Variables
The dependent variables in this study were (1) the bias in the sample estimates of parameters β_1 to β_4, (2) the percentage of replications in which β_1 to β_4 were covered by the confidence intervals of their corresponding sample estimates, denoted coverage percentages, and (3) the bias in the sample estimate of ρ². For comparison, the same outcome measures are also reported for the original data without missing values.
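Given D replicated estimates and confidence intervals, the two outcome measures can be computed as follows (an illustrative helper):

```python
import numpy as np

def bias_and_coverage(estimates, lowers, uppers, true_value):
    """Bias of D replicated estimates and the percentage of CIs covering the truth.

    estimates, lowers, uppers : length-D sequences of point estimates and
    confidence-interval bounds; true_value : the population parameter.
    """
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    covered = (np.asarray(lowers) <= true_value) & (true_value <= np.asarray(uppers))
    return bias, 100.0 * covered.mean()
```

A well-behaved method should give a bias near 0 and a coverage percentage near the nominal 95%.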

Results
When visually inspecting the results for the standardized regression coefficients, it turned out that, with the exception of the estimates and coverage percentages of β_1, results were very similar across the standardized coefficients of the different predictors. Unlike the other β_j estimates, estimates of β_1 had coverage percentages as low as 90% under ρ² = 0.455 and a multivariate lognormal distribution of the predictors, for both the imputed data and the original data without missing values. Under a multivariate lognormal distribution, x_1 was more heavily skewed to the right than the other predictors (skewness above 4, versus values around 1 for the other variables). Since Yuan and Chan (2011) point out that the coverage of the confidence intervals of standardized regression coefficients is influenced by the shape of the multivariate distribution of X as well, and the poor coverage of β_1 was found for both the imputed data and the original data without missing values, it was concluded that this was most probably not a problem of multiple imputation but of the delta method-based confidence interval itself. For this reason, it was decided to set this finding aside and focus on the remaining three standardized regression coefficients.

Because of the size of the simulation design, a selection of factors to be displayed in a results table was made, based on the effect sizes of ANOVAs. These effect sizes revealed that, in general, the value of ρ², sample size, missingness mechanism, and set of combination rules were the most influential factors. The results of the simulation study are given in Table 2 for all combinations of the levels of these specific factors. Results for the estimated β_j's are aggregated across predictors, because differences in bias and coverage between predictors were not substantial. The complete results are provided in the supplemental material.
The table shows that, in general, the differences between the estimates of β̄_PS and β̄_SP are negligibly small and often not even visible in the third decimal. In general, the bias in the estimated β_j's is close to 0. Additionally, for both β̄_PS and β̄_SP, the coverage of the confidence intervals is close to the theoretical 95% in all situations.
Like the results for β̄_PS and β̄_SP, the results for R̄²_PS and R̄²_SP are similar across all situations displayed in the table. However, there are clear differences between R̄²_PS and R̄²_SP on the one hand and R̃² and R̄² on the other: the biases of the newly proposed estimates are substantially smaller than those of R̃² and R̄².

Discussion
In the current paper, two procedures for pooling standardized regression coefficients (β̄_PS and β̄_SP) and two new pooled estimates of R² (R̄²_PS and R̄²_SP) in multiply imputed datasets were proposed and investigated. Neither of the two procedures for standardized coefficients meets all the defined conditions for a statistic to be a standardized coefficient, as shown in Appendix II. However, simulation results showed that both methods produce very similar estimates, close to the estimates from the original data and close to the corresponding population values. With the exception of β_1 under ρ² = 0.455 and skewed data, confidence intervals of both methods produced coverage percentages close to 95% for all regression coefficients in all situations. Additionally, the two newly proposed pooled estimates of R², namely R̄²_PS and R̄²_SP, outperformed the existing estimates R̃² and R̄² in terms of bias. Both methods may thus be better alternatives to R̃² and R̄².
Although the poor coverage of the confidence interval of β_1 in some situations may seem disappointing at first, it does not imply that the development of these combination rules for confidence intervals of standardized coefficients has been unsuccessful. Yuan and Chan (2011) already showed that even in complete data, the coverage of the confidence intervals of standardized regression coefficients depends on the distribution of the predictors. The fact that the coverage of β_1 under ρ² = 0.455 and skewed data was no worse in the multiply imputed data than in the original data suggests that this is more a problem of confidence intervals of standardized regression coefficients in general than of the proposed combination rules.
Additionally, it is worth mentioning that variable x₁ did not have any missing data. Consequently, the estimates of β₁ and its confidence interval may have been less affected by imputed values and the combination rules than the estimates of the other standardized coefficients (although not entirely unaffected, as imputed values on other variables will affect the estimation of the entire regression model, including β₁). The fact that coverage of β₁ was poor even though x₁ did not contain any missing data, while the coverages of the other coefficients were satisfactory, makes an even stronger case for the hypothesis that the poor coverage of β₁ was caused by the distribution of x₁ rather than by the proposed combination rules.
The poor results found for the coverage of β₁ do, however, raise the question of how to obtain a valid confidence interval for standardized coefficients in multiply imputed data when the predictors are highly skewed. In general, when analytic confidence intervals break down, bootstrap confidence intervals may provide a solution. Van Ginkel and Kiers (2011) proposed combination rules for bootstrap confidence intervals of principal component loadings. Their procedure may be generalizable to confidence intervals for standardized regression coefficients. However, whether these combination rules would give accurate coverage when applied to standardized coefficients is a topic for another study.
The remaining question is whether β̄_PS and R²_PS should be preferred, or β̄_SP and R²_SP. This question is not easily answered. When looking at the conditions that have to be met for regression coefficients to be standardized coefficients, β̄_PS meets three of the four conditions, while β̄_SP meets only two. Using this as a criterion, it would follow that β̄_PS is the preferred method for pooling standardized regression coefficients and that the corresponding R²_PS is the preferred method for pooling R².
However, when looking at the specific conditions that β̄_SP does not satisfy, it may be wondered to what extent this is problematic. The first condition that β̄_SP does not satisfy is that a standardized regression coefficient can be computed using unbiased estimators of b_j, σ²_xj, and σ²_y. This violation is mainly a problem from a theoretical point of view. As long as it does not cause substantial differences in simulation results between β̄_SP and β̄_PS, the violation of this condition has little practical relevance.
The second condition that β̄_SP violates is that a t-test calculated for an unstandardized regression coefficient should be insensitive to standardization of the variables. On the one hand, this could be considered a disadvantage of β̄_SP compared to β̄_PS. On the other hand, the fact that this condition is violated for β̄_SP actually illustrates an important property of the specific data transformation used for standardized regression coefficients. Although this transformation is linear, it is sample dependent because it depends on the sample estimates s²_xj and s²_y. As these estimates are affected by the imputed values in multiply imputed data as well, it makes sense that the standardization is slightly different for each imputed dataset.
The different standardizations across imputed datasets may cause the pooled t-tests of the unstandardized regression coefficients to differ from the same pooled t-tests applied to the standardized variables. However, the fact that the standardization of variables is sample dependent is a reason why a different t-test (Jones and Waller 2013) is used for standardized regression coefficients in the first place. When standardized regression coefficients are calculated for each imputed dataset in the same way as one would normally do in complete data, and Rubin's rules are subsequently applied to the resulting coefficients and their correct standard errors, this automatically leads to the combination rules for point estimates and standard errors as defined by combination method β̄_SP. Given that with the corrected standard error (Jones and Waller 2013) the sampling distribution of a standardized regression coefficient is (approximately) normal, and Rubin's combination rules have been defined under the assumption of a normal sampling distribution of the specific parameter, there seems to be no theoretical objection to using β̄_SP as a method for pooling standardized regression coefficients and their confidence intervals.
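A sketch of that last point, assuming the per-dataset standardized coefficients and their corrected (Jones–Waller-type) standard errors are already available: Rubin's (1987) rules pool the point estimates by averaging and combine within- and between-imputation variance for the interval. The function name and interface below are illustrative, not the paper's implementation, and the sketch assumes SciPy for the t quantile and a nonzero between-imputation variance.

```python
import numpy as np

def rubin_pool(estimates, standard_errors, conf=0.95):
    """Pool M point estimates and standard errors with Rubin's rules.
    For beta_SP, `estimates` would hold one predictor's standardized
    coefficient in each imputed dataset, `standard_errors` the
    corresponding corrected standard errors."""
    from scipy import stats
    m = len(estimates)
    q_bar = np.mean(estimates)                  # pooled point estimate
    w = np.mean(np.square(standard_errors))     # within-imputation variance
    b = np.var(estimates, ddof=1)               # between-imputation variance (assumed > 0)
    t_var = w + (1 + 1 / m) * b                 # total variance
    r = (1 + 1 / m) * b / w                     # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2             # Rubin's degrees of freedom
    half = stats.t.ppf(1 - (1 - conf) / 2, df) * np.sqrt(t_var)
    return q_bar, np.sqrt(t_var), (q_bar - half, q_bar + half)
```

Applying this routine directly to the standardized coefficients of each imputed dataset is exactly what combination method β̄_SP amounts to.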
The combination rules for the confidence intervals defined by method β̄_PS, on the other hand, are not a direct application of Rubin's rules to the standardized regression coefficients of each individual imputed dataset. Of course, this is only a problem from a theoretical point of view, as the simulation results show that the confidence intervals of methods β̄_PS and β̄_SP cover the population coefficients equally well.
Given that the theoretical objections to either method given so far are largely irrelevant when both methods perform equally well in terms of bias and coverage, the question is whether there is any reason left to prefer one method over the other. The answer to that question may lie in the only condition that the point estimates produced by β̄_PS do not satisfy. When simple regression is applied to a complete dataset, the estimates of β₁, β_1,y, and ρ_x1y are all equal. This is not the case for the estimates produced by β̄_PS, while it is the case for β̄_SP. For interpretational purposes, it may be important that researchers are not faced with inconsistencies among the estimates of β₁, β_1,y, and ρ_x1y. Even when the differences are small, this could still cause some confusion on the part of the researcher. It could therefore be argued that β̄_SP and R²_SP are slightly preferred over β̄_PS and R²_PS as pooled estimates for standardized coefficients and percentage of explained variance, respectively.

To conclude, this paper has proposed two sets of combination rules for standardized regression coefficients, their confidence intervals, and R² in multiply imputed datasets. Since the proposed methods all perform well in terms of bias and coverage, and the proposed methods for R² produce less bias than the earlier proposed measures of R² in various situations, one of these sets of combination rules may henceforth be used in future studies that apply multiple regression to a multiply imputed dataset, with a slight preference for β̄_SP and R²_SP.
Appendix I
Below, proof will be given that in complete data the standardized coefficients meet the conditions stated in Table 1.

Condition 1
For complete data, there is not much to prove about standardized coefficients satisfying Condition 1, as that condition already follows from the formula for the standardized regression coefficients itself (Eq. 2).

Condition 2
Suppose that we have a complete dataset with N cases, consisting of the predictor variable matrix X = [1_N, x_1, …, x_k], whose first column of ones represents the intercept, and the outcome variable matrix y = [1_N, y]. A_X is a (k + 1) × (k + 1) transformation matrix that transforms X into a predictor matrix with standardized variables: it leaves the intercept column intact and replaces each predictor x_j by (x_j − x̄_j)/s_xj. That is,

A_X =
[ 1  −x̄_1/s_x1  …  −x̄_k/s_xk ]
[ 0   1/s_x1    …   0         ]
[ ⋮     ⋮       ⋱   ⋮         ]
[ 0   0         …   1/s_xk    ],

with inverse

A_X⁻¹ =
[ 1  x̄_1   …  x̄_k ]
[ 0  s_x1  …  0    ]
[ ⋮   ⋮    ⋱  ⋮    ]
[ 0  0     …  s_xk ].

Next, suppose that A_y is a 2 × 2 transformation matrix that transforms y into a standardized outcome variable matrix in the same way:

A_y =
[ 1  −ȳ/s_y ]
[ 0   1/s_y ].

Standardized predictors and a standardized outcome variable may be obtained by means of Z_X = X A_X and Z_y = y A_y. Using these standardized predictors, it can be shown that regression coefficients obtained from a regression using Z_X and Z_y as predictors and outcome variable, respectively, will provide standardized regression coefficients as defined in Eq. 2. First, the unstandardized regression coefficients are computed as b̂ = (X′X)⁻¹X′y.
Equivalently, the standardized regression coefficients are computed as β̂ = (Z_X′Z_X)⁻¹Z_X′Z_y (Eq. 9). By substituting X A_X and y A_y for X and y, respectively, it follows that Eqs. 2 and 9 are equivalent (ignoring the first row and column, which are only meant for the intercepts in matrices X and y).

Condition 3
Before proving Condition 3 for complete data, it is once again emphasized that the standard t-test for testing a regression coefficient is not suitable for standardized coefficients.
Having said that, in complete data a linear transformation of the data does not affect the outcome of a t-test. Since standardizing the data is a linear transformation, the values of the t-tests of the resulting regression coefficients (which are now standardized coefficients) will not be affected by this standardization. To show this, the formula for the t-test of an unstandardized regression coefficient in complete data is given first. Suppose s_e is the standard deviation of the error of the regression model; coefficient b̂_j is then tested for significance with the following t-statistic:

t = b̂_j / ( s_e √[(X′X)⁻¹]_jj ),  (11)

with n − k − 1 degrees of freedom. Now suppose both the predictors and the outcome variable are standardized; the (centered) values of the outcome variable are then all divided by s_y. Consequently, when a regression analysis is applied to these standardized variables, the standard deviation of the error in y becomes s_e,β = s_e/s_y. Furthermore, the standardized coefficient of variable x_j is β̂_j = (s_xj/s_y) b̂_j (Eq. 2), and, making use of the fact that s_zj = 1, it holds that [(Z′Z)⁻¹]_jj = s²_xj [(X′X)⁻¹]_jj. Substituting these quantities into Eq. 11, we get:

t = β̂_j / ( s_e,β √[(Z′Z)⁻¹]_jj ) = (s_xj/s_y) b̂_j / ( (s_e/s_y) s_xj √[(X′X)⁻¹]_jj ) = b̂_j / ( s_e √[(X′X)⁻¹]_jj ),

which is identical to the t-statistic of the unstandardized coefficient.
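The invariance derived above is easy to verify numerically. The sketch below, using simulated data (not data from the paper), computes the slope t-statistics via the normal equations for both the raw and the standardized variables and checks that they coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

def t_stats(X, y):
    """OLS t-statistics: t_j = b_j / se(b_j), se from (X'X)^-1 and s_e."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2_e = resid @ resid / (len(y) - X.shape[1])   # error variance, n - k - 1 df
    se = np.sqrt(s2_e * np.diag(XtX_inv))
    return b / se

# Standardize predictors and outcome (the intercept column stays as-is)
Z = X.copy()
Z[:, 1:] = (X[:, 1:] - X[:, 1:].mean(axis=0)) / X[:, 1:].std(axis=0, ddof=1)
z_y = (y - y.mean()) / y.std(ddof=1)

# Slope t-statistics are unchanged by standardization
assert np.allclose(t_stats(X, y)[1:], t_stats(Z, z_y)[1:])
```

Only the intercept's t-statistic changes, since centering moves the intercept to (approximately) zero.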

Condition 4
In simple linear regression performed on complete data, the standardized coefficient must equal the correlation between x_1 and y. An implication of this is that when y is regressed on x_1, the standardized regression coefficient has the same value as the standardized regression coefficient that is obtained when x_1 is regressed on y, namely the correlation. If in complete data we regress y on x_1 and write β̂_1 as a function of r_x1y, we get:

β̂_1 = (s_x1/s_y) b̂_1 = (s_x1/s_y)(r_x1y s_y/s_x1) = r_x1y.

If we do the same for β̂_1,y when we regress x_1 on y, we get:

β̂_1,y = (s_y/s_x1) b̂_1,y = (s_y/s_x1)(r_x1y s_x1/s_y) = r_x1y.

Appendix II
Condition 1
Since s̄²_x1 and s̄²_y are unbiased estimates, the use of their square roots for the computation of β̄_PS will not introduce additional bias on top of the bias that a standardized regression coefficient has in itself. To summarize, since b̄_1, s̄²_x1, and s̄²_y are all unbiased estimates under Bayesianly proper imputation, β̄_PS satisfies Condition 1.
Next, the question is whether β̄_SP satisfies Condition 1 as well. Since all three statistics in the term (s_xj,m/s_y,m) b̂_j,m behind the summation sign (Eq. 4) depend on m, no separate unbiased pooled estimates of b_j, σ²_xj, and σ²_y can be derived which, when inserted in Eq. 1, will provide the same pooled standardized regression coefficients as in Eq. 4. In short, β̄_SP does not satisfy Condition 1.
Condition 2
First, it is verified whether β̄_PS satisfies Condition 2. Calculating s̄_xj for all j, these estimates can be used to construct a transformation matrix Ã_X_m for standardizing the set of predictors in imputed dataset m, denoted X_m, and a corresponding matrix Ã_y_m for the outcome variable. The standardized versions of X_m and y_m are computed as Z̃_X_m = X_m Ã_X_m and Z̃_y_m = y_m Ã_y_m, respectively. Using the same steps as in Eq. 9 for complete data, it now follows that the resulting pooled coefficients are equivalent to Eq. 3, except for the first row and column, which are omitted in Eq. 3. The idea behind this standardization is that within each imputed dataset the variable means of that specific imputed dataset m are used for the centering, to ensure that the pooled intercept will be 0. At the same time, the normalization is done by dividing by the same estimates of the standard deviations across imputed datasets, to ensure that the resulting standardized regression coefficients will be the same as in Eq. 3. This type of standardization is similar to the standardization used in three-way models (Kroonenberg 2008, p. 127), where centering takes place at the level of fibers within a slice (in multiple imputation this corresponds with computing the mean of a variable x_j within a specific imputed dataset m), and normalization takes place at the level of the complete slice (in multiple imputation, this corresponds with computing one standard deviation of a variable x_j across all imputed datasets). To summarize, when using this specific form of standardization and applying Rubin's rules for point estimates (i.e., averaging), β̄_PS satisfies Condition 2 as well.
As for β̄_SP, the only difference with β̄_PS is that in the standardization both the centering and the normalization take place at the level of imputed dataset m. The standardized regression coefficients in β̄_SP are the result of standardizing the predictors and outcome variable per imputed dataset m, applying a regression analysis to the standardized variables of each imputed dataset, and pooling the resulting regression coefficients using Rubin's rules for point estimates (i.e., averaging). In short, using the general combination rules, β̄_SP satisfies Condition 2 as well.
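The β̄_PS standardization scheme (centering per imputed dataset, one pooled standard deviation across datasets) can be sketched numerically. This is a toy illustration with simulated "imputed" datasets, and it assumes, as one reading of the scheme, that the pooled standard deviation is the square root of the averaged per-dataset variances; it checks the claim that per-dataset centering makes every intercept, and hence the pooled intercept, zero:

```python
import numpy as np

rng = np.random.default_rng(2)
M, n = 3, 100
# Three toy "imputed" versions of a dataset (hypothetical data)
xs = [rng.normal(size=n) for _ in range(M)]
ys = [2.0 * x + rng.normal(size=n) for x in xs]

# Normalization at the level of all datasets: one SD per variable across the M sets
s_x = np.sqrt(np.mean([np.var(x, ddof=1) for x in xs]))
s_y = np.sqrt(np.mean([np.var(y, ddof=1) for y in ys]))

intercepts = []
for x, y in zip(xs, ys):
    # Centering at the level of each imputed dataset m
    zx = (x - x.mean()) / s_x
    zy = (y - y.mean()) / s_y
    X = np.column_stack([np.ones(n), zx])
    a, b = np.linalg.lstsq(X, zy, rcond=None)[0]
    intercepts.append(a)

# Each per-dataset intercept is 0, so the pooled (averaged) intercept is 0 as well
assert np.allclose(intercepts, 0.0)
```

Replacing s_x and s_y by the per-dataset standard deviations turns this into the β̄_SP scheme; the intercepts remain zero either way, but the slopes then differ slightly per dataset.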
Condition 3
When t-tests are calculated for β̄_PS in multiply imputed data using Rubin's rules (1987, Chap. 3), the question is whether they will give the same results as when the unstandardized coefficients in b̄ are tested for significance using the same rules. When applying Rubin's rules to an element b̄_j of b̄, we get: