Fungible weights are sets of alternative, suboptimal weights that may be used to examine parameter sensitivity in multiple linear regression models by way of a minor decrement in \( R^2 \). Each set of weights yields the same value of \( R^2 \), as well as the same correlation between the ordinary least squares (OLS) and alternative predicted values (Waller, 2008). The degree of discrepancy between sets of fungible weights and the OLS weights is independent of sample size, and their behavior is determined by factors other than those that determine the confidence interval for regression weights (Jones, 2013; Pek, Chalmers, & Monette, 2016). Under some conditions, even very small decrements in \( R^2 \) can be associated with substantially different regression weights for the predictor variables. In those cases, the weights are considered sensitive and provide a poor basis for scientific conclusions (Green, 1977), and it is therefore prudent to ensure that the weights obtained in a study are not too sensitive.

The uncertainty of parameter estimates we focus on here is not the uncertainty that stems from sampling variation, but uncertainty stemming from possible model inaccuracies, which are rarely known in practice. Although the two types of uncertainty are related (Pek et al., 2016), tests for one cannot serve as tests for the other. Methods of examining parameter sensitivity can be understood as relaxing the degree of certainty about the model’s accuracy (rather than relaxing the degree of certainty about precision, as with standard errors and confidence intervals) and as an acknowledgment that all models are wrong and almost necessarily biased (Box, 1976; Edwards, 2013), due to, for instance, missing variables, incorrect error terms, missing (interaction) terms, or ignored measurement error (Jaccard & Wan, 1995). Although the sources of bias may cancel each other out to some degree, in general it can be expected that more inaccurate models will also be more biased. Here we will make use of predictor-specific fungible weight intervals (FIs) to indicate the sensitivity of parameter estimates to possible model inaccuracy, in analogy to how confidence intervals (CIs) reflect the imprecision stemming from sampling variation. The two sources of uncertainty are different, and as will be shown, the two types of intervals are different, as well.

Before continuing, we wish to make clear that weights that yield a lower value of \( R^2 \) are not necessarily less accurate. On the contrary, the accurate weights will yield a lower value of \( R^2 \) if the model is incorrect. The estimated parameters in a linear regression model are optimal weights given the data and the model, but these estimates are unlikely to reflect the true effects associated with each variable if the model is not completely accurate (i.e., if it is not the “true model”). Examining the weights associated with a decrement in \( R^2 \) obviously does not guarantee recovery of the true effects, but it nonetheless provides an opportunity to investigate the potential consequences of an inaccurate model on the regression effect estimates, without being limited to any specific type of model inaccuracy.

Suppose that model A is an accurate model and model B is an inaccurate model that omits one or more predictors. If one were to use the true parameter values from model A in model B, the result would be a value of \( R^2 \) lower than that obtained with OLS estimation, and the accurate, unbiased parameters from model A would seem inferior to the OLS estimates. For example, suppose that the true model A includes a set of four predictor variables and a criterion variable with all rs = .4, so that all βs = .182 and \( R^2 \) = .291. If a predictor is excluded in the inaccurate model B, then the OLS estimates for the remaining three predictors would all be βs = .222 and \( R^2 \) = .267. However, if the true regression weights from model A were used in model B, then \( R^2 \) = .218, which is clearly inferior to the .267 obtained with OLS. Similarly, when the true model A is a model with five predictor variables and all rs = .4, then \( R^2 \) = .308 and all βs = .154. However, if only three of the five predictors are used in model B, the OLS estimates are again βs = .222 and \( R^2 \) = .267, whereas the correct values (from model A) used in model B yield an even larger reduction, to \( R^2 \) = .185. Counterintuitively, then, optimal weights are not necessarily correct weights. Whereas these examples are calculated from given true weights and known violations in an incorrect model, in practice the calculation of the sensitivity of weights is a form of reverse calculation that does not guarantee recovery of the true weights unless the specific violations are known (and we do not claim here that fungible weights may be used to recover the true model, as there has been insufficient research to make such a claim). Instead, consideration of alternative weights provides a general indication of how much the obtained parameters may reflect bias due to model violations of any form.
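
The arithmetic of these examples can be checked directly. The following is a minimal R sketch of our own (not code from the appendices); it credits a weight vector w with variance explained w′rxy, which equals the usual \( R^2 \) for OLS weights and corresponds to the definition of \( {R}_a^2 \) given in the next section, because \( {\boldsymbol{R}}_{\boldsymbol{xx}}\boldsymbol{b}={\boldsymbol{r}}_{\boldsymbol{xy}} \) for the OLS weights b:

```r
# Variance explained credited to a weight vector w (equals a'Rxx b, since Rxx b = rxy).
r2_for <- function(w, rxy) sum(w * rxy)

p <- 4; rho <- .4                   # model A: four predictors, all rs = .4
Rxx <- matrix(rho, p, p); diag(Rxx) <- 1
rxy <- rep(rho, p)
b_A <- solve(Rxx, rxy)              # all betas = .182
r2_for(b_A, rxy)                    # R^2 = .291

RxxB <- Rxx[1:3, 1:3]               # model B: one predictor dropped
rxyB <- rxy[1:3]
b_B <- solve(RxxB, rxyB)            # all betas = .222
r2_for(b_B, rxyB)                   # R^2 = .267
r2_for(b_A[1:3], rxyB)              # R^2 = .218: the true weights appear inferior

# Rerunning with p <- 5 gives the five-predictor variant: betas = .154,
# R^2 = .308, and R^2 = .185 for the true weights used in model B.
```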

Fungible regression weights

Fungible regression weights are alternative weights whose predicted criterion values have a prespecified correlation with the OLS predicted values while yielding an identical, suboptimal value of \( R^2 \)—thus, the term fungible weights. We denote the OLS vector of weights as b, and the vector of alternative weights as a. Similarly, we denote the OLS value of \( R^2 \) as \( {R}_b^2 \) and the fungible value as \( {R}_a^2 \), the predicted values from the OLS weights as \( {\widehat{y}}_b \), the predicted values from the fungible weights as \( {\widehat{y}}_a \), and the prespecified correlation between the two as \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \). For two predictor variables, there are exactly two sets of alternative weights that satisfy the constraints, but for three or more predictors, there are an infinite number of alternative weights that do so (fungible weights are not defined for a single predictor). The full mathematical derivation of fungible weights may be found in Waller (2008), and a means to identify the minimally and maximally discrepant weight sets (i.e., the fungible extrema) may be found in Waller and Jones (2009). Fungible weights for logistic regression may be found in Jones and Waller (2016), and Lee, MacCallum, and Browne (2018) developed fungible parameters for structural equation modeling. Alternative weights that yield the same value of \( {R}_a^2 \) but do not satisfy the prespecified correlation are known as exchangeable weights (Pek et al., 2016).

Although we wish to minimize the use of equations throughout the present article in order to reach a broader audience, a brief explanation of the geometry of fungible weights will be useful for the discussion of our results. Geometrically, fungible weight sets with three or more predictors lie at the intersection of a p – 1 dimensional (hyper)plane and a p-dimensional (hyper)ellipsoid. The intersection is a p – 1 dimensional ellipse or (hyper)ellipsoid. With two predictors, the weight sets are the two points at which a line intersects an ellipse. The intersection is elliptical because each weight is a tightly constrained function of the others, through the prespecified correlation \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) (Waller & Jones, 2009). The (hyper)plane is characterized by the set of weight vectors that satisfy the constraint that they yield the same value of \( {R}_a^2 \)—specifically,

$$ {R}_a^2={\boldsymbol{a}}^{\prime }{\boldsymbol{R}}_{\boldsymbol{xx}}\boldsymbol{b} $$

and the (hyper)ellipsoid is characterized by the set of weight vectors that satisfy

$$ {R}_a^2={\boldsymbol{a}}^{\prime }{\boldsymbol{R}}_{\boldsymbol{xx}}\boldsymbol{a} $$

where Rxx is the predictor correlation matrix (Jones & Waller, 2016; Waller & Jones, 2009). Fungible weights are those that satisfy both equations.
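
To make the two constraints concrete, the following minimal R sketch (our own illustration, not Waller’s published function) draws weight vectors satisfying both equations. Working in the metric of \( {\boldsymbol{R}}_{\boldsymbol{xx}}^{1/2} \), the constraints together imply \( {R}_a^2={r}_{{\widehat{y}}_a{\widehat{y}}_b}^2{R}_b^2 \) and place the transformed fungible vectors on a cone around the transformed OLS vector:

```r
# Sketch: sample vectors a with a'Rxx b = R_a^2 (plane) and a'Rxx a = R_a^2 (ellipsoid).
fungible_weights <- function(Rxx, rxy, ryayb = .99, sets = 1000) {
  b   <- solve(Rxx, rxy)                # OLS weights
  R2b <- sum(b * rxy)                   # = b'Rxx b
  R2a <- ryayb^2 * R2b                  # implied by the two constraints
  e   <- eigen(Rxx, symmetric = TRUE)
  G   <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)  # Rxx^(1/2)
  u   <- c(G %*% b); u <- u / sqrt(sum(u^2))                  # transformed OLS direction
  a   <- matrix(NA, sets, length(b))
  for (i in seq_len(sets)) {
    z  <- rnorm(length(b))
    z  <- z - sum(z * u) * u            # random direction orthogonal to u
    z  <- z / sqrt(sum(z^2))
    at <- sqrt(R2a) * (ryayb * u + sqrt(1 - ryayb^2) * z)     # point on the cone
    a[i, ] <- solve(G, at)              # back to the original weight space
  }
  list(a = a, b = b, R2a = R2a, R2b = R2b)
}
```

Each row of a then satisfies both equations up to machine precision; with two predictors, the draws simply alternate between the two admissible weight pairs.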

We will refer to the ellipsoid as the all-possible-regressions ellipsoid, because for a given \( {R}_a^2 \) and predictor matrix, an infinite number of possible weight sets will yield the same \( {R}_a^2 \) (this is also true for \( {R}_b^2 \); Waller & Jones, 2011). Parameter sensitivity is evaluated on the basis of how tightly the sets of weights cluster around the parameters associated with optimal model fit, with tighter sets of weights providing a stronger basis for inferences from the OLS estimates. In other words, smaller ellipses indicate less sensitive parameters.

An example of fungible weights is shown in Fig. 1 with three predictor variables, with the variance explained for the OLS estimates being \( {R}_b^2 \) = .647. Given \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .90, .95, and .99, the resultant alternative values are \( {R}_a^2 \) = .524, .584, and .634, respectively. The single dot represents the OLS estimates, and the ellipses represent the fungible weight sets for a given \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) value. As can be seen, there are large discrepancies between the OLS estimates and some of the fungible weights. In the case of \( {\beta}_{X_1} \), the OLS weight is .073 and is significant for N = 450, whereas the fungible weights include both positive and negative weights for all shown values of \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \). For \( {\beta}_{X_2} \), the OLS weight is .239 and is significant for samples as small as N = 50, yet for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .90, the fungible weight sets also include negative weights.

Fig. 1

Fungible weights based on the case that \( {r}_{X_1{X}_2} \) = .1, \( {r}_{X_1{X}_3} \) = .2, \( {r}_{X_2{X}_3} \) = .3, \( {r}_{X_1Y} \) = .2, \( {r}_{X_2Y} \) = .4, and \( {r}_{X_3Y} \) = .6, with associated regression weights of \( {\beta}_{X_1} \) = .073, \( {\beta}_{X_2} \) = .239, and \( {\beta}_{X_3} \) = .514. The variance explained for the OLS estimates is \( {R}_b^2 \) = .647, and \( {R}_a^2 \) = .524, .584, and .634, for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .90, .95, and .99, respectively
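
The OLS weights in this example follow directly from the stated correlations; a brief check in R (the weights are standardized because the inputs are correlations):

```r
# b = Rxx^{-1} rxy for the Fig. 1 correlations.
Rxx <- matrix(c(1, .1, .2,
                .1, 1, .3,
                .2, .3, 1), nrow = 3)
rxy <- c(.2, .4, .6)
round(solve(Rxx, rxy), 3)   # .073 .239 .514
```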

Although it is easy to see that the sizes of the fungible weight ellipses differ for each predictor in this example, it is not immediately apparent what factors other than the arbitrary value of \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) might contribute to the fungible ellipses. Several factors that affect fungible weights are detailed in Jones (2013). We distinguish these factors as being related to the predictor correlation matrix, to the predictor–criterion correlation vector, or to both. Factors related to the predictor correlation matrix may be understood as a matter of multicollinearity. Specifically, the eigenvalues of Rxx determine the shape of the ellipsoid, and the orientation of the ellipsoid is determined by the eigenvectors. As multicollinearity increases, the first eigenvalue will increase and the last eigenvalue will decrease, approaching zero. Roughly equal eigenvalues yield a roughly (hyper)spherical ellipsoid, and the fungible weight (hyper)ellipses are then more circular. In contrast, the more discrepant the eigenvalues are, the thinner the ellipsoid becomes in at least one dimension (e.g., cigar or pancake shaped). The axes of the ellipsoid are calculated as follows:

$$ {l}_i=2\sqrt{\frac{R_a^2}{\lambda_i}} $$

where \( {l}_i \) denotes the ith axis length, and \( {\lambda}_i \) denotes the ith ordered eigenvalue of the predictor matrix (Waller & Jones, 2011). Because the eigenvalues drive these effects, measures of multicollinearity may help explain the fungible weight intervals without necessarily playing a direct role. The determinant of a matrix is equal to the product of its eigenvalues, so as the discrepancy between the first and last eigenvalues increases, the determinant decreases, because the last eigenvalue approaches zero. Similarly, the ratio of the first to the last eigenvalue is known as the condition number, another measure of multicollinearity. Additionally, the familiar variance inflation factor (VIF) tends to increase as the discrepancy between the eigenvalues increases.
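
All of these quantities are inexpensive to compute from the predictor correlation matrix; a brief sketch (the \( {R}_a^2 \) value here is an arbitrary placeholder):

```r
# Eigenvalue-based diagnostics for a predictor correlation matrix.
Rxx <- matrix(c(1, .1, .2,
                .1, 1, .3,
                .2, .3, 1), nrow = 3)
lambda <- eigen(Rxx, symmetric = TRUE, only.values = TRUE)$values
R2a <- .25                     # placeholder R_a^2
2 * sqrt(R2a / lambda)         # axis lengths l_i
prod(lambda) - det(Rxx)        # ~0: determinant = product of eigenvalues
lambda[1] / lambda[3]          # condition number
diag(solve(Rxx))               # VIFs (for standardized predictors)
```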

Factors related to the predictor–criterion correlation vector may be understood as being due to the \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) constraint for the predicted values. The orientation of the eigenvectors of Rxx with respect to the vector of standardized regression coefficients (i.e., the correlation between the two) plays a role because, as we previously mentioned, the eigenvectors determine the orientation of the ellipsoid, and the intersecting plane is defined by the set of weight vectors that satisfy \( {R}_a^2={\boldsymbol{a}}^{\prime }{\boldsymbol{R}}_{\boldsymbol{xx}}\boldsymbol{b} \). As a result, where the plane intersects the ellipsoid is determined by both b and Rxx, with different intersection points being associated with different curvatures and thicknesses of the ellipsoid (Jones, 2013). Weight vectors more closely related to the eigenvectors will be more closely related to the variance of the predictors. Additionally, the angle of the intersecting plane is determined by the predictor–criterion correlations, because the correlation vector rxy is orthogonal to the (hyper)plane by design (Waller & Jones, 2009), so the correlations can be expected to predict fungible weight behavior.

Finally, larger values of \( {R}_b^2 \) lead to larger ellipsoids composed of all weights that yield the same value of \( {R}_a^2 \), given the same value of \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \). To understand this, consider that just as there are an infinite number of weight vectors that satisfy the equation for the (hyper)ellipsoid, \( {R}_a^2={\boldsymbol{a}}^{\prime }{\boldsymbol{R}}_{\boldsymbol{xx}}\boldsymbol{a} \), there are also an infinite number of weight vectors that satisfy \( {R}_b^2={\boldsymbol{b}}^{\prime }{\boldsymbol{R}}_{\boldsymbol{xx}}\boldsymbol{b} \) (Waller & Jones, 2011). As a result, as \( {R}_b^2 \) increases, so too must the absolute value of some components of either Rxx or b. The number of predictors also affects the fungible ellipses indirectly, because the admissible predictor correlations are constrained by the number of predictors (Jones, 2013), but we do not consider this matter further here.

Here, rather than the intersection ellipse, we focus our investigation on the range of each weight separately. We will refer to this range as the fungible interval, a sort of “validity interval” analogous to the reliability interval provided by a confidence interval. We use the range per predictor for a few reasons. In addition to the ease of interpreting the difference between two values, the familiar confidence interval is likewise a range of weight values per predictor. There is also heavy bimodality in the distribution of the fungible weights, with peaks near the boundaries (Waller, 2008), so the range implies relatively little loss of information. Finally, we focus on the range rather than the fungible extrema (Waller & Jones, 2009) because the range is computationally far simpler. Fungible extrema are the two weight sets that either minimize or maximize the cosine (equivalently, the correlation) between the fungible and OLS weights. Though there is overlap between the minimally and maximally discrepant weight sets and the minimum and maximum weights for each predictor, the weight sets that include either the minimum or the maximum value of a weight are not necessarily fungible extrema, because, given three or more predictors, there are 2p² weights across the sets containing the range end points (where p denotes the number of predictors), but only 2p weights associated with the extrema.

Our purpose here, then, is to consider both how much the fungible interval can vary in size and what factors can help explain the differences, with the goal of providing a general sense of when the parameters may be sensitive to the effects of unknown model violations. Though the factors we explore are theoretically motivated, we take a relatively more applied approach in our reporting, with a preference for simple and familiar explanatory factors. We will work from correlation matrices in our studies, since they are always available to researchers, but knowledge of how the model is inaccurate is rarely so.

Fungible intervals for the two-predictor case

We begin with the two-predictor case for fungible intervals, because for two predictors there are only two weight sets. The two sets are the two end points of a line—that is, the fungible interval. In the next step we will consider the more complicated three-dimensional, three-predictor case, from which we can draw some possible generalizations to even larger predictor sets.

Method

For this study, the possible correlations between the two predictor variables and the criterion variable were – .5, – .4, – .3, – .2, – .1, 0, .1, .2, .3, .4, and .5. Because there are three correlations in a three-variable correlation matrix, this resulted in 11³ = 1,331 matrices. We follow the example of confidence intervals and use the following criterion values: \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .90, .95, and .99. These criterion values result in relatively modest drops in variance explained, because the two fungible weight constraints together imply \( {R}_a^2={r}_{{\widehat{y}}_a{\widehat{y}}_b}^2{R}_b^2 \). For example, for a value of \( {R}_b^2 \) = .25, the resultant \( {R}_a^2 \) values would be .203, .226, and .245, respectively.

For each matrix (all combinations here produce valid correlation matrices, although some—e.g., all rs = – .5 and \( {R}_b^2 \) = 1—are extremely unlikely and may be computationally difficult in practice), we calculated the OLS weights and the two fungible weight pairs for each \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) value. All weights derived are standardized weights. Additionally, to provide some sense of the magnitude of the fungible intervals and provide a point of comparison, we also calculated 95% confidence intervals based on N = 100. To calculate the two sets of alternative weights, we used the R function provided in Waller (2008), with a small modification to allow estimation with two predictors.
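
The grid itself can be enumerated in a few lines. The sketch below uses the fungible_weights() function sketched earlier (not Waller’s original code) with a small number of sets per matrix; the degenerate cells in which both predictor–criterion correlations are zero (so that \( {R}_b^2 \) = 0 and \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) is undefined) are excluded up front:

```r
# Sketch of the two-predictor design: 11 values per correlation, 11^3 combinations.
vals <- round(seq(-.5, .5, by = .1), 1)
grid <- expand.grid(r12 = vals, r1y = vals, r2y = vals)
grid <- subset(grid, r1y != 0 | r2y != 0)      # drop degenerate R_b^2 = 0 cells
fi <- apply(grid, 1, function(g) {
  Rxx <- matrix(c(1, g["r12"], g["r12"], 1), nrow = 2)
  fw  <- fungible_weights(Rxx, c(g["r1y"], g["r2y"]), ryayb = .99, sets = 50)
  apply(fw$a, 2, function(w) diff(range(w)))   # fungible interval per predictor
})
```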

Results and discussion

Comparison with confidence intervals

As with confidence levels and interval size, lower \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) values result in larger intervals than do higher \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) values. These results are to be expected because, generally speaking, increasingly discrepant predictions necessitate increasingly discrepant variable weights. The minimum and maximum interval sizes for \( {\beta}_{X_1} \) and \( {\beta}_{X_2} \) were identical: For each criterion correlation value, the largest intervals were 0.453, 0.343, and 0.161, for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) values of .90, .95, and .99, respectively, and the smallest intervals were equal to zero—that is, the fungible weights were equal to the OLS weights. For comparison, the 95% confidence intervals ranged in size from .179 to .465. The magnitudes of each predictor’s fungible intervals were perfectly correlated across all three criterion values.

Figure 2 shows the end points for the two types of intervals. Since the plot for \( {\beta}_{X_2} \) is identical, we hereafter use the subscript i to denote a specific predictor, and the subscript i* to denote the other. The OLS point estimates and confidence interval end points are shown in black. The fungible interval end points are shown for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .90 and .99, in dark and light gray, respectively. The end points for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .90 overlap with those for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .99. The end points for .95 lie between these two sets of end points, and so are not shown. Both intervals are symmetrical around a set of weights. Specifically, the confidence intervals are symmetrical about the OLS weights, whereas the fungible interval end points are symmetrical about the transformed OLS weights (not shown) that are at the center of the fungible ellipse. These transformed weights are equal to \( \frac{R_a^2}{R_b^2}b \) (Waller & Jones, 2009). Because these weights are necessarily closer to 0 than the OLS weights, so too are the fungible interval end points.
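
This symmetry is easy to verify numerically; a quick check with an arbitrary two-predictor matrix, again assuming the fungible_weights() sketch from above:

```r
# Interval midpoints coincide with the ellipse center (R_a^2 / R_b^2) b.
Rxx <- matrix(c(1, .3, .3, 1), nrow = 2)
fw  <- fungible_weights(Rxx, c(.4, .2), ryayb = .90, sets = 200)
(apply(fw$a, 2, min) + apply(fw$a, 2, max)) / 2   # midpoints of the intervals
(fw$R2a / fw$R2b) * fw$b                          # = .81 * b, the center
```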

Fig. 2

End points of the confidence and fungible intervals for two predictors, plotted against the OLS point estimates. Confidence intervals are based on N = 100. The values of \( {r}_{{\widehat{\mathrm{y}}}_{\mathrm{a}}{\widehat{\mathrm{y}}}_{\mathrm{b}}} \) used here are .90 (dark gray) and .99 (light gray). Some end points for \( {r}_{{\widehat{\mathrm{y}}}_{\mathrm{a}}{\widehat{\mathrm{y}}}_{\mathrm{b}}} \) = .90 are overlapped by the points for \( {r}_{{\widehat{\mathrm{y}}}_{\mathrm{a}}{\widehat{\mathrm{y}}}_{\mathrm{b}}} \) = .99

Figure 3 shows the magnitude of the intervals (the difference between the lowest and highest weights), plotted in relation to the OLS values of \( {\beta}_{X_i} \). We use \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .99 in the figure in order to minimize overlap between the confidence and fungible intervals; smaller criterion values simply result in increased distance between points without affecting the overall pattern. It is immediately apparent that the fungible and confidence interval magnitudes follow quite different patterns. Larger values of \( {\beta}_{X_i} \) are generally associated with tighter confidence intervals, but the fungible interval magnitudes are almost completely unrelated to the value of \( {\beta}_{X_i} \). Additionally, whereas the confidence intervals cluster in a single arcing swath, the fungible intervals cluster in six groupings. Of note is that these clusters are not equally diffuse. From bottom to top, the first cluster includes entirely zero-magnitude intervals, and each subsequent cluster is increasingly diffuse. Considering the conditions in this study, this suggests that the absolute magnitude of one of the correlations may be the primary predictor of the fungible interval size, and that it interacts with at least one other variable.

Fig. 3

Fungible interval (FI) (\( {r}_{{\widehat{\mathrm{y}}}_{\mathrm{a}}{\widehat{\mathrm{y}}}_{\mathrm{b}}} \) = .99) and 95% confidence interval (CI) magnitudes plotted against the value of \( {\beta}_{X_i} \). Confidence intervals are based on N = 100

Explanatory factors for fungible interval size

Because we are using regressions to describe regressions, using the same regression terminology (e.g., predictor and criterion, or (in)dependent variables) for both can easily become confusing. For the regressions associated with fungible weights, we use the predictor and criterion terminology to refer to the variables involved, and for the regressions to explain the fungible interval size, we use the terms explanatory and explained variable(s).

We considered the following explanatory variables, in both pairs and trios, with all interactions included: \( \left|{r}_{X_iY}\right| \), \( \left|{r}_{X_{i\ast }Y}\right| \), \( \left|{r}_{X_1{X}_2}\right| \); VIF (there is only one value with two predictors); the determinant, condition number, and eigenvalues of the predictor matrix; the OLS regression weights; and \( {R}_b^2 \). We also considered the correlations (direction cosines) between the predictor matrix eigenvectors and the OLS weight vectors, and the axes of the ellipse (what would be the all-possible-regressions ellipsoid for three predictors). The explanatory variable weights we report are for standardized explanatory variables, but the explained variable—the fungible interval—is not standardized. We take this approach because standardization of the explanatory variables eases interpretation of the coefficients by making them comparable within the same analysis, whereas standardization of the explained variable would mask differences in the widths of the fungible intervals associated with values of \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \). All weights are exact, since no sampling was involved in this study (the confidence intervals were calculated directly from the standard errors based on correlation matrices and the assumed N = 100).

In contrast to the complicated, multifactorial determination of the shape and orientation of the (hyper)ellipsoid and orientation of the intersecting (hyper)plane (Jones, 2013) for three or more predictors, we found that the range for a given predictor’s fungible interval in the two-predictor case is very simply determined. As can be seen in Table 1, the magnitude of the \( {\beta}_{X_i} \) fungible interval is almost completely explained by \( \left|{r}_{X_{i\ast }Y}\right| \). This single variable is sufficient to yield \( {R}_b^2 \) = .990. By including VIF in the regression, the result is a model with \( {R}_b^2 \) = .997; all weights in these models were positive. With the addition of the interaction term, the result is \( {R}_b^2 \) = 1, so these three terms perfectly explain the range for the two-predictor case. We also briefly note that the combination of \( \left|{r}_{X_{i\ast }Y}\right| \) and the second axis of the ellipse resulted in \( {R}_b^2 \) = .992. We will revisit this point in the three-predictor case. Figure 4 illustrates the relationship between \( \left|{r}_{X_{i\ast }Y}\right| \) and the range of \( {\beta}_{X_i} \), with the magnitude of the fungible interval increasing with values of \( \left|{r}_{X_{i\ast }Y}\right| \). The increasing spread reflects the interaction between \( \left|{r}_{X_{i\ast }Y}\right| \) and VIF.
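
This meta-regression is straightforward to reproduce. A sketch, assuming the grid and fi objects from the Method sketch above; with two standardized predictors, \( VIF=1/\left(1-{r}_{X_1{X}_2}^2\right) \):

```r
# Explain the beta_1 fungible interval by |r_X2Y| and VIF (explanatory variables
# standardized via scale(); the explained FI left unstandardized, as in the text).
d <- data.frame(FI      = fi[1, ],
                r_other = abs(grid$r2y),
                VIF     = 1 / (1 - grid$r12^2))
summary(lm(FI ~ scale(r_other) * scale(VIF), data = d))$r.squared   # ~1
```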

Table 1 Regression weights of explanatory variables for fungible intervals in the two-predictor case
Fig. 4

Magnitudes of the \( {\beta}_{X_i} \) fungible intervals in the two-predictor case, plotted against \( \left|{r}_{X_{i\ast }Y}\right| \). The increased spread of the interval magnitudes as \( \left|{r}_{X_{i\ast }Y}\right| \) increases is because of its interaction with VIF

The results presented here may be understood by considering that satisfying the \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) constraint requires that when the weight of one predictor goes up, the weights of the others go down. This is exactly the case for two predictors, and thus for the two fungible weight pairs; for more predictors, it follows from the elliptical form. The results in Fig. 4 show that the fungible interval for the weight of one predictor increases with the correlation between the other predictor and the criterion variable, and slightly more so if the VIF is high. Varying the weight of a predictor has fewer consequences for the prediction when the other variable is highly correlated with the criterion variable, and can therefore compensate for the weight changes, so there is more freedom for the weight to move. Consistent with this, when \( \left|{r}_{X_{i\ast }Y}\right| \) is equal to 0, the magnitude of the fungible interval for \( {\beta}_{X_i} \) is also equal to 0. Additionally, the intersecting plane is, by design, orthogonal to the predictor–criterion correlation vector (Waller & Jones, 2009, p. 594). In other words, the angle of the plane and the resultant intersection ellipse are entirely determined by the predictor–criterion correlations. It follows that if the correlation of one of the two predictors with the criterion is zero, then the fungible interval for the other predictor is zero, as can be seen in Fig. 1 of Waller and Jones (2009, p. 592). Our finding that the fungible interval shrinks to zero when the other correlation is zero confirms the geometric analysis in Waller and Jones (2009).

When X1 and X2 are highly correlated (high VIF), then the range is somewhat larger, because then one weight can better compensate for the other. The higher the correlation between the two predictors, the less it matters if one weight is increased at the cost of another, but this compensation is far from the primary factor determining the interval size. If X1 is highly related to Y and X2 is not related, then decreasing \( {\beta}_{X_1} \) and increasing \( {\beta}_{X_2} \) will have a large detrimental effect, because giving X2 a larger weight adds noise to the prediction, leaving almost no freedom for \( {\beta}_{X_1} \) to change while still satisfying \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \), regardless of multicollinearity. The reverse is true if X2 is highly correlated with Y but X1 is not: Decreasing the value of \( {\beta}_{X_2} \) and increasing the value of \( {\beta}_{X_1} \) will, for the most part, simply add noise. It appears, then, that if there is a good alternative predictor, it does not matter too much which one does the predictive work, and this is slightly more the case if the two predictors are highly correlated.

Fungible weights with three predictors

Method

For this study, the possible values for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) were again .90, .95, and .99. We limited ourselves to rs = – .5, – .3, – .1, 0, .1, .3, and .5 to keep the number of conditions computationally manageable. In a four-variable system there are six correlations, so in this case, with seven possible values, 7⁶ = 117,649 matrices were generated. Of these, 109,129 were valid correlation matrices, and for each we derived the OLS regression weights and confidence intervals based on N = 100, and calculated 1,000 fungible weight trios using Waller’s (2008) R function. These 1,000 sets were sufficient to recover the shape of the fungible weight ellipse as well as any trends, with only a minor loss in precision.
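
The validity screen amounts to a positive (semi)definiteness check on each candidate matrix; a sketch (the exact count of valid matrices may depend on the tolerance applied at the boundary):

```r
# Enumerate the 7^6 = 117,649 candidate matrices and flag the valid ones.
vals  <- c(-.5, -.3, -.1, 0, .1, .3, .5)
grid3 <- expand.grid(r12 = vals, r13 = vals, r23 = vals,
                     r1y = vals, r2y = vals, r3y = vals)
is_valid <- apply(grid3, 1, function(g) {
  R <- diag(4)                          # variable order: X1, X2, X3, Y
  R[upper.tri(R)] <- g[c("r12", "r13", "r23", "r1y", "r2y", "r3y")]
  R[lower.tri(R)] <- t(R)[lower.tri(R)]
  min(eigen(R, symmetric = TRUE, only.values = TRUE)$values) >= 0
})
sum(is_valid)                           # the text reports 109,129 valid matrices
```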

Results and discussion

Comparison with confidence intervals

Figure 5 shows the end points for both the confidence and fungible intervals with three predictors, plotted in relation to \( {\beta}_{X_1} \) (the plots for \( {\beta}_{X_2} \) and \( {\beta}_{X_3} \) are identical). As in the two-predictor case, the confidence intervals are symmetric about the OLS point estimates, and the fungible intervals are symmetric about the \( \frac{R_a^2}{R_b^2}b \) transformed weights. The OLS point estimates and confidence interval end points are shown in black. The fungible interval end points are those for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .90 and .99 and are again shown with dark and light gray, respectively. The end points for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .90 overlap with those for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .99, and we again do not show the points for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .95, because they lie between the two criterion values shown. Unlike in the previous study with two predictors, here the points for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .99 overlap with the 95% confidence interval end points.

Fig. 5

End points of the confidence and fungible intervals for three predictors, plotted against the OLS point estimates, shown in black. Confidence intervals are for N = 100. The values of \( {r}_{{\widehat{\mathrm{y}}}_{\mathrm{a}}{\widehat{\mathrm{y}}}_{\mathrm{b}}} \) used here are .90 (dark gray) and .99 (light gray). Some interval end points associated with \( {r}_{{\widehat{\mathrm{y}}}_{\mathrm{a}}{\widehat{\mathrm{y}}}_{\mathrm{b}}} \) = .90, as well as for the 95% confidence intervals, overlap with the points for \( {r}_{{\widehat{\mathrm{y}}}_{\mathrm{a}}{\widehat{\mathrm{y}}}_{\mathrm{b}}} \) = .99

Figure 6 shows the magnitude of the fungible intervals (the difference between the lowest and highest bounds) plotted in relation to \( {\beta}_{X_1} \) (the plots for \( {\beta}_{X_2} \) and \( {\beta}_{X_3} \) are again identical). We use \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .99 to minimize overlap of the points. Points are shaded according to whether both values of \( \left|{r}_{X_{i\ast }Y}\right| \) are equal to 0 or at least one is equal to .1, .3, or .5, with darker points representing larger values. The minimum and maximum interval sizes for each β were identical: the largest intervals were 1.241, 0.938, and 0.441, for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) values of .90, .95, and .99, respectively, and the smallest intervals were equal to zero—that is, no variability in the weights. For comparison, the 95% confidence intervals ranged in size from .048 to .757. Fungible intervals of size zero occurred when all other predictors were uncorrelated with the criterion, and the magnitudes of a given weight’s fungible intervals were perfectly correlated across all three criterion values.

Fig. 6

Fungible interval (FI) (\( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .99) and 95% confidence interval (CI) magnitudes plotted against the value of \( {\beta}_{X_i} \). Darker triangles indicate increasing values of at least one \( \left|{r}_{X_{i\ast }Y}\right| \). As can be seen, as the correlation magnitude increases, so too does the minimum magnitude for the FIs

The fungible and confidence intervals again followed different patterns, similar to those observed in the two-predictor case. Larger values of β are generally associated with tighter confidence intervals, but the fungible interval magnitudes are almost completely unrelated to the value of β. The confidence intervals again form a large cluster that slopes downward as the absolute value of β increases, and the fungible intervals cluster in four overlapping groupings that are increasingly diffuse as the other correlations increase. There are four clusters here rather than six because there were only four correlation absolute values in this study (i.e., no |.2| or |.4|).

Another result of interest from this study is that the maximum range is much larger here than in the two-predictor case: 1.241 versus 0.453 for \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .90. This difference occurs because the additional predictor affords more room for the weights to vary without strongly affecting the predicted values. The implication of this is that the more predictors there are in a regression model, the greater the potential parameter sensitivity and uncertainty regarding the correct values of individual effects. This holds true regardless of multicollinearity between the predictors.

Predictors of the fungible intervals

While the fungible interval magnitude for the two-predictor case was very simply—and all but completely—explained by the magnitude of the other predictor’s correlation with the criterion variable, \( {r}_{X_{i\ast }Y} \), an ellipse is a far more complicated shape than a line, so the other factors associated with shape and orientation (Jones, 2013) may have larger effects in this case. As a result, we again considered the following variables, in both pairs and trios, with interactions included: \( \left|{r}_{X_1Y}\right| \), \( \left|{r}_{X_2Y}\right| \), \( \left|{r}_{X_3Y}\right| \), \( {R}_b^2 \), \( {VIF}_{X_1} \), \( {VIF}_{X_2} \), \( {VIF}_{X_3} \); the determinant, condition number, and three eigenvalues of the predictor correlation matrix Rxx; the correlations between the predictor matrix eigenvectors and the OLS weights; as well as the three axes of the ellipsoid of all possible regressions given the predictor correlation matrix and a fixed \( {R}_a^2 \).
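
All of these candidate explanatory variables can be computed from \( {\boldsymbol{R}}_{\boldsymbol{xx}} \) and \( {\boldsymbol{r}}_{\boldsymbol{xy}} \) alone; a sketch for a single three-predictor case, using the Fig. 1 correlations and assuming \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .99:

```r
# Candidate explanatory variables for one three-predictor correlation matrix.
Rxx <- matrix(c(1, .1, .2,
                .1, 1, .3,
                .2, .3, 1), nrow = 3)
rxy <- c(.2, .4, .6)
b   <- solve(Rxx, rxy)                       # OLS weights
e   <- eigen(Rxx, symmetric = TRUE)
R2a <- .99^2 * sum(b * rxy)                  # R_a^2 implied by r_yayb = .99
vifs     <- diag(solve(Rxx))                 # per-predictor VIFs
cond_num <- e$values[1] / e$values[3]        # condition number
axes     <- 2 * sqrt(R2a / e$values)         # axes of the all-possible-regressions ellipsoid
cosines  <- c(t(e$vectors) %*% b) / sqrt(sum(b^2))  # eigenvector-OLS direction cosines
```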

The combination that yielded the most variance explained consisted of both values of \( {r}_{X_{i\ast }Y} \) (e.g., \( \left|{r}_{X_2Y}\right| \) and \( \left|{r}_{X_3Y}\right| \) for X1) along with the third axis of the ellipsoid of all possible regressions for a given \( {R}_a^2 \), as defined above. However, this combination yielded an \( {R}_b^2 \) value that was only .013 larger than the combination of the two values of \( {r}_{X_{i\ast }Y} \) and the value of \( {VIF}_{X_i} \). The coefficient magnitudes were also similar, and the signs identical. Because the VIF is far more familiar to and easily understood by most researchers, as well as readily available in statistical software, we will discuss the model with the VIF instead of the third axis as the third explanatory variable. The results of the regression using the third axis may be found in Appendix A. The highest observed \( {R}_b^2 \) values for other combinations of three explanatory variables are also shown, along with the values for four and five explanatory variables; of note is that \( {VIF}_{X_i} \) consistently emerged as a strong explanatory variable. Our use of VIF is also, of course, in keeping with the results of the two-predictor case, easing discussion. We will, however, return to this point in the General Discussion. Table 2 shows the results for \( {\beta}_{X_1} \). Although the effects are again attributable to the values of \( \left|{r}_{X_{i\ast }Y}\right| \) in general, we explicitly reference each predictor in order to avoid confusion when discussing the interactions.

Table 2 Regression weights of explanatory variables for \( {\beta}_{X_1} \) fungible intervals in the three-predictor case

As in the previous study, the range of the fungible parameters increases with the magnitude of the correlations of the other predictors with the criterion variable. In this case, the other correlations, \( \left|{r}_{X_2Y}\right| \) and \( \left|{r}_{X_3Y}\right| \), are sufficient to yield \( {R}_b^2 \) = .778, with all weights being positive. Including their interaction yields \( {R}_b^2 \) = .839, and adding \( {VIF}_{X_1} \) results in \( {R}_b^2 \) = .898. Allowing for interactions between \( {VIF}_{X_1} \) and the two correlations results in \( {R}_b^2 \) = .910. Figure 7 displays the fungible interval for \( {\beta}_{X_1} \) as a function of \( \left|{r}_{X_2Y}\right| \) and \( \left|{r}_{X_3Y}\right| \); like Fig. 4, it shows an increase in magnitude as the absolute correlation values of the other predictors increase. Interval magnitudes are symmetric across positive and negative correlation values.

Fig. 7

Fungible interval (FI) magnitudes for \( {\beta}_{X_1} \) and \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \) = .99, plotted in relation to \( \left|{r}_{X_2Y}\right| \) and \( \left|{r}_{X_3Y}\right| \)

This pattern, with a positive effect of each correlation between the other predictors and the criterion and a negative interaction between the two correlations, means that together the two correlations have a disjunctive effect: It is sufficient for one of the two correlations to be large for the range to be large, as well. As in the two-predictor case, increasingly large correlations between the other predictors and the criterion variable allow for more variation in the weights without simply adding noise; that is, they compensate for the changes in weights, with collinear predictors providing a modest additional compensatory effect.

Function to calculate fungible interval

To facilitate the use of fungible intervals, we provide an R (R Core Team, 2018) function in Appendix B that is a wrapper for Waller’s (2008) original fungible function. The function accepts a predictor covariance matrix as input, rxx, and a vector of predictor–criterion covariances, rxy. If the input is a correlation matrix, then the reported weights will be standardized weights. Interactions must be calculated and included as separate variables in the covariance matrix. Allowing for the estimation of fungible weights with two predictors required a small modification to the original function; this modification is commented in the code. The calculation of fungible weights is otherwise unchanged (though there are some minor changes to the output provided by the function).

Two options can also be set. The first is ryayb, which reflects the desired value of \( {r}_{{\widehat{y}}_a{\widehat{y}}_b} \). This can be any value between 0 and 1, but an appropriate test of parameter sensitivity requires a relatively high value, and there is only one set of weights for the values 0 and 1. The default value for ryayb is .99. The other option is sets, which is the number of desired fungible weight sets, with each set comprising one weight per predictor. This defaults to 1,000. Smaller values will result in less precise fungible intervals. If there are only two predictors, the two weight sets will simply be repeated, and a smaller value—for example, 10—can recover the fungible intervals without excessive repeated output.

The function outputs the minimum and maximum values of the fungible weights associated with each predictor, as well as the size of the interval. The OLS weights are also provided, for ease of comparison. Additionally, the values of \( {R}_b^2 \) and \( {R}_a^2 \) are provided, as well as the size of the difference between them. An example is provided in Appendix B.
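
The Appendix B code is not reproduced here. Purely as an illustration of the interface, a rough analogue can be built on the fungible_weights() sketch given earlier; the names below mirror the description above but are hypothetical, not the actual Appendix B function:

```r
# Illustrative analogue of the wrapper (hypothetical, not the Appendix B code).
fungible_interval <- function(rxx, rxy, ryayb = .99, sets = 1000) {
  fw <- fungible_weights(rxx, rxy, ryayb, sets)
  list(intervals = data.frame(OLS = fw$b,
                              min = apply(fw$a, 2, min),
                              max = apply(fw$a, 2, max),
                              FI  = apply(fw$a, 2, max) - apply(fw$a, 2, min)),
       R2b = fw$R2b, R2a = fw$R2a, R2.drop = fw$R2b - fw$R2a)
}

# Example with the Fig. 1 correlations:
Rxx <- matrix(c(1, .1, .2, .1, 1, .3, .2, .3, 1), nrow = 3)
fungible_interval(Rxx, c(.2, .4, .6))
```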

Discussion

Our results show that the magnitude of the range of fungible weights for a given predictor, Xi, is primarily explained by the size of the correlation(s) between the other predictor variable(s) and the criterion variable, with multicollinearity contributing to the size by way of interactions with each \( \left|{r}_{X_{i\ast }Y}\right| \). Note that although multicollinearity did contribute to interval size, it did so far less than the predictor–criterion correlations, and no multicollinearity was necessary for an interval to be large. That the range is so simply explained affords a few straightforward points of discussion with high relevance for inferences made from regression analyses.

First, parameter sensitivity increases in cases when other predictors are highly correlated with the criterion variable, since they may compensate for any changes in a predictor’s weight. To what degree this will impact a researcher’s conclusions will depend, of course, on the sort of conclusions being drawn. If one wishes to draw inferences about relative predictor importance—for example, ranking—then large intervals are a concern, regardless of the value of a regression weight, particularly if the intervals overlap. However, if one is concerned only with sign and significance, then the importance of identifying sensitive weights increases as the regression weights decrease and the magnitudes of the other correlations increase. This is because the fungible interval for a given predictor is then increasingly likely to include near-zero weights, or even weights of the opposite sign. It is well known that small effects are difficult to study because tests of them tend to have lower power; a downwardly biased estimate will be even less likely to be statistically significant. Furthermore, in extreme cases, a biased estimate may be significant but have the wrong sign, as is implied by the example shown in Fig. 1.

Second, though we could not consider this in detail here, it seems that larger sets of predictors will lead to more sensitive weights, with each individual \( \left|{r}_{X_{i\ast }Y}\right| \) contributing less to the interval size. The maximum size of the fungible intervals was substantially larger in the three-predictor than in the two-predictor case (1.241 vs. 0.453), and whereas for two predictors \( \left|{r}_{X_{i\ast }Y}\right| \) was sufficient to explain 99% of the variation, for three predictors each \( \left|{r}_{X_{i\ast }Y}\right| \) explained only about 39%, for a total of 78%. A negative interaction between these two variables explained another 6%, for a total of 84%. This suggests that increasing the number of predictors has a cumulative effect on interval size, by way of increasingly smaller and nonlinear effects of each individual \( \left|{r}_{X_{i\ast }Y}\right| \). A perfect disjunctive relationship would mean that one other predictor with a high correlation would be sufficient. Because researchers generally use more than three predictors (e.g., control variables), it will be important for future research to explore the behavior of fungible intervals in relation to the number of predictors.

Third, as is to be expected, predictor multicollinearity results in less trustworthy weights. This effect was, however, surprisingly small here, and no multicollinearity need be present for the weights to be sensitive. It does appear, however, that the effects of multicollinearity may increase with the number of predictors: In the two-predictor case, VIF explained only 1% of the variance beyond \( \left|{r}_{X_{i\ast }Y}\right| \), but in the three-predictor case it explained an additional 7%. It is also worth noting that although VIF is both familiar and readily available, in the three-predictor case the third, smallest axis of the all-possible-regressions ellipsoid explained 1% more variance than VIF (VIF was nonetheless a consistently strong explanatory variable; see Appendix A). It is difficult to extrapolate to a larger number of predictors on the basis of only two sets of results, but given that there are only two weight sets for the geometrically simple two-predictor case, and an infinite number for the more complex cases of three or more predictors, the smallest axis of the (hyper)ellipsoid may be a better predictor of interval size for three or more predictors. Whether or not this is the case will require additional research, but it appears that, in practice, measures of multicollinearity are largely substitutable with respect to fungible interval size (see Appendix A).

Finally, our results are of note for mediation models (where the total effect of a predictor variable X on a criterion variable Y is decomposed into a direct effect and an indirect effect involving a mediator M; Hayes, 2013). If the correlations between X and Y and M and Y are both high, then it follows that the corresponding regression weights used for the direct and indirect effects will be sensitive. Mediation models are also unusual in that multicollinearity is to some degree desirable, as a larger correlation between X and M will increase the size of the indirect effect but also result in a larger value of VIF. Estimated mediation effects are then likely to be particularly sensitive (cf. Sobel, 2008).

A limitation of our study is that we exclusively focused on the range of fungible weights per predictor. Although this simplification allowed us to illustrate the differences relative to confidence intervals and to derive some simple and clear conclusions, a summary of sensitivity that jointly considers all predictors (e.g., the area encompassed by the ellipse) might be of greater interest in other contexts, because it would provide a global summary of parameter sensitivity for all predictors considered jointly. Additionally, since we did not consider four or more predictors here, we were unable to test whether the smallest axis of the ellipsoid is a better predictor than VIF in general, nor could we explore how much the variance explained by each \( \mid {r}_{X_{\ast i}Y}\mid \) drops as the number of predictors increases.

Conclusion

Although users of regression are aware that in some sense their models may be inaccurate and that the estimates that an analysis yields may well be biased, it is difficult to know the nature of such bias, as well as any potential effects on the trustworthiness of the effects. It follows, then, that it is difficult to identify in advance the conditions under which the estimated parameters will be sensitive due to individual violations—that is, be less trustworthy, in the sense of less valid (Green, 1977). However, our results here suggest that it is not necessary to have knowledge of specific model violations in order to identify situations in which the parameters of interest will be sensitive. The potential consequences can be assessed in terms of an informative fungible interval for different values of the prespecified correlation, without assuming a specific type of model violation. Knowledge of the specific violation would, of course, give more specific indications than is possible with a fungible interval, but such knowledge is commonly not available, and if it were, then the violation could be remedied directly.

Finally, we hope that the function we provide will ease exploring parameter sensitivity for users of regression analysis. Although the fungible intervals should not be used for null hypothesis testing of regression weights, they are still useful as indications of the confidence one may have in the weights, from the viewpoint of model validity.

Author note

This study was not preregistered and makes use of no data. However, code for the simulations and analyses is available upon request to the corresponding author.