SPSS macros to compare any two fitted values from a regression model

Abstract

In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests—particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.


Students are often introduced to regression models via ordinary least squares (OLS) linear regression, starting with the simple linear regression model. That model can be written as

$$ Y = a + bX + e $$
(1)

where Y is a continuous outcome (or dependent) variable, X is a continuous explanatory (or predictor) variable,Footnote 1 a is the constant (or intercept), b is the slope of the linear relationship between X and Y, and e is the fitting error. It can also be written as

$$ Y\prime = a + bX $$
(2)

where Y′ = Y − e = the fitted (or predicted) value of Y.

Students may then be introduced to the multiple linear regression model (with first-order terms only),Footnote 2 which can be written as

$$ Y = a + {b_1}{X_1} + {b_2}{X_2} + ... + {b_p}{X_p} + e $$
(3)

with p representing the number of predictor variables in the model. The multiple regression model can also be written as

$$ Y\prime = a + {b_1}{X_1} + {b_2}{X_2} + ... + {b_p}{X_p} $$
(4)

with Y′ once again representing the fitted value of Y.

In any regression model, the constant a gives the fitted value of Y when all explanatory variables are equal to 0. For the simple linear regression model (Eqs. 1 and 2), b gives the change in the fitted value of Y for a one-unit increase in X. For the multiple linear regression model with first-order terms only (Eqs. 3 and 4), letting j range from 1 to p, b_j gives the change in the fitted value of Y for a one-unit increase in X_j while controlling for all of the other explanatory variables in the model (i.e., while holding them all constant).

As the phrase “change in the fitted value of Y” suggests, the b-coefficients from the single predictor model and the multiple regression model with no higher order terms represent the difference between two fitted values of Y. Starting with the single predictor model, let Y′_X be the fitted value of Y when some value of X is plugged into Eq. 2, and Y′_(X+1) the fitted value of Y when X + 1 is plugged into Eq. 2:

$$ {Y\prime_X} = a + bX $$
(5)
$$ {Y\prime_{{\left( {X + 1} \right)}}} = a + b\left( {X + 1} \right) = a + bX + b. $$
(6)

Subtracting Eq. 5 from Eq. 6 results in b, representing the change in the fitted value of Y for a one-unit increase in X (see Eq. 7):

$$ {Y\prime_{{X + 1}}} - {Y\prime_X} = \left( {a + bX + b} \right) - \left( {a + bX} \right) = a + bX + b - a - bX = b. $$
(7)

The coefficients for a multiple regression model (with first-order terms only) can be obtained in the same manner. For example, in a model with two explanatory variables, three different fitted values of Y can be obtained as follows:

$$ {Y\prime_{{{X_1},{X_2}}}} = a + {b_1}{X_1} + {b_2}{X_2} $$
(8)
$$ {Y\prime_{{{X_1} + 1,{X_2}}}} = a + {b_1}\left( {{X_1} + 1} \right) + {b_2}{X_2} = a + {b_1}{X_1} + {b_2}{X_2} + {b_1} $$
(9)
$$ {Y\prime_{{{X_1},{X_2} + 1}}} = a + {b_1}{X_1} + {b_2}\left( {{X_2} + 1} \right) = a + {b_1}{X_1} + {b_2}{X_2} + {b_2}. $$
(10)

Subtracting Eq. 8 from Eq. 9 leaves b_1, which represents the change in the fitted value of Y for a one-unit increase in X_1 with X_2 held constant. Similarly, subtracting Eq. 8 from Eq. 10 leaves b_2, which represents the change in the fitted value of Y for a one-unit increase in X_2 with X_1 held constant.

Some users of regression models may think that the only fitted value comparisons available to them are the ones that correspond to regression coefficients. But in reality, a host of other fitted value comparisons are possible, and from a research or hypothesis-testing point of view, many of them might be more meaningful than the small subset of comparisons provided by the coefficients. For example, if X_1 and X_2 are positively correlated, it may not be very realistic to hold one of them constant while increasing the other by one unit. Rather, it might be more informative to determine the change in the fitted value of Y when X_1 increases by 5 and X_2 increases by 10. Eq. 11 shows the fitted value of Y for X_1 + 5 and X_2 + 10:

$$ {Y\prime_{{{X_1} + 5,{X_2} + 10}}} = a + {b_1}\left( {{X_1} + 5} \right) + {b_2}\left( {{X_2} + 10} \right) = a + {b_1}{X_1} + 5{b_1} + {b_2}{X_2} + 10{b_2}. $$
(11)

Subtracting Eq. 8 from Eq. 11 results in the weighted combination of two coefficients, 5b_1 + 10b_2 (see Eq. 12):

$$ \begin{array}{rl} {Y\prime_{{{X_1} + 5,{X_2} + 10}}} - {Y\prime_{{{X_1},{X_2}}}} & = \left( {a + {b_1}{X_1} + 5{b_1} + {b_2}{X_2} + 10{b_2}} \right) - \left( {a + {b_1}{X_1} + {b_2}{X_2}} \right) \\ & = a + {b_1}{X_1} + 5{b_1} + {b_2}{X_2} + 10{b_2} - a - {b_1}{X_1} - {b_2}{X_2} \\ & = a - a + {b_1}{X_1} - {b_1}{X_1} + {b_2}{X_2} - {b_2}{X_2} + 5{b_1} + 10{b_2} \\ & = 5{b_1} + 10{b_2} \end{array} $$
(12)
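The cancellation in Eq. 12 can be checked numerically. The following is a minimal sketch in Python (not SPSS); the coefficient values and starting values of X_1 and X_2 are hypothetical and were chosen only so that the arithmetic is easy to follow.

```python
def fitted_value(a, b1, b2, x1, x2):
    """Fitted value Y' for a two-predictor, first-order model (Eq. 8)."""
    return a + b1 * x1 + b2 * x2


# Hypothetical coefficients and starting values (illustration only).
a, b1, b2 = 2.0, 0.5, 0.25
x1, x2 = 3.0, 4.0

# Left-hand side of Eq. 12: difference between the two fitted values.
difference = fitted_value(a, b1, b2, x1 + 5, x2 + 10) - fitted_value(a, b1, b2, x1, x2)

# Right-hand side of Eq. 12: weighted combination of the two coefficients.
weighted_combination = 5 * b1 + 10 * b2

print(difference, weighted_combination)
```

Whatever starting values of X_1 and X_2 are used, the two quantities agree, because the constant and the terms involving the starting values cancel.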

Fitted value comparisons of interest may also entail weighted combinations of two or more coefficients in models that contain interactions, polynomial terms, or regression splines. For example, a few years ago, we fitted the logistic regression model shown in Eq. 13 Footnote 3:

$$ Y\prime = \ln \left( {\frac{p}{{1 - p}}} \right) = a + {{b}_{1}}BAC + {{b}_{2}}BA{{C}^{2}} + \,{\text{other variables}} $$
(13)

In this example, the outcome variable was the presence of an unsafe driving action, and the main explanatory variable was blood alcohol concentration (BAC). But the inclusion of BAC-squared in the model made the relationship between BAC and the log-odds of the outcome curvilinear rather than linear. Because one of our colleagues wished to report odds ratios for BAC values of .05 and .08, both relative to BAC = 0, we made two fitted value comparisons involving Eqs. 14, 15, and 16:

$$ {Y\prime_{{BAC = 0}}} = a + {b_1}0 + {b_2}{0^2} + {\text{ other variables}} $$
(14)
$$ {Y\prime_{{BAC = 0.05}}} = a + {b_1}0.05 + {b_2}{0.05^2} + {\text{ other variables}} $$
(15)
$$ {Y\prime_{{BAC = 0.08}}} = a + {b_1}0.08 + {b_2}{0.08^2} + {\text{ other variables}}{.} $$
(16)

Subtracting Eq. 14 from Eq. 15 leaves b_1 × 0.05 + b_2 × 0.05², and subtracting Eq. 14 from Eq. 16 leaves b_1 × 0.08 + b_2 × 0.08². These two weighted combinations of b_1 and b_2 give the logarithms of the desired odds ratios: The odds ratio for BAC = 0.05, OR_0.05, is equal to Exp(b_1 × 0.05 + b_2 × 0.05²), and OR_0.08 = Exp(b_1 × 0.08 + b_2 × 0.08²).
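As a rough illustration of this calculation, the sketch below (in Python, not SPSS) computes the two odds ratios from a pair of coefficients. The coefficient values are hypothetical; the estimates from the original BAC model are not reported here.

```python
import math


def odds_ratio_vs_zero(b1, b2, bac):
    """Odds ratio for a given BAC relative to BAC = 0 under the model in Eq. 13.

    The 'other variables' terms cancel because they are held at the same
    values in both fitted values.
    """
    log_odds_ratio = b1 * bac + b2 * bac ** 2
    return math.exp(log_odds_ratio)


# Hypothetical coefficients for BAC and BAC-squared (illustration only).
b1, b2 = 20.0, -50.0

or_05 = odds_ratio_vs_zero(b1, b2, 0.05)
or_08 = odds_ratio_vs_zero(b1, b2, 0.08)
print(round(or_05, 3), round(or_08, 3))
```

Note that this gives only the point estimates; confidence limits require the standard error of each weighted combination, which is the subject of the remainder of the article.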

In order to compute a 95% confidence interval for one of these fitted value comparisons, or to test whether the difference between the two fitted values is statistically significant, we need the standard error of the difference between the two fitted values. When a fitted value comparison of interest is captured by one of the regression coefficients, the needed standard error is reported in the regression output, as are the 95% confidence interval and the corresponding statistical test. But in situations like the two described above, the desired fitted value comparison is not captured by any of the regression coefficients, and therefore, the needed standard error is not reported in the regression output. That standard error must be computed using a rather complicated formula involving variances and covariances. The two SPSS macros described in this article implement that formula.
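To give a sense of what that formula involves, consider the first situation described above (an increase of 5 in X_1 and 10 in X_2). Applying the standard result for the variance of a weighted combination of two estimates, the standard error can be sketched as follows, where Var and Cov denote the estimated variances and covariance of the two coefficients (notation introduced here for illustration only):

$$ SE\left( {5{b_1} + 10{b_2}} \right) = \sqrt {{5^2}Var\left( {{b_1}} \right) + {{10}^2}Var\left( {{b_2}} \right) + 2\left( 5 \right)\left( {10} \right)Cov\left( {{b_1},{b_2}} \right)} $$

With more than two coefficients, every pair contributes a covariance term, which is why a matrix formulation is more convenient.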

The !OLScomp and !MLEcomp macros

In the remainder of this article, we describe two SPSS macros for comparing any two fitted values from a regression model and demonstrate their use.Footnote 4 The !OLScomp and !MLEcomp macros are for use with regression models fitted by ordinary least squares (OLS) and maximum likelihood estimation (MLE), respectively.Footnote 5 Both macros implement the matrix algebra method described by Johnson and Wichern (2002, p. 77, Equation 2–43).Footnote 6 However, they differ somewhat in terms of macro arguments and the type of data file that is required as input.

The !OLScomp macro

The !OLScomp macro works on the same raw data file used to run the regression model of interest. It requires the following input in the form of macro arguments:

  • Y = the outcome variable for the regression model.

  • XList = a list of explanatory variables for the regression model.

  • Set1 = a column vector of values plugged into the regression equation to compute the first fitted value of Y.

  • Set2 = a column vector of values plugged into the regression equation to compute the second fitted value of Y.

  • Title = a title to be displayed on the output from the macro.

The outcome variable handed to the macro as the Y argument is read into a column vector called Y. The XList variables are included in a design matrix called X, the first column of which is a vector of 1s. This means that the constant is included when !OLScomp runs the regression model. With the Y and X matrices in hand, the macro then uses the usual matrix equation to compute B, a column vector of regression coefficients (see Eq. 17).Footnote 7 After some intermediate computations to work out the MS error for the regression model, the macro computes CovB, the covariance matrix for the coefficients (Eq. 18).Footnote 8

$$ B = {\left( {{X^T}X} \right)^{ - 1}}{X^T}Y $$
(17)
$$ CovB = {({X^T}X)^{{ - 1}}}M{S_{{error}}} $$
(18)
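Equations 17 and 18 can be sketched outside SPSS as follows. This is a minimal pure-Python illustration, not the macro itself: the helper functions, the toy data, and the use of an ordinary Gauss-Jordan inverse are all assumptions made for the example, and the error degrees of freedom are taken to be the number of cases minus the number of columns in X.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (assumes m is square and nonsingular)."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(x_rows, y):
    """Return (B, CovB) for predictor rows x_rows (without the constant) and outcome y."""
    X = [[1.0] + [float(v) for v in row] for row in x_rows]  # first column of 1s
    Y = [[float(v)] for v in y]
    Xt = transpose(X)
    XtX_inv = inverse(matmul(Xt, X))
    B = matmul(XtX_inv, matmul(Xt, Y))                        # Eq. 17
    fitted = matmul(X, B)
    ss_error = sum((yi[0] - fi[0]) ** 2 for yi, fi in zip(Y, fitted))
    ms_error = ss_error / (len(X) - len(X[0]))                # SS error / df error
    CovB = [[v * ms_error for v in row] for row in XtX_inv]   # Eq. 18
    return B, CovB


# Toy data (hypothetical): one predictor, four cases.
x_rows = [[1], [2], [3], [4]]
y = [2, 4, 5, 7]
B, CovB = ols(x_rows, y)
print(B)
print(CovB)
```

The sketch inverts the cross-product matrix directly, which can fail numerically when the columns of X are on very different scales; a generalized inverse is more forgiving in that situation.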

For the Set1 and Set2 arguments handed to the macro, the first value is an indicator for inclusion of the constant in the regression equation (1 = include the constant, 0 = exclude the constant) and should typically be set equal to 1.Footnote 9 The other values in Set1 and Set2 are either indicators for categorical variables or actual values of quantitative explanatory variables. As will be seen in the examples to follow, the Set1 and Set2 vectors must be enclosed in braces (i.e., {}) with the elements separated by semicolons. (In the SPSS matrix language, matrices are enclosed in braces, and semicolons indicate the end of a row.)

The following matrix algebra computations are then carried out:

$$ {\text{Diff}_1} = {\text{Set}_1} - {\text{Set}_2} $$
(19)
$$ {\text{Diff}_2} = {\text{Set}_2} - {\text{Set}_1} $$
(20)
$$ {\text{FV}_1} = \text{the first fitted value of } Y \text{ using } {\text{Set}_1} \text{ values} = {B^T}{\text{Set}_1} $$
(21)
$$ {\text{FV}_2} = \text{the second fitted value of } Y \text{ using } {\text{Set}_2} \text{ values} = {B^T}{\text{Set}_2} $$
(22)
$$ {\text{FV}_1} - {\text{FV}_2} = \text{first fitted value minus second fitted value} = {B^T}{\text{Diff}_1} $$
(23)
$$ {\text{FV}_2} - {\text{FV}_1} = \text{second fitted value minus first fitted value} = {B^T}{\text{Diff}_2} $$
(24)
$$ SE\left( {\text{FV}_1} \right) = \text{standard error of } {\text{FV}_1} = \sqrt {\text{Set}_1^T \times CovB \times {\text{Set}_1}} $$
(25)
$$ SE\left( {\text{FV}_2} \right) = \text{standard error of } {\text{FV}_2} = \sqrt {\text{Set}_2^T \times CovB \times {\text{Set}_2}} $$
(26)
$$ SE\left( {{\text{FV}_1} - {\text{FV}_2}} \right) = \sqrt {\text{Diff}_1^T \times CovB \times {\text{Diff}_1}} $$
(27)
$$ SE\left( {{\text{FV}_2} - {\text{FV}_1}} \right) = \sqrt {\text{Diff}_2^T \times CovB \times {\text{Diff}_2}} $$
(28)

Finally, 95% confidence intervals and t-tests are computed using the usual methods (see Appendix 1 for details).
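The computations in Eqs. 19 to 28 can be sketched in a few lines of Python. This is an illustration of the matrix algebra only, not the macro code; the function names are made up for the example, and the values of B and CovB are hypothetical (a constant plus one predictor).

```python
import math


def quad_form(v, m):
    """Return v^T m v for a vector v (list) and a square matrix m (list of lists)."""
    n = len(v)
    return sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))


def compare_fitted_values(B, CovB, set1, set2):
    diff1 = [s1 - s2 for s1, s2 in zip(set1, set2)]               # Eq. 19
    diff2 = [s2 - s1 for s1, s2 in zip(set1, set2)]               # Eq. 20
    return {
        "FV1": sum(b * s for b, s in zip(B, set1)),               # Eq. 21
        "FV2": sum(b * s for b, s in zip(B, set2)),               # Eq. 22
        "FV1-FV2": sum(b * d for b, d in zip(B, diff1)),          # Eq. 23
        "FV2-FV1": sum(b * d for b, d in zip(B, diff2)),          # Eq. 24
        "SE(FV1)": math.sqrt(quad_form(set1, CovB)),              # Eq. 25
        "SE(FV2)": math.sqrt(quad_form(set2, CovB)),              # Eq. 26
        "SE(FV1-FV2)": math.sqrt(quad_form(diff1, CovB)),         # Eq. 27
        "SE(FV2-FV1)": math.sqrt(quad_form(diff2, CovB)),         # Eq. 28
    }


# Hypothetical coefficients and covariance matrix (constant, one predictor).
B = [0.5, 1.6]
CovB = [[0.15, -0.05], [-0.05, 0.02]]

# Compare the fitted value at X = 4 with the fitted value at X = 2.
result = compare_fitted_values(B, CovB, set1=[1, 4], set2=[1, 2])
for name, value in result.items():
    print(name, round(value, 4))
```

Because Diff_2 is simply the negative of Diff_1, the two differences have the same absolute value and the same standard error; only the sign changes.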

The !MLEcomp macro

As was noted earlier, the !OLScomp macro computes the OLS regression model via the usual matrix algebra equations. Therefore, it is able to internally generate the B and CovB matrices that are needed to compute the desired standard errors, confidence intervals, and statistical tests. But that approach will not work for !MLEcomp, because maximum likelihood estimation is an iterative process, and the regression equation cannot be obtained by solving some matrix equations. Therefore, unlike !OLScomp, !MLEcomp does not use a raw data file as input. Rather, it requires an input file that contains both B, the vector of regression coefficients, and CovB, the covariance matrix for B. This means that users of !MLEcomp must first run their model using a standard SPSS procedure (e.g., GENLIN) and have it save B and CovB. Some data management is then required, but it is not too onerous. An example is provided in the online Supplementary Material (see syntax file Example_2.sps).Footnote 10 Note that the variable containing the regression coefficients must be named B and the variables making up CovB must be contiguous and in the same order as the coefficients.

The macro arguments for !MLEcomp are as follows:

  • FirstCovB = the variable name for the first column of CovB

  • LastCovB = the variable name for the last column of CovB

  • Set1 and Set2, the values that must be plugged into the regression equation to compute the first and second fitted values

  • ExpB = an indicator (1 = Yes, 0 = No) for exponentiation of the fitted value and its confidence limits (used for !MLEcomp only)Footnote 11

  • Title = a title to appear on output from !MLEcomp.

!MLEcomp then carries out the computations shown in Eqs. 19–28. Whereas !OLScomp computes t-tests, !MLEcomp computes Wald tests (see Appendix 1 for details). And finally, if the ExpB argument is set to 1, !MLEcomp also exponentiates FV1, FV2, FV1 − FV2, FV2 − FV1 and the limits of their respective confidence intervals.
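The following Python sketch shows one common way to obtain a Wald test, a 95% confidence interval, and exponentiated results from an estimate and its standard error. It is an assumption-laden illustration rather than a reproduction of the macro: the exact computations used by !MLEcomp are given in Appendix 1, and the function name, the input values, and the use of the standard normal distribution are choices made here for the example.

```python
import math
from statistics import NormalDist


def wald_summary(estimate, se, exponentiate=False, level=0.95):
    """Wald z, its square, a two-sided p-value, and a normal-theory CI."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    lower, upper = estimate - z_crit * se, estimate + z_crit * se
    out = {"estimate": estimate, "z": z, "wald_chi2": z ** 2, "p": p, "ci": (lower, upper)}
    if exponentiate:
        # Exponentiating the estimate and both limits gives, for example,
        # an odds ratio and its confidence interval in logistic regression.
        out["exp_estimate"] = math.exp(estimate)
        out["exp_ci"] = (math.exp(lower), math.exp(upper))
    return out


# Hypothetical difference between two fitted log-odds and its standard error.
summary = wald_summary(0.5, 0.25, exponentiate=True)
for name, value in summary.items():
    print(name, value)
```

The confidence interval is symmetric on the original (e.g., log-odds) scale but not after exponentiation, which is why the limits are exponentiated rather than recomputed around Exp(B).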

Examples

We now turn to some examples that illustrate the use of both !OLScomp and !MLEcomp in situations where the fitted value comparisons of interest do not correspond to coefficients reported in the regression output.

Example 1: A linear regression model with a quadratic term

The first example uses a linear regression model with Y = miles per gallon (MPG) and X = vehicle weight in pounds.Footnote 12 Because the relationship is curvilinear (see Fig. 1), weight-squared was also included in the model. The SPSS syntax shown in Syntax Box 1 was used to run the model via the REGRESSION procedure.

Fig. 1. Miles per gallon as a function of vehicle weight

Syntax Box 1. SPSS syntax to run the linear regression model for Example 1

The model is statistically significant, F(2, 394) = 489.123, p < .001; and R² = .713. As was expected, the regression coefficients for weight and weight-squared are both statistically significant (see Table 1). The significance of the weight-squared term indicates that the effect (on MPG) of increasing vehicle weight by a given amount depends on how heavy the vehicle is. This dependency can be illustrated by making the following comparisons: 3,000 lbs–2,000 lbs, 4,000–3,000, and 5,000–4,000.

Table 1 Regression coefficients for Example 1

Syntax Box 2 shows how to make the first of those three fitted value comparisons using !OLScomp. As was noted earlier, the first value in the Set1 and Set2 vectors is an indicator for inclusion of the constant (or intercept) in the model (1 = include, 0 = exclude). It must be set to 1 in this case, because the two fitted values being compared were derived from a model that includes the constant. For this example, the second value is vehicle weight, and the third value is weight-squared. Note that rather than entering the actual value of weight squared, one can use “**,” which is the exponentiation operator in SPSS: “**2” means to the power of 2 (or squared), “**3” means to the power of 3 (cubed), and so on. In the example below, 3000**2 = 3,000² = 9,000,000; and 2000**2 = 2,000² = 4,000,000.

Syntax Box 2. !OLScomp syntax to compare fitted values of MPG for vehicles weighing 3,000 lbs versus 2,000 lbs
Output Box 1. Output from the !OLScomp command shown in Syntax Box 2

Output from the !OLScomp command in Syntax Box 2 is shown in Output Box 1. The output is organized into three sections. The first section shows the R² value for the model and the omnibus F-test. These values can be compared with the output generated via the REGRESSION procedure (or UNIANOVA, etc.) to verify that !OLScomp has run the same model.

The second part of the output shows the Set1 and Set2 values that were handed to !OLScomp along with the Set1 − Set2 and Set2 − Set1 differences. Note that even though the values were 3000**2 and 2000**2 when the macro was called, they appear as 9000000 and 4000000, respectively, in the output.
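The vectors described in this part of the output are easy to reproduce by hand. The short Python sketch below (Python happens to use the same ** operator for exponentiation) builds the two vectors and their differences; the helper name quadratic_row is made up for the illustration.

```python
def quadratic_row(weight):
    """[constant indicator, weight, weight squared] for the Example 1 model."""
    return [1, weight, weight ** 2]


set1 = quadratic_row(3000)  # 3,000 lb vehicle
set2 = quadratic_row(2000)  # 2,000 lb vehicle

diff1 = [a - b for a, b in zip(set1, set2)]  # Set1 - Set2
diff2 = [b - a for a, b in zip(set1, set2)]  # Set2 - Set1

print(set1)   # [1, 3000, 9000000]
print(set2)   # [1, 2000, 4000000]
print(diff1)  # [0, 1000, 5000000]
print(diff2)  # [0, -1000, -5000000]
```

The leading 0 in the difference vectors shows that the constant cancels out of the comparison, just as a cancels in Eq. 7.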

The final part of the output shows the two fitted values (FV1 and FV2) along with the differences between them (FV1 − FV2 and FV2 − FV1).Footnote 13

  • FV1 = 22.025 = fitted MPG for a 3,000 lb vehicle

  • FV2 = 32.106 = fitted MPG for a 2,000 lb vehicle

  • FV1 − FV2 = 22.025 − 32.106 = −10.081

  • FV2 − FV1 = 32.106 − 22.025 = 10.081.

Importantly, the standard errors and 95% confidence intervals are also reported for each of these estimates, as are t-tests and p-values. The null hypothesis for each of the t-tests states that the parameter being tested (i.e., the fitted value, or the difference between two fitted values) is equal to zero.

!OLScomp syntax to make the other two comparisons of interest (4,000–3,000 and 5,000–4,000) is shown in Syntax Box 3. The corresponding output is shown in Output Box 2.

Note that for each of the three analyses (Output Boxes 1 and 2), the FV1 − FV2 difference is negative, which means that MPG is lower for the heavier vehicle in each case, as was expected. But note that even though the difference in vehicle weight is 1,000 lbs in every case, the effect of a 1,000-lb change is not constant. This is because of the curvilinear relationship between weight and MPG (see Fig. 1). One way to think of the weight-squared term in the model is that weight interacts with itself. In other words, the effect of increasing vehicle weight by 1,000 lbs depends on the value of weight. As the results from !OLScomp show, increasing weight from 2,000 to 3,000 leads to a drop of about 10 MPG, whereas increasing weight from 4,000 to 5,000 leads to a drop of only about 3 MPG.

Syntax Box 3. !OLScomp syntax to compare fitted values of MPG for vehicles weighing 4,000 lbs versus 3,000 lbs and 5,000 lbs versus 4,000 lbs
Output Box 2. Output from the !OLScomp commands shown in Syntax Box 3

Example 2: Logistic regression with a categorical × continuous interaction

This second example demonstrates use of the !MLEcomp macro. We use a logistic regression model with education level, number of years with current employer, and the interaction between them as predictors of defaulting on a bank loan. The data are from the bankloan.sav sample data file that comes with SPSS. For education level, the ED variable in the data file was recoded into a new variable (EDUC) with the college grad and postgraduate degree categories combined. This was done because the number of cases in the postgraduate degree category was very low (n = 5 out of 700 total cases). The model was run via the GENLIN command shown in Syntax Box 4.Footnote 14 The OMS commandsFootnote 15 were included to write the regression coefficients to a new data set called “B” and the covariance matrix for B to a new data set called “covB.”

Syntax Box 4. Syntax to run the logistic regression model for Example 2

Table 2 shows the parameter estimates for this model, and Fig. 2 shows fitted values of the log-odds of defaulting on a loan as a function of the two explanatory variables and their interaction. One drawback to GENLIN is that it does not report the multiple degree of freedom Wald tests for EDUC and the EDUC × EMPLOY interaction. The LOGISTIC REGRESSION procedure, on the other hand, does include those multiple degree of freedom Wald tests in its table of regression coefficients. For EDUC, Wald = 8.577, df = 3, p = .035, and for EDUC*EMPLOY, Wald = 10.519, df = 3, p = .015.

Table 2 Regression coefficients for Example 2
Fig. 2. The log-odds of loan default as a function of years with current employer and education level. The circles indicate combinations of the two explanatory variables that exist in the data file

One strategy that could be used to illustrate the nature of the interaction would be to compare fitted values for several combinations of EDUC and EMPLOY to a common reference point. In this example, the reference point EDUC = 1 (high school not completed) and EMPLOY = 0 (0 years with current employer) was chosen. Fitted values for the following combinations of EDUC and EMPLOY were then compared with that common reference point:

  • EDUC = 1 (did not complete high school) and EMPLOY = 10

  • EDUC = 2 (high school degree) and EMPLOY = 10

  • EDUC = 3 (some college) and EMPLOY = 10

  • EDUC = 4 (college or postgrad degree) and EMPLOY = 10.

Before the !MLEcomp macro could be used to make those comparisons, some data management steps had to be carried out on the “B” and “covB” data sets created via GENLIN and OMS. Those data management steps are not described in detail here, but the result can be seen in Fig. 3.Footnote 16 Note that the regression coefficients are in variable B (as required by !MLEcomp) and that the covariance matrix for the coefficients is in variables Intercept to educ2employ.

Fig. 3. Data set containing the vector of regression coefficients (variable B) and the covariance matrix for the coefficients (variables Intercept to educ2employ) for Example 2

Syntax Box 5. !MLEcomp commands to carry out the desired comparisons for the logistic regression model in Example 2

Syntax Box 5 shows !MLEcomp commands to carry out the four analyses described above. Lines that begin with an asterisk are comments. The first value in the Set1 and Set2 matrices contains an indicator for inclusion of the intercept (1 = include, 0 = exclude) and must be set to 1. The other seven values in Set1 and Set2 are for educ = 4, educ = 3, educ = 2, employ, [educ = 4]*employ, [educ = 3]*employ, and [educ = 2]*employ. Note that these are in the same order as in the table of parameter estimates from GENLIN (see Table 2), which is also the same order as in the covariance matrix for the parameter estimates.Footnote 17 It is also important to note that the redundant variables (educ = 1 and [educ = 1]*employ) are not represented in Set1 and Set2.Footnote 18
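To make the layout of these vectors concrete, the following Python sketch builds the eight values for any combination of EDUC and EMPLOY, using the parameter order listed above. It does not reproduce the !MLEcomp calls in Syntax Box 5; the helper name example2_row is hypothetical.

```python
def example2_row(educ, employ):
    """Values in the order: intercept, educ=4, educ=3, educ=2, employ,
    [educ=4]*employ, [educ=3]*employ, [educ=2]*employ.

    educ = 1 is the redundant (reference) category, so it has no indicator.
    """
    dummies = [1 if educ == level else 0 for level in (4, 3, 2)]
    return [1] + dummies + [employ] + [d * employ for d in dummies]


# Common reference point: EDUC = 1, EMPLOY = 0.
reference = example2_row(educ=1, employ=0)

# The four combinations compared with the reference point (EMPLOY = 10).
comparisons = {educ: example2_row(educ, 10) for educ in (1, 2, 3, 4)}

print(reference)
for educ, row in comparisons.items():
    print(educ, row)
```

For EDUC = 1 only the intercept and employ positions are nonzero, because both the indicator and the interaction term for the reference category are omitted from the model.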

Finally, the ExpB argument is set to 1 in each case in order to display Exp(B) and its 95% confidence interval in the output from !MLEcomp. For a logistic regression model, Exp(B) provides an odds ratio.

For the sake of brevity, we will show the results for only the fourth analysis (see Output Box 3). Results for the first three analyses can be obtained by running syntax file Example_2.sps, which is available as part of the online Supplementary Material.

Output Box 3. Results for Example 2, Analysis 4

The fourth analysis compares people with a college or postgraduate degree (EDUC = 4) who have been with their current employer for 10 years (EMPLOY = 10) with the reference group for all analyses (EDUC = 1 and EMPLOY = 0). One difference from the !OLScomp output shown earlier is that the t-tests have been replaced with Wald tests, which are customary for models fitted via MLE. Moreover, because the ExpB argument was set to 1 when invoking !MLEcomp, Exp(B) and its 95% confidence interval are shown in the output. The Exp(B) value on the FV1–FV2 line indicates that the odds of defaulting on a loan for people with a college or postgraduate degree who have been with the same employer for 10 years are 0.552 times the odds for the reference group (95% CI, 0.190–1.610); stated the other way, Exp(B) from the FV2–FV1 line tells us that the odds of defaulting for those who have not completed high school and have just started working for their current employer are 1.810 times the odds for the comparison group (95% CI, 0.621–5.275). Note, however, that both confidence intervals include a value of 1, which means that the odds ratios are not statistically significant at the .05 level (p = .277).
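The relationship between the two lines of output can be checked with a few lines of Python. The sketch below starts from the rounded values reported for the FV1–FV2 line and assumes that the confidence interval was built symmetrically on the log-odds scale using a standard normal critical value; that assumption, and the back-calculation of the standard error, are ours and are not taken from the macro code.

```python
import math
from statistics import NormalDist

# Rounded values reported on the FV1 - FV2 line.
or_12 = 0.552
ci_12 = (0.190, 1.610)

# Reversing the comparison inverts the odds ratio and swaps the limits.
or_21 = 1 / or_12
ci_21 = (1 / ci_12[1], 1 / ci_12[0])

# Back out the log-odds difference, an approximate SE, and a Wald p-value.
log_or = math.log(or_12)
z_crit = NormalDist().inv_cdf(0.975)
se = (math.log(ci_12[1]) - math.log(ci_12[0])) / (2 * z_crit)
p = 2 * (1 - NormalDist().cdf(abs(log_or / se)))

print(round(or_21, 3), tuple(round(v, 3) for v in ci_21), round(p, 3))
```

The values recovered this way agree with the reported FV2–FV1 results only approximately, because the inputs have been rounded to three decimal places.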

Example 3: Comparing two differences

The first two examples have shown how the !OLScomp and !MLEcomp macros can be used to compare two fitted values from the same regression model. However, !OLScomp and !MLEcomp can also be used to examine differences of differences. The linear regression model from Example 1 (MPG as a function of vehicle weight) can be used to illustrate.

Recall that for the first analysis in Example 1 (3,000 lbs–2,000 lbs), the FV1 − FV2 difference was −10.081; and for the third analysis (5,000 lbs–4,000 lbs), the FV1 − FV2 difference was −3.085. The difference between those differences can be analyzed by making Set1 = S1 − S2 from the first analysis and Set2 = S1 − S2 from the third analysis. The !OLScomp command to do this is shown in Syntax Box 6, and the results in Output Box 4. The first point to note is that the results for FV1 and FV2 in this analysis match exactly the FV1–FV2 results from the first and third analyses. Second, note that the difference between −10.081 and −3.085 (that is, the difference between the differences) is −6.996, with the standard error and 95% confidence interval as shown.
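The vectors involved in this difference of differences can be worked out as in the following Python sketch. The helper names are hypothetical, and the coefficient for weight-squared in the last step is merely implied by the rounded values quoted above; it is not taken from Table 1.

```python
def quadratic_row(weight):
    """[constant indicator, weight, weight squared] for the Example 1 model."""
    return [1, weight, weight ** 2]


def subtract(u, v):
    return [a - b for a, b in zip(u, v)]


# Set1 = S1 - S2 from the first analysis (3,000 lbs vs. 2,000 lbs).
set1 = subtract(quadratic_row(3000), quadratic_row(2000))
# Set2 = S1 - S2 from the third analysis (5,000 lbs vs. 4,000 lbs).
set2 = subtract(quadratic_row(5000), quadratic_row(4000))
# Set1 - Set2: only the weight-squared position is nonzero.
diff = subtract(set1, set2)

print(set1)  # [0, 1000, 5000000]
print(set2)  # [0, 1000, 9000000]
print(diff)  # [0, 0, -4000000]

# Difference of the two reported differences, and the weight-squared
# coefficient it implies (approximate, because the inputs are rounded).
difference_of_differences = -10.081 - (-3.085)
implied_b_weight_squared = difference_of_differences / diff[2]
print(round(difference_of_differences, 3), implied_b_weight_squared)
```

Note that the first element of both Set1 and Set2 is 0 rather than 1 here, because each vector is itself a difference in which the constant has already cancelled.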

Syntax Box 6. Using !OLScomp to examine a difference of differences
Output Box 4. !OLScomp results for a difference of differences

Differences of differences can be analyzed in the same fashion using !MLEcomp, as shown in syntax file MLEcomp_validation.SPS, which is available as part of the online Supplementary Material. Details are not provided here, but note that when the results are exponentiated, a difference of differences becomes a ratio of ratios. In the case of logistic regression, for example, it would be a ratio of odds ratios.

Validation of the macros

To validate the macros, we ran models that have only categorical explanatory variables and then compared results from the !OLScomp and !MLEcomp macros with results for the same fitted value comparisons obtained via estimated marginal means (EMMEANS) with pairwise comparisons. In all cases, the results from the macros agreed with those obtained via EMMEANS. Interested readers can find a detailed report of the validation analyses in Appendix 2.

Summary

When one fits a regression model, there is always interest in comparing fitted values of the outcome variable for different combinations of the explanatory variables. Some of those fitted value comparisons may be captured by single regression coefficients. When that is the case, the standard error needed for computing a confidence interval or for carrying out a statistical test on the difference is available in the regression output. However, other fitted value comparisons of interest may not be captured by a single regression coefficient. Rather, they are captured by the weighted combination of two or more regression coefficients. In this case, the standard error needed to make the fitted value comparison does not appear in the regression output. But the !OLScomp and !MLEcomp macros described in this article can be used to compare any two fitted values from regression models fitted by OLS and MLE, respectively. The output from both macros includes the difference between the two fitted values, its standard error, a 95% confidence interval on the difference, and a statistical test of the null hypothesis that the difference equals zero.

Notes

  1.

    As noted by one of the reviewers, X can also be dichotomous. When X is dichotomous, the simple linear regression model is equivalent to an independent groups t-test (equal variances version).

  2.

    Such models are sometimes described as main effects only models, but that terminology is technically incorrect because partial effects in a regression model are not the same thing as main effects in a factorial ANOVA model.

  3.

    In Eq. 13, p does not represent the number of outcome variables. Rather, it represents the probability that Y, the dichotomous outcome variable, is equal to 1 rather than 0.

  4.

    The use of SPSS macros requires some knowledge of SPSS syntax. Readers who are not familiar with SPSS syntax may find the following tutorials helpful: http://www.spsstools.net/LearningSyntax.htm; http://www.lrz.de/~wlm/ein_spss.htm. A good introduction to SPSS macros can be found here: http://www.spsstools.net/MacroTutorial.htm.

  5.

    Linear regression and related models (e.g., ANOVA, ANCOVA) are typically fitted using OLS. Logistic regression and Poisson regression are typical examples of models fitted via MLE. The macro definitions for !OLScomp and !MLEcomp can be seen in SPSS syntax file OLScomp_and_MLEcomp_macros.SPS, which is available as part of the online Supplementary Material. Other syntax files available there contain demonstrations of how to use the macros.

  6.

    We thank Ray Koopman (personal communication) for alerting us to this method.

  7.

    In Eq. 17 and subsequent matrix equations, the superscript “T” means “transpose.” For example, X^T is the transpose of matrix X. Some algebra textbooks use a prime rather than a superscript T (for example, X′). But since we have already used Y′ to represent the fitted value of Y in a regression model, we think it will be less confusing for readers to use a superscript T to indicate transposition of a matrix. The superscript “−1” indicates matrix inversion. We use the SPSS matrix language function GINV(), rather than INV(), to compute the inverse of a matrix. GINV() gives the Moore–Penrose generalized inverse. By chance, we discovered that our Example 1 model would not run using INV() unless we rescaled the explanatory variable, but it does run using GINV(). Readers who have SPSS can find more information in the Command Syntax Reference Manual by looking under MATRIX–END MATRIX → COMPUTE Statement → Matrix Functions. Readers who do not have SPSS can go to http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp and search on INV and GINV.

  8.

    Equations 17 and 18 can be found in multiple references—for example, Fox (2008, p. 196).

  9.

    When the macros are used to compare two fitted values of Y, the first variable in Set1 and Set2 must be set to 1. But when the macros are used to examine a difference of differences, as in Example 3, the first variable may have to be set to 0 instead.

  10.

    Please note that some of the data management steps contained in this syntax file operate on data sets generated via the OMS command. Variable names created by OMS depend on the output language that is in effect. For example, with English as the output language, our OMS command generates a variable called “Intercept”; but with Spanish as the output language, the same variable is named “Intersección.” In order to avoid this difference, which would cause an error, we issue a command near the top of the syntax file to set the output language to English. Users of non-English versions will have to reset to their preferred output language after running our syntax file. In order to assist users who are not comfortable using syntax for data management, we also include a list of steps that can be carried out manually to create the needed BcovB data set.

  11.

    For models that use log or logit transformations, it is customary to exponentiate the regression coefficient. In logistic regression, for example, Exp(B) = e^B = an odds ratio.

  12.

    All examples in this article use sample data files that come with SPSS. (For readers who may not have access to them, the data files are also available as part of the online Supplementary Material.) These examples are intended only to illustrate the use of the !OLScomp and !MLEcomp macros. Substantive conclusions about the subject matter of the examples are unwarranted and drawn at the reader’s own risk. Example 1 uses the data file Cars.sav. Note that one odd case with vehicle weight less than 1,000 lbs was excluded.

  13.

    A reviewer questioned why the macros report both the FV1 − FV2 and FV2 − FV1 differences, given that they always have the same absolute value and the same standard error. We report both so that if a user decides after the fact that FV2 − FV1 is more easily interpreted than FV1 − FV2, he or she is spared the trouble of having to rerun the macro with Set1 and Set2 swapped, or to compute the confidence interval by hand (which would needlessly introduce a possible source of error). This method of reporting is also consistent with how SPSS reports pairwise comparisons of means via EMMEANS with the COMPARE option.

  14.

    We used GENLIN because it produces a covariance matrix for the coefficients that is easier to work with than the one provided by the LOGISTIC REGRESSION procedure. However, we also ran the model via LOGISTIC REGRESSION, because (unlike GENLIN) it provides multiple degree of freedom tests for categorical explanatory variables with more than two levels.

  15.

    OMS stands for output management system. It allows one to direct output that would ordinarily appear in the output viewer (e.g., tables of regression coefficients) to a data file for further processing.

  16.

    Syntax file Example_2.SPS, which is available as part of the online Supplementary Material, shows how the data management steps were carried out.

  17.

    The category numbers for EDUC appear in descending order because of the “order = descending” option we included in the GENLIN command. That was done in order to make EDUC = 1 (high school not completed) the reference category.

  18.

    The redundant variables represent the reference categories for EDUC and EDUC*EMPLOY.

  19.

    This is the omnibus F-test, or the F-test for the “corrected model,” as it says in the UNIANOVA output.

References

  1. Fox, J. (2008). Applied regression analysis and generalized linear models (2nd ed.). Los Angeles: Sage.

  2. Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis (5th ed.). Upper Saddle River: Prentice Hall.

Acknowledgments

We thank Hillary Maxwell, Mary Lou Schmuck, and Michel Bédard for helpful feedback on earlier drafts of the manuscript. We also thank Marta García-Granero for providing SPSS matrix language code to compute area under the t-distribution. Finally, we acknowledge the helpful contributions of two anonymous reviewers.

Competing interests

None of the authors have any competing interests.

Author information

Corresponding author

Correspondence to Bruce Weaver.

Electronic supplementary material

Below is the link to the electronic supplementary material.

  • ESM 1 (SAV 13 kb)
  • ESM 2 (SAV 24 kb)
  • ESM 3 (SPS 9 kb)
  • ESM 4 (SPS 9 kb)
  • ESM 5 (SPS 5 kb)
  • ESM 6 (SPS 9 kb)
  • ESM 7 (SPS 12 kb)
  • ESM 8 (SPS 6 kb)

Appendices

Appendix 1 Computation of confidence intervals and statistical tests

Ninety-five percent confidence intervals are computed as point estimate ± a critical value times the standard error of the point estimate.

  • 95% CI for FV1 = FV1 ± critical value × SE(FV1)

  • 95% CI for FV2 = FV2 ± critical value × SE(FV2)

  • 95% CI for (FV1 − FV2) = (FV1 − FV2) ± critical value × SE(FV1 − FV2)

  • 95% CI for (FV2 − FV1) = (FV2 − FV1) ± critical value × SE(FV2 − FV1)

For regression models fitted via MLE, the critical value is 1.96, which is the 97.5th percentile of the standard normal distribution. For models fitted via OLS, the critical value is the 97.5th percentile of the t-distribution with degrees of freedom equal to the error degrees of freedom from the regression model. (The error degrees of freedom for an OLS model are equal to the sample size minus the total number of model parameters, including the constant.)
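As a minimal sketch of the MLE case, the following Python snippet computes a 95% confidence interval for a difference between two fitted values. The point estimate and standard error are hypothetical numbers chosen for illustration, not output from the macros, and the function name wald_ci is an illustrative name only. The OLS case follows the same pattern, with the normal quantile replaced by the corresponding quantile of the t-distribution.

```python
from statistics import NormalDist


def wald_ci(estimate, se, level=0.95):
    """Return (lower, upper) limits computed as estimate ± z * se."""
    # For a 95% interval this is the 97.5th percentile of the
    # standard normal distribution (about 1.96).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z * se, estimate + z * se


# Hypothetical difference FV1 - FV2 and its standard error.
diff, se_diff = 0.50, 0.20
lower, upper = wald_ci(diff, se_diff)
print(round(lower, 3), round(upper, 3))
```

The interval for FV2 − FV1 is obtained by reversing the sign of the estimate; the standard error is unchanged.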

Statistical tests are also carried out. For models fitted via MLE, Wald tests are computed as follows:

  • Wald_1 = Wald test value for FV1 = (FV1/SE(FV1))^2

  • Wald_2 = (FV2/SE(FV2))^2

  • Wald_{1−2} = ((FV1 − FV2)/SE(FV1 − FV2))^2

  • Wald_{2−1} = ((FV2 − FV1)/SE(FV2 − FV1))^2.

And for models fitted via OLS, t-tests are computed as follows:

  • t_1 = FV1/SE(FV1)

  • t_2 = FV2/SE(FV2)

  • t_{1−2} = (FV1 − FV2)/SE(FV1 − FV2)

  • t_{2−1} = (FV2 − FV1)/SE(FV2 − FV1).

Note that because of the squaring, Wald_{1−2} and Wald_{2−1} will have exactly the same value. For OLS models, t_{1−2} and t_{2−1} will have the same magnitude but opposite signs. The null hypothesis for all of these tests is that the parameter being tested (i.e., FV1, FV2, or the difference between them) equals zero. When the null hypothesis for a Wald test is true, the test value is distributed as chi-square with one degree of freedom (df). When the null hypothesis for a t-test is true, the test value is distributed as t with df = df_error from the regression model, which is equal to the sample size minus the number of parameters estimated by the model (including the constant).
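The following Python sketch illustrates these test statistics. The Wald example uses a hypothetical estimate and standard error, and the function name wald_test is an illustrative name only. The p-value relies on the fact that a chi-square variate with one df is the square of a standard normal variate. The t example uses the interaction coefficient and standard error reported in Appendix 2 (B = −8.585, SE = 3.348); its p-value would require the t-distribution with df_error degrees of freedom, which is not computed here.

```python
from statistics import NormalDist


def wald_test(estimate, se):
    """Return (Wald statistic, p-value) for H0: parameter = 0."""
    wald = (estimate / se) ** 2
    # Chi-square with 1 df is a squared standard normal, so
    # P(chi2 > wald) = 2 * (1 - Phi(sqrt(wald))).
    p_value = 2 * (1 - NormalDist().cdf(wald ** 0.5))
    return wald, p_value


# Hypothetical difference between two fitted values and its SE.
w_12, p_12 = wald_test(0.50, 0.20)    # FV1 - FV2
w_21, p_21 = wald_test(-0.50, 0.20)   # FV2 - FV1: same Wald value

# OLS case: the t statistic keeps its sign.
t_12 = -8.585 / 3.348
t_21 = 8.585 / 3.348
print(w_12, round(p_12, 4), round(t_12, 3), round(t_21, 3))
```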

Finally, for certain models fitted via MLE, it may be desirable to exponentiate the fitted values and their confidence intervals. For logistic regression, for example, e^B, which can also be written as Exp(B), equals the odds ratio. For Poisson regression, Exp(B) gives the rate ratio, sometimes called the incidence rate ratio.
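A short Python sketch of this back-transformation follows, assuming a logistic model in which the difference between two fitted values is a difference in log-odds, so that exponentiating it yields an odds ratio. The estimate and standard error are hypothetical. The confidence limits are computed on the log-odds scale first and then exponentiated, which is why the resulting interval is not symmetric around the odds ratio.

```python
import math
from statistics import NormalDist

# Hypothetical difference in log-odds between two fitted values, with its SE.
log_odds_diff, se = 0.50, 0.20
z = NormalDist().inv_cdf(0.975)  # about 1.96

# Build the interval on the log-odds scale, then exponentiate each limit.
odds_ratio = math.exp(log_odds_diff)
ci_lower = math.exp(log_odds_diff - z * se)
ci_upper = math.exp(log_odds_diff + z * se)
print(round(odds_ratio, 3), round(ci_lower, 3), round(ci_upper, 3))
```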

Appendix 2 Validation of the macros

To validate !OLScomp, we ran a 2 × 2 ANOVA model using data from the Employee data.sav file that comes with SPSS. However, our OLScomp_validation.sps syntax file actually uses a copy of that data file that was renamed to Employee_data.sav, with an underscore character (“_”) between Employee and data. This was done because when a file with spaces embedded in its name is downloaded from a Web site, the spaces are often replaced with underscore characters. Therefore, we inserted the underscore character to ensure that the file name after download matches the file name we use in syntax file OLScomp_validation.sps.

The outcome variable for this validation analysis was salary (in thousands of dollars), and the two between-subjects factors were male gender (1 = male, 0 = female) and minority status (1 = yes, 0 = no). The male × minority interaction was also included. The most common way to run this model in SPSS is with the UNIANOVA procedure, although one could obtain exactly the same model using the REGRESSION procedure. One advantage of UNIANOVA is that it can generate a table of estimated marginal means (EMMEANS) for the interaction term, including pairwise comparisons that give the simple main effects of one factor at each level of the other factor (e.g., the yes − no difference for minority within males and females separately). SyntaxBox 7 shows the UNIANOVA syntax we used to run the model and generate the tables of simple main effects for both gender and minority.

The UNIANOVA results showed that R^2 = .258, MS_error = 217.804, and F(3, 470) = 54.405, p < .001.Footnote 19 The cell means and the pairwise comparisons generated by the UNIANOVA syntax are shown in Tables 3, 4 and 5.

Table 3 Cell means from the 2 × 2 ANOVA model
Table 4 Simple main effects of minority classification for females and males
Table 5 Simple main effects of sex within the two levels of minority classification
Syntax Box 7. Syntax to run a 2 × 2 ANOVA and generate tables of estimated marginal means with pairwise comparisons

To make those same pairwise comparisons with the !OLScomp macro, we must first ensure that all categorical variables are coded as indicator variables, and then we must compute the product term needed for the interaction. As it happens, the gender variable in the raw data file is a string with values “f” and “m.” Therefore, we recoded it into an indicator variable called “male” (1 = male, 0 = female). The minority variable was already a 1–0 indicator variable, so there was no need to recode it. Syntax to compute the needed variables and to call !OLScomp to make the desired pairwise comparisons is shown in Syntax Box 8. Output from those !OLScomp runs is shown in Output Box 5.
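The recoding logic can be sketched outside SPSS as follows. This Python snippet is illustrative only: the four cases are hypothetical, the function name design_row is invented for the example, and the column order (constant, male, minority, male × minority) is an assumption about how the model terms are arranged.

```python
# Hypothetical cases: (gender string as stored in the raw file, minority indicator).
cases = [("m", 0), ("f", 0), ("m", 1), ("f", 1)]


def design_row(gender, minority):
    """Return [constant, male, minority, male*minority] for one case."""
    male = 1 if gender == "m" else 0   # recode the string into a 1-0 indicator
    product = male * minority          # product term for the interaction
    return [1, male, minority, product]


design = [design_row(g, mnr) for g, mnr in cases]
for case, row in zip(cases, design):
    print(case, row)
```

Each row has the same layout as a Set1 or Set2 vector for the corresponding cell of the 2 × 2 design, with the first element set to 1 for the constant.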

Syntax Box 8. Syntax to compute the needed indicator and product variables and !OLScomp commands to make the desired pairwise comparisons
Output Box 5. !OLScomp results showing the same pairwise comparisons generated via EMMEANS using UNIANOVA

As usual, the output from !OLScomp shows the R^2 value for the model, the root mean square error (RMSE), and the omnibus F-test. The R^2 and F values match exactly what we found when we ran the analysis via UNIANOVA, and apart from a little rounding error, the square of the RMSE equals MS_error from UNIANOVA, so the user can be confident that !OLScomp is producing the same model as UNIANOVA.

Turning to the results for the first call to !OLScomp (see Output Box 5), note that FV1 = 44.475 and FV2 = 26.707. These are the mean salaries (in thousands of dollars) for males and females for whom minority status = no. Note that they agree exactly with the EMMEANS produced via UNIANOVA (see Table 3). Note too that the FV1 − FV2 difference agrees exactly with the male–female pairwise comparison generated via UNIANOVA, right down to the standard error and 95% confidence interval (see Table 5). The results for the other three calls to !OLScomp in Syntax Box 8 can be verified in similar fashion.

One further way to validate !OLScomp is to test whether it gives correct results for the difference between two differences. Specifically, can it give us the M − F difference when minority = yes minus the M − F difference when minority = no?

As we saw earlier, programming !OLScomp to give a difference between two differences is very straightforward if one has already used it for looking at the two primary differences one now wishes to compare. In that case, the values that need to be entered in the Set1 and Set2 vectors can be read directly from earlier output. For the example we are considering, the two primary differences are M–F (Minority = Yes) and M–F (Minority = No). We previously used !OLScomp to examine both of them. Here are the relevant parts of the output from those analyses:

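[Figure: excerpts from the earlier !OLScomp output for the M–F (Minority = Yes) and M–F (Minority = No) comparisons, including the S1–S2 lines]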

The S1–S2 line from the M–F (Minority = Yes) output gives us the values we now need to enter as Set1; and the S1–S2 line from the M–F (Minority = No) output gives us the values we now need to enter as Set2. Therefore, we must call !OLScomp as shown in Syntax Box 9. Output from that call is shown in Output Box 6.

Syntax Box 9. !OLScomp syntax to compare the M − F differences for minority and nonminority cases
Output Box 6. Output generated by the !OLScomp command shown in Syntax Box 9

Note that the FV1 and FV2 values from this analysis match the FV1 − FV2 differences that we saw in previous analyses of the primary differences. FV1 from this analysis = FV1 − FV2 from the earlier “M–F (Minority = Yes)” analysis; and FV2 from this analysis = FV1 − FV2 from the earlier “M–F (Minority = No)” analysis (see Output Box 5).

To further validate the !OLScomp macro, we can compare the FV1 − FV2 result from this analysis with some output in the table of parameter estimates generated by UNIANOVA (Table 6). Note that the results for the male × minority interaction (the M × M line in Table 6) match exactly the FV1 − FV2 results from !OLScomp. That is, the regression coefficient for M × M = −8.585 (SE = 3.348), matching exactly the FV1 − FV2 output from !OLScomp.
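The reason the difference of differences coincides with the interaction coefficient can be seen from the vectors themselves. The Python sketch below assumes the column order constant, male, minority, male × minority. The constant and male coefficients are derived from the cell means quoted above (26.707 and 44.475 − 26.707), the interaction coefficient is the value from Table 6, and the minority coefficient is a placeholder, because it receives zero weight and does not affect the result.

```python
# Assumed column order: [constant, male, minority, male*minority].
m_yes, f_yes = [1, 1, 1, 1], [1, 0, 1, 0]   # males, females; minority = yes
m_no, f_no = [1, 1, 0, 0], [1, 0, 0, 0]     # males, females; minority = no


def minus(u, v):
    """Elementwise difference of two equal-length vectors."""
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


set1 = minus(m_yes, f_yes)     # S1-S2 line, M-F (minority = yes)
set2 = minus(m_no, f_no)       # S1-S2 line, M-F (minority = no)
contrast = minus(set1, set2)   # weights for the difference of differences

# Coefficients: constant and male derived from the reported cell means;
# the minority value is a PLACEHOLDER (its weight in the contrast is 0).
b = [26.707, 44.475 - 26.707, 0.0, -8.585]

print(set1, set2, contrast)
print(round(dot(contrast, b), 3))
```

Because the only nonzero weight in the contrast falls on the product term, the difference of differences reduces to the interaction coefficient, and its standard error reduces to that coefficient's standard error.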

Table 6 Parameter estimates from the 2 × 2 ANOVA model run via command shown in Syntax Box 7

Finally, we have validated the !MLEcomp macro in exactly the same manner, using the GENLIN procedure to generate estimated marginal means with pairwise comparisons. Readers who are interested in the details can run syntax file MLEcomp_validation.sps, which is available as part of the online Supplementary Material. Note that MLEcomp_validation.sps does not require a raw data file, since the needed raw data are generated internally via the DATA LIST command in conjunction with the WEIGHT command.

About this article

Cite this article

Weaver, B., Dubois, S. SPSS macros to compare any two fitted values from a regression model. Behav Res 44, 1175–1190 (2012). https://doi.org/10.3758/s13428-012-0204-2

Keywords

  • Regression
  • Ordinary least squares
  • Maximum likelihood estimation
  • SPSS