A novel method for modelling interaction between categorical variables

Sweeney and Ulveling (1972) introduced weighted effect coding, where the estimates for categories of nominal and ordinal variables are deviations from the arithmetic mean, typically from a sample. This somewhat neglected parameterization is preferred over the well-known effect coding (ANOVA) if the data are unbalanced (i.e., when categories hold different numbers of observations) and was recently revived in this journal (te Grotenhuis et al. 2016). In this paper, we show that weighted effect coding can also be applied to regression models with interaction effects. The weighted effect coded interactions represent the additional effects over and above the main effects obtained from the model without these interactions. This is a useful alternative to effect coding when the data are unbalanced as in most observational data. In this contribution, we describe this novel parameterization and provide syntax, data, and examples in SPSS, R, and Stata on http://www.ru.nl/sociology/mt/wec/downloads. For didactical reasons we apply OLS regression models, but weighted effect coded interactions can be used in any generalized linear model. Throughout this text we use the word ‘interaction’, while other researchers prefer ‘moderation’. 
 
Interactions between categorical variables 
 
 
Dummy coded interaction 
When directional interaction hypotheses are tested and categorical (i.e., ordinal or nominal scaled) predictor variables are involved, dummy coding is often appropriate. In this parameterization the main effects relate to a particular subset of respondents, and for the remaining subsets the dummy coded interaction effects reflect deviations from these main effects. To create dummy coded interaction variables one multiplies the original 0/1 coded dummy variables (Hardy 1993). As an empirical example we investigate to what extent the mean BMI differs across three age categories in a group of respondents with one or more children and in a childless group (Umberson et al. 2011). We use data on self-reported body height and weight from three random samples (n = 3314) drawn from the Dutch population (aged 18–70) in 2000, 2005, and 2011 (Eisinga et al. 2002, 2012a, b). We created the dummy coded variables Childless_dc, with code 1 for respondents with no children and code 0 for respondents with one or more children, Middle_dc (code 1 for the middle-aged and 0 for both young and older respondents) and Older_dc (1 for older and 0 for both young and middle-aged respondents). The dummy coded interaction variables Childless_dc × Middle_dc and Childless_dc × Older_dc are multiplications of these dummy coded variables (see Table 1 and our website for details). First, we estimated the main effects without interaction (see Table 4, Model 1) and second, we added the two interaction variables (Table 4, Model 2). Note that the reference categories (a) respondents with children, (b) youngsters, and (c) childless youngsters are omitted from the two models, which means that their estimates are set to zero.
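The construction of the dummy coded main and interaction variables can be illustrated with a minimal Python sketch. The six toy respondents and all variable names are hypothetical and are not taken from the survey data or from the syntax on the website; the sketch only shows that the interaction columns are element-wise products of the 0/1 dummies.

```python
import numpy as np

# Hypothetical toy respondents, one per cell (not the survey data).
childless = np.array([True, False, True, False, True, False])
age = np.array(["young", "young", "middle", "middle", "older", "older"])

# 0/1 dummy variables; the omitted reference categories are
# "with children" and "young".
childless_dc = childless.astype(int)
middle_dc = (age == "middle").astype(int)
older_dc = (age == "older").astype(int)

# Dummy coded interactions are element-wise products of the dummies.
childless_x_middle_dc = childless_dc * middle_dc
childless_x_older_dc = childless_dc * older_dc

design = np.column_stack([
    childless_dc, middle_dc, older_dc,
    childless_x_middle_dc, childless_x_older_dc,
])
print(design)
```

Each row of `design` is the coding for one cell; the row for respondents who are young and have children consists of zeros only, which is why that group serves as the baseline.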
 
Table 1 
 
Coding scheme for the dummy coded main and interaction effects for the childless, middle-aged and older-aged (references/omitted categories are with children, young, and childless × young)


Our results show that, without interaction, the estimated mean BMI among childless respondents is a significant 0.9 BMI points lower than among respondents with children, taking their age into account. Further, the estimated mean BMI is significantly higher in both the middle-aged group (1.36) and the older group (2.09) than among youngsters, controlling for having children or not.
After adding the interactions, the main effect of the dummy coded variable Childless relates to the youngest group only. So, respondents who are youngest and childless have an estimated mean BMI that is 1.92 points lower than that of the youngest respondents with children. Further, the main effects in the middle-aged and older group pertain to the respondents with children only. The middle-aged people with children have an estimated mean BMI that is 0.46 (non-significant) higher than that of youngsters with children. The older respondents with children have a BMI that is 1.22 points higher, again compared to youngsters with children.
The two interaction effects (one of them being significant) show the extra effect on BMI on top of the aforementioned main effects. For instance, childless middle-aged respondents have an estimated mean BMI that differs by -1.92 + 1.22 = -0.70 BMI points from that of middle-aged respondents with children, i.e., it is 0.70 points lower. Likewise, the childless middle-aged respondents have an estimated mean BMI that is 1.68 higher (0.46 + 1.22) than that of youngsters who are childless.
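The way main and interaction estimates combine in the dummy coded model can be checked with a few lines of Python. The three numbers are the estimates used in the sums in the text; the variable names are only illustrative.

```python
# Model 2 dummy coded estimates as used in the sums in the text.
b_childless = -1.92           # childless vs. with children, among the young
b_middle = 0.46               # middle-aged vs. young, among those with children
b_childless_x_middle = 1.22   # interaction: childless x middle-aged

# Effect of being childless within the middle-aged group.
childless_effect_in_middle = b_childless + b_childless_x_middle

# Effect of being middle-aged within the childless group.
middle_effect_in_childless = b_middle + b_childless_x_middle

print(round(childless_effect_in_middle, 2))
print(round(middle_effect_in_childless, 2))
```

In both cases the interaction estimate is added to the main effect that, on its own, applies only to the reference subset.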

Effect coded interaction
It seems a bit odd to use dummy coding in our example because, to our knowledge, there is no theory that predicts, for instance, a stronger age effect among the childless or a weaker effect of having children among the middle-aged. In general, dummy coding is less appropriate if one is agnostic about the direction of effects, as the selection of reference categories and the associated statistical tests are then mostly arbitrary. One popular solution is effect coding, where in interaction models the main effect represents a grand mean effect while the interaction effects are deviations from that grand mean effect. This grand mean effect is unweighted, so effect coding is tailor-made for so-called completely balanced designs (Berger and Wong 2009). In such designs all cells have equal numbers of observations. This is not a necessary condition for the sample data; it suffices to assume a population with such a balanced design, while the sample is unbalanced due to, for instance, randomness. Especially in experimental settings, where equal group sizes are often desired, this type of parameterization is well suited to test whether the treatment effect differs across relevant groups (Berger and Wong 2009). Note that in that particular case there are no hypotheses about the directions of the interaction effects.
In general, an effect coded variable has code 1 for a specific category and 0 for all other categories, save the statistically redundant and, therefore, omitted reference category, which is coded -1 (Hardy 1993). In our example, we created six effect coded interactions, which are the result of the effect coded variables Childless_ec and With Children_ec multiplied with Young_ec, Middle_ec, and Older_ec, which are also effect coded (see Table 2 and our website for details).
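The 1/0/-1 scheme and the resulting product interactions can be sketched in Python as follows. This is a minimal illustration with hypothetical toy data and a hypothetical helper `effect_code`; it uses "with children" and "young" as the omitted categories, so it yields two of the six interactions, and the remaining ones would follow from choosing other omitted categories.

```python
import numpy as np

def effect_code(labels, category, reference):
    """Return 1 for `category`, -1 for `reference`, and 0 otherwise."""
    labels = np.asarray(labels)
    return np.where(labels == category, 1,
                    np.where(labels == reference, -1, 0))

# Hypothetical toy respondents, one per cell (a balanced design).
children = np.array(["childless", "with", "childless",
                     "with", "childless", "with"])
age = np.array(["young", "young", "middle", "middle", "older", "older"])

childless_ec = effect_code(children, "childless", "with")
middle_ec = effect_code(age, "middle", "young")
older_ec = effect_code(age, "older", "young")

# Effect coded interactions are products of the effect coded variables.
childless_x_middle_ec = childless_ec * middle_ec
childless_x_older_ec = childless_ec * older_ec

print(childless_x_middle_ec)
print(childless_x_older_ec)
```

In this balanced toy example every column sums to zero, which is the situation effect coding is designed for.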
The results for this effect coded interaction model are given in Table 4, and again Model 1 with no interaction is presented first. The grand mean BMI is 24.73 (intercept), and respondents with children have an estimated mean BMI of 24.73 + 0.45 = 25.18. The respondents with no children have an estimated mean BMI of 24.73 - 0.45 = 24.28. To recover the grand mean we sum 25.18 and 24.28 and divide by 2, which gives 24.73, the value of the intercept. This shows that, with regard to the point of reference, effect coding does not take into account the possibly unequal numbers of observations in the categories. Compared to this grand mean of 24.73, the estimated mean BMI is 1.15 lower for the younger respondents, 0.21 higher for the middle-aged, and 0.94 higher for the older respondents. Note that these three deviations sum to zero, which is typical for a balanced design. After adding the effect coded interaction variables (Model 2), the grand mean shifts to 24.88 and the main effects also change. This is due to the unbalanced nature of our data; for instance, the number of older respondents without children is 62, whereas 3314/6 ≈ 552 would be expected in a completely balanced design.
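The arithmetic behind the unweighted grand mean can be verified with a short Python check. It uses only the Model 1 effect coded estimates quoted in the text; the variable names are illustrative.

```python
# Model 1 effect coded estimates quoted in the text.
intercept = 24.73
with_children = intercept + 0.45
childless = intercept - 0.45

# The intercept is the unweighted mean of the two category means:
# group sizes play no role in this average.
unweighted_grand_mean = (with_children + childless) / 2

# The three age deviations from the grand mean sum to zero.
age_deviations = [-1.15, 0.21, 0.94]
deviation_sum = sum(age_deviations)

print(round(unweighted_grand_mean, 2))
print(round(deviation_sum, 2))
```

Because the average ignores how many respondents fall into each category, the intercept need not coincide with the observed sample mean when the data are unbalanced.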

Weighted effect coded interaction
In our example the sample data are far from balanced (see the numbers of observations per category in Table 4). This means that testing interaction effects under the assumption of balanced data, with a grand mean effect as the point of reference, is less appropriate, because most probably the data are not balanced in the target population either. In such cases testing interaction effects against the effects found without interactions makes more sense, as the latter are overall main effects that take into account the numbers of observations per category. This is a new way of modelling interaction using weighted effect coded interaction variables. Unlike dummy coding and effect coding, these interaction variables are not simply the multiplication of two weighted effect coded variables. Instead, weights are assigned to the interaction variables to obtain main effects that equal the effects from the model without these interactions (see Table 3 for details and our website for in-depth matrix information and for syntax in SPSS, R and Stata).

Table 3

Category                        Childless    Middle-aged   Older        Childless × Middle-aged   Childless × Older
With children and younger       -(n_c/n_w)   -(n_m/n_y)    -(n_o/n_y)   (n_cm/n_wy)               (n_co/n_wy)
With children and middle-aged   -(n_c/n_w)   1             0            -(n_cm/n_wm)              0
With children and older         -(n_c/n_w)   0             1            0                         -(n_co/n_wo)
Childless and younger           1            -(n_m/n_y)    -(n_o/n_y)   -(n_cm/n_cy)              -(n_co/n_cy)
Childless and middle-aged       1            1             0            1                         0
Childless and older             1            0             1            0                         1

n_w: number of observations (n) in category with children; n_c: n in category childless; n_y: n in category young; n_m: n in category middle; n_o: n in category older; n_wy: n in category with children and young; n_wm: n in category with children and middle; n_wo: n in category with children and older; n_cy: n in category childless and young; n_cm: n in category childless and middle; n_co: n in category childless and older
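The weights in Table 3 can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the syntax provided on the website: the cell counts and the outcome are simulated, and the helper names (`count`, `age_wec`, `interaction_wec`) are hypothetical. The sketch checks that the interaction columns are orthogonal to the constant and to the main effect columns, so that the intercept and main effects do not change when the interactions are added.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, deliberately unbalanced cell counts (not the survey's).
counts = {("with", "young"): 30, ("with", "middle"): 80,
          ("with", "older"): 50, ("childless", "young"): 60,
          ("childless", "middle"): 25, ("childless", "older"): 8}
reps = list(counts.values())
kids = np.repeat([k for k, _ in counts], reps)
age = np.repeat([a for _, a in counts], reps)
y = rng.normal(25.0, 3.0, size=len(kids))  # simulated outcome

def count(k=None, a=None):
    """Number of observations in a category or in a cell."""
    mask = np.ones(len(kids), dtype=bool)
    if k is not None:
        mask &= kids == k
    if a is not None:
        mask &= age == a
    return mask.sum()

# Weighted effect coded main effects ("with" and "young" are omitted).
childless = np.where(kids == "childless", 1.0,
                     -count(k="childless") / count(k="with"))

def age_wec(cat):
    return np.where(age == cat, 1.0,
                    np.where(age == "young",
                             -count(a=cat) / count(a="young"), 0.0))

def interaction_wec(cat):
    """Childless x `cat` column with the weights of Table 3."""
    n_c_cat = count("childless", cat)
    col = np.zeros(len(kids))
    col[(kids == "with") & (age == "young")] = n_c_cat / count("with", "young")
    col[(kids == "with") & (age == cat)] = -n_c_cat / count("with", cat)
    col[(kids == "childless") & (age == "young")] = -n_c_cat / count("childless", "young")
    col[(kids == "childless") & (age == cat)] = 1.0
    return col

const = np.ones(len(kids))
X1 = np.column_stack([const, childless, age_wec("middle"), age_wec("older")])
X2 = np.column_stack([X1, interaction_wec("middle"), interaction_wec("older")])

b1, *_ = np.linalg.lstsq(X1, y, rcond=None)  # model without interactions
b2, *_ = np.linalg.lstsq(X2, y, rcond=None)  # model with interactions

print(np.round(X1.T @ X2[:, 4:], 10))  # cross-products with interactions
print(np.round(b1, 3), np.round(b2[:4], 3), round(y.mean(), 3))
```

The cross-products are zero because, within each column, the weights cancel once they are multiplied by the number of observations in the corresponding cells.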
The orthogonal interaction effects then denote the extra effect over and above the main effects found in the model without these interactions, whether or not the data are balanced. If the data are completely balanced, the estimates from weighted effect coding equal those from effect coding, but they can differ considerably in effect size and associated t values when the data are unbalanced. This is illustrated in the last two columns of Table 4. In Model 1 (without interactions), the estimate for the intercept is 24.98, which equals the observed (arithmetic) sample mean in our dataset. Respondents with children have an estimated mean BMI that is 0.29 higher than 24.98, whereas childless respondents score 0.61 BMI points lower. Further, the youngsters have a mean BMI of 24.98 - 1.24 = 23.74, whereas for the middle-aged the mean BMI is slightly higher than the sample mean (+0.12), and for older respondents we must add 0.85 to 24.98 to find their estimated mean BMI. Note that the effects no longer add up to 0, as we take into account the unequal numbers of observations. When the six interactions are added in Model 2, nothing changes in the intercept or main effects, because the interactions have a mean of 0 and are orthogonal to the main effects. The interpretation of these interactions is straightforward: each is the extra estimated mean BMI over and above the main effects found in the model without interactions. For instance, the young respondents with no children have a BMI that is an extra 0.17 lower than 24.98 (on top of the main effects -1.24 and -0.61). Note that the equal sized interaction effects -0.17 (childless × young) and 0.17 (childless × older) have quite different t values (-2.75 vs. 0.39). This is a direct result of taking into account the different numbers of respondents per category; there are far fewer older people than younger people, so the power of that test is lower.
Note also that the t values for with children × older and childless × older are equal in absolute value (0.39), as the dichotomy with children/childless is mutually exclusive. Note further that weighted effect coded interaction effects do not add up to zero as in effect coding, again due to the different numbers per category. We finally add that in Table 4 the explained variances are the same across the three versions of Model 1 and across the three versions of Model 2. So, no matter which type of coding is used, the predicted BMI scores are exactly the same. The only difference is the type of baseline one wishes to use. In dummy coding this baseline is a particular subset of respondents, in effect coding it is a grand mean of estimates (neglecting the possible imbalance in the data), while in weighted effect coding the baseline is the weighted main effect.
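That the three parameterizations yield identical predictions and differ only in their baseline can be illustrated for a main-effects model with the following Python sketch. The data are simulated and the function `design` is a hypothetical helper; only the value assigned to the omitted category differs between the schemes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated, unbalanced toy data (not the survey data).
kids = np.repeat(["with", "childless"], [160, 40])
age = rng.choice(["young", "middle", "older"], size=200, p=[0.5, 0.35, 0.15])
y = rng.normal(25.0, 3.0, size=200)

def design(scheme):
    """Main-effects design matrix under 'dummy', 'effect' or 'weighted'."""
    cols = [np.ones(len(y))]
    for labels, cat, ref in [(kids, "childless", "with"),
                             (age, "middle", "young"),
                             (age, "older", "young")]:
        if scheme == "dummy":
            ref_value = 0.0
        elif scheme == "effect":
            ref_value = -1.0
        else:  # weighted effect coding: minus the ratio of category sizes
            ref_value = -np.sum(labels == cat) / np.sum(labels == ref)
        cols.append(np.where(labels == cat, 1.0,
                             np.where(labels == ref, ref_value, 0.0)))
    return np.column_stack(cols)

fitted, intercepts = {}, {}
for scheme in ("dummy", "effect", "weighted"):
    X = design(scheme)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted[scheme] = X @ b
    intercepts[scheme] = b[0]

print({s: round(v, 3) for s, v in intercepts.items()}, round(y.mean(), 3))
```

The fitted values coincide because the three design matrices span the same column space; only the intercept and the meaning of the coefficients change, and under weighted effect coding the intercept is the sample mean.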
To save space we did not include control variables in our models. The interpretation, however, is basically the same: the weighted effect coded interaction effects still reflect deviations from the weighted main effects, only this time after taking into account one or more control variables. Because the weighted effect coded interactions may be correlated with the control variable(s), the main effects in a controlled model with and without weighted effect coded interaction parameters can differ in such cases (see our website for an example). The interaction between weighted effect coded variables and interval/ratio scaled variables is available on our website as well.
To conclude: whenever non-directional interaction hypotheses are tested using unbalanced data and this unbalancedness is deemed relevant for the target population, weighted effect coded interactions are to be preferred over effect coded interactions.

Weighted effect coded interactions in generalized linear models
In this contribution, we showed that weighted effect coded interaction effects represent deviations from the overall main effects (i.e., the main effects found in a model without interaction). This general interpretation holds for any generalized linear model. However, we must add that in logistic regression models the main and interaction effects relate to the odds (i.e., p_1/(1 - p_1)). They do not directly relate to p_1 itself, i.e., the estimated probability to score 1 on the dependent variable. In fact, even without interaction parameters, the effects of the predictor variables in a logistic regression model exhibit interaction when the probability (p_1) is considered (Mood 2010).
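The distinction between effects on the odds and effects on the probability can be made concrete with a small Python sketch. The effect size and the baseline values are hypothetical; the effect is specified on the log-odds scale, which is the usual scale of logistic regression coefficients.

```python
import math

def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

effect = 0.5                   # hypothetical effect on the log-odds scale
baselines = (-2.0, 0.0, 2.0)   # hypothetical baseline log-odds

odds_ratios, prob_changes = [], []
for baseline in baselines:
    p0 = inv_logit(baseline)
    p1 = inv_logit(baseline + effect)
    odds_ratios.append((p1 / (1 - p1)) / (p0 / (1 - p0)))
    prob_changes.append(p1 - p0)

print([round(r, 3) for r in odds_ratios])   # same odds ratio everywhere
print([round(d, 3) for d in prob_changes])  # change in p_1 depends on baseline
```

The odds ratio is the same at every baseline, while the change in p_1 is not, which is the sense in which effects on the probability exhibit interaction even when the model contains no interaction parameters.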