1 Introduction

Many research questions in the social sciences involve categorical outcomes and concern theoretical expectations about heterogeneous effects in choice behavior. Regression models for categorical dependent variables are well-established and widely applied in the social sciences to analyze research problems that involve two or more categories without an ordering structure (see, e.g., Agresti 2007; Long 1997; Tutz 2012). However, applications of discrete choice models that incorporate heterogeneity remain limited. One frequently applied model is the mixed logit model (MXL) (see, e.g., Greene et al. 2006; McFadden and Train 2000) and related models, which can be quite demanding to apply. For example, the researcher needs to decide on a distribution for the subject-specific heterogeneity to approximate the underlying behavioral process, and repeated measurements are necessary to identify the model.

The objectives of this contribution are:

  1. to present and discuss a very flexible and general approach to account for heterogeneity in nominal dependent variables;

  2. to investigate differences with alternative models allowing heterogeneity in choice behavior, both theoretically and empirically;

  3. to apply it to empirical social science choice data to exemplify the types of insights that can be obtained from the approach.

Relying on the random utility maximization framework, we present the General Heterogeneous Multinomial Logit Model (GHMNL). The model builds on the standard multinomial logit model (MNL) (McFadden 1974), the most frequently applied approach to study choices among discrete alternatives.Footnote 1 Like the MNL model, the GHMNL model is a classical discrete choice model that can handle both choice-specific and chooser-specific explanatory variables. In contrast to the MNL model, which ignores that the variance of the underlying latent traits can be chooser-specific, the GHMNL model accounts for such heterogeneous effects. It allows for systematically studying theoretical expectations about heterogeneity in choice behavior. The extension integrates a heterogeneity term into the systematic part of the utility function, which is linked to explanatory variables and permits accounting for behavioral tendencies in choice behavior without referring to latent variables. The heterogeneity term indicates the degree of distinctiveness of choice or the strength of heterogeneity in choice behavior and allows examining which explanatory variables cause heterogeneity. Compared to the MXL and related models, the GHMNL model comes with convenient properties, such as its closed-form solution for evaluating the outcome probabilities. It also frees researchers from making distributional assumptions for the random parameters and is computationally straightforward.

We apply the GHMNL model to electoral choice data and demonstrate its benefits in studying heterogeneity in issue voting behavior. The empirical application has several merits. Issue voting models typically contain both types of explanatory variables, choice-specific (issue proximities) and chooser-specific (socioeconomic voter attributes) ones. The political science literature on voter heterogeneity also provides several theoretical concepts why not all voters assign the same importance to issue considerations, including, for instance, platform divergence or political sophistication. We will demonstrate how the GHMNL model allows incorporating such theoretical expectations into the empirical modeling. Although we focus on heterogeneity in electoral choices in our empirical application, we see great potential for applying the model to explore heterogeneous effects in all social science disciplines.

Based on a brief review of the classical discrete choice model, we first derive the general heterogeneous multinomial choice model (GHMNL) and outline how it extends the standard MNL model. Next, we investigate the differences between the GHMNL and competing models. Then, we apply the GHMNL model to electoral choice data, followed by an empirical comparison with alternative models. Finally, we offer some concluding remarks.

2 The standard multinomial choice model

The multinomial logit model (MNL) is the most common model to study choice behavior (see, e.g., Hensher et al. 2015; Train 2009). One key feature of the MNL model limits our insights into heterogeneity in choice behavior. It ignores that the variances of the underlying latent traits can vary across decision makers. A brief review of the MNL model will help to show how the model we propose can account for heterogeneity.

Let \(Y_i \in \{1, \dots , J\}\) denote the dependent variable that consists of J unordered categories for \(i \in \{1,\ldots , n\}\) observations. Within the discrete choice framework, the categories represent J discrete, mutually exclusive, and finite alternatives of which decision makers choose one. The choice outcome can be a function of two types of explanatory variables: choice-specific and chooser-specific variables. The former are specific to each category and take different values across both alternatives and choosers. They characterize the alternatives, such as price or distance in a classical mode choice situation. Chooser-specific variables contain characteristics of the decision makers, which vary over decision makers but are constant across alternatives, such as age or gender. Let \(z_{ijk}\), \(j \in \{1, \dots , J\}\), \(k\in \{1,\ldots ,K\}\) denote the choice-specific variables and \(s_{im}, m\in \{1,\ldots ,M\}\) the chooser-specific covariates.

A common way to motivate a choice model is to consider the utilities associated with the alternatives as latent variables. Let \(U_{ij}\) denote an unobservable random utility that represents how attractive or appealing each alternative \(j \in \{1, \ldots , J\}\) is for chooser \(i \in \{1,\ldots , n\}\). The decision makers are assumed to assess and compare each alternative and select the one that maximizes the random utility so that \(Y_i\) is linked to the latent variables by the principle of maximum random utility,

$$\begin{aligned} Y_i=j\quad \Leftrightarrow \quad U_{ij}=\max _{s\in \{1,\dots ,J\}} U_{is}. \end{aligned}$$

In a random utility framework, the utility is determined by \(U_{ij} = V_{ij} + \varepsilon _{ij}\), where \(V_{ij}\) represents the systematic part of the utility, specified by explanatory variables and unknown parameters, whereas \(\varepsilon _{i1},\dots , \varepsilon _{iJ}\) are independent and identically distributed (i.i.d.) random variables with distribution function F(.).

The systematic part of the utility function is specified as a linear predictor

$$\begin{aligned} V_{ij} = \beta _{j0} + \sum\nolimits _{k=1}^K z_{ijk}\alpha _{k} + \sum \nolimits _{m=1}^M s_{im}\beta _{jm} = \beta _{j0} + {\varvec{z}}_{ij}^T \varvec{\alpha }+ {\varvec{s}}_{i}^T {\varvec{\beta }}_j , \end{aligned}$$
(1)

where

  • \(\beta _{10},\ldots ,\beta _{J0}\) are the alternative-specific constants.

  • \(\varvec{\alpha }^T=(\alpha _1, \dots , \alpha _K)\) are the parametersFootnote 2 associated with the vector of choice-specific variables \({\varvec{z}}_{ij}^T=({\varvec{z}}_{ij1},\dots ,{\varvec{z}}_{ijK})\), which indicate the weight decision makers attach to each attribute k of the alternatives.

  • \({\varvec{\beta }}_j^T=(\beta _{j1}, \dots , \beta _{jM})\) is a coefficient vector that expresses how the chooser attributes contained in \({\varvec{s}}_{i}^T=(s_{i1} ,\dots ,s_{iM})\) determine the choice.

By assuming that \(\varepsilon _{i1},\dots ,\varepsilon _{iJ}\) are i.i.d. variables with distribution function \(F(x) = \exp (-\exp (-x))\), which is known as the Gumbel or maximum extreme value distribution, one obtains the classical standard multinomial logit model (see McFadden 1974)

$$\begin{aligned} P(Y_i=j \mid \{{\varvec{z}}_{ij}\}, {\varvec{s}}_i)= \frac{\exp (V_{ij})}{\sum \nolimits _{s=1}^J \exp (V_{is})}= \frac{\exp (\beta _{j0} + {\varvec{z}}_{ij}^T \varvec{\alpha }+ {\varvec{s}}_{i}^T {\varvec{\beta }}_j)}{\sum \nolimits _{s=1}^J \exp (\beta _{s0} + {\varvec{z}}_{is}^T \varvec{\alpha }+ {\varvec{s}}_{i}^T {\varvec{\beta }}_s)}, \end{aligned}$$
(2)

\(j \in \{1, \dots , J\}\). Since the chooser-specific variables \({\varvec{s}}_i\) are constant over the alternatives, not all of the corresponding coefficients are identifiable. The same applies to the constants. To identify the model, side constraints are needed. We will use the standard side constraint based on a reference alternative, whose coefficients are set to zero. We select the first alternative as reference and set \(\beta _{10}=0\) and \({\varvec{\beta }}_1^T=(0,\dots ,0)\).
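As a minimal illustration of Eq. (2) and of the identification issue, the following Python sketch computes MNL choice probabilities for a single decision maker. The function name `mnl_probabilities` and all numerical values are hypothetical and serve only as an example. The second call shows that adding the same constant to all alternative-specific constants leaves the probabilities unchanged, which is why a side constraint such as \(\beta _{10}=0\) is needed.

```python
import math

def mnl_probabilities(beta0, alpha, beta, z, s):
    """Choice probabilities of Eq. (2) for one decision maker.

    beta0: J alternative-specific constants
    alpha: K global coefficients of the choice-specific variables
    beta:  J lists of M coefficients of the chooser-specific variables
    z:     J lists of K choice-specific values z_ijk
    s:     M chooser-specific values s_im
    """
    v = [
        beta0[j]
        + sum(a * zk for a, zk in zip(alpha, z[j]))
        + sum(b * sm for b, sm in zip(beta[j], s))
        for j in range(len(beta0))
    ]
    v_max = max(v)  # subtracting the maximum avoids overflow in exp
    expv = [math.exp(vj - v_max) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

# Hypothetical example with J = 3, K = 1, M = 1.
# The first alternative is the reference (constant and beta set to zero).
beta0 = [0.0, 0.5, -0.2]
alpha = [-0.8]
beta = [[0.0], [0.3], [-0.1]]
z = [[1.0], [2.0], [0.5]]
s = [1.5]

probs = mnl_probabilities(beta0, alpha, beta, z, s)

# Shifting all constants by the same amount does not change the probabilities,
# so the constants are only identified up to such a shift.
shifted = mnl_probabilities([b + 2.0 for b in beta0], alpha, beta, z, s)
```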

The standard MNL model presented in Eqs. (1) and (2) ignores that the variance of the underlying latent traits can be subject-specific so that the variances are not allowed to differ across decision makers. Previous research has shown that ignoring variance heterogeneity can yield biased estimates (see, e.g., Tutz 2021).

3 A general heterogeneous multinomial choice model

In this section, we present and discuss a general multinomial choice model, which we refer to as the General Heterogeneous Multinomial Logit Model, in short GHMNL, that accounts for variance heterogeneity in choice behavior. Models of this type have been considered before by Hensher et al. (1998), DeShazo and Fermo (2002), and Tutz (2021).Footnote 3 The present work goes beyond these contributions in the following ways. We outline in detail and discuss the specification of the utility functions and the choice probabilities in the GHMNL model. We put particular emphasis on the interpretation of the heterogeneity term that is incorporated into the utility function and the estimation methods. In addition, we have written an R function that allows the user to fit the GHMNL model. Section A in the Supplementary Information describes the routines to implement the model.

3.1 Utility functions and choice probabilities

The GHMNL model extends the standard MNL model by adding a heterogeneity term to the systematic part of the utility function. For simplicity, let all the explanatory variables and the constants be collected in the alternative-specific vector \({\varvec{x}}_{ij}^T = ({\varvec{1}}_j^T, {\varvec{z}}_{ij}^T, {\varvec{0}}^T, \dots , {\varvec{s}}_{i}^T, \dots , {\varvec{0}}^T)\), where \({\varvec{1}}_j\) is the jth unit vector, \({\varvec{0}}\) is a vector of zeros, and \({\varvec{s}}_{i}^T\) occupies the jth block of the chooser-specific entries. Then, the utility functions take the form

$$\begin{aligned} V_{ij} = \beta _{j0} + {\varvec{z}}_{ij}^T \varvec{\alpha }+ {\varvec{s}}_{i}^T {\varvec{\beta }}_j = {\varvec{x}}_{ij}^T \varvec{\delta }, \end{aligned}$$

where \(\varvec{\delta }^T=(\beta _{10},\dots ,\beta _{J0},\varvec{\alpha }^T,{\varvec{\beta }}_1^T,\dots ,{\varvec{\beta }}_J^T)\). To derive the GHMNL model, we assume that the latent utilities are given more generally by

$$\begin{aligned} U_{ij}={\varvec{x}}_{ij}^T\varvec{\delta }+\sigma _i\varepsilon _{ij}, \end{aligned}$$

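where \(\sigma _i\) is the standard deviation associated with decision maker i. The standard deviation is linked to explanatory variables by assuming \(\sigma _i= e^{-{\varvec{w}}_i^T\varvec{\gamma }}\), where \({\varvec{w}}_i\) is a vector of chooser-specific covariates and \(\varvec{\gamma }\) is a vector of parameters. Since dividing all utilities of decision maker i by \(\sigma _i > 0\) does not change which alternative has maximal utility, the model can equivalently be written in terms of the rescaled utilities \(U_{ij}/\sigma _i\), whose error term is standardized. As a result, the systematic utilities \(V_{ij}\) in the GHMNL model are specified as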

$$\begin{aligned} V_{ij}={\varvec{x}}_{ij}^T\varvec{\delta }e^{{\varvec{w}}_i^T\varvec{\gamma }}= (\beta _{j0}+ {\varvec{z}}_{ij}^T\varvec{\alpha }+{\mathbf {s}}_{i}^T{\varvec{\beta }}_j )e^{{\varvec{w}}_i^T\varvec{\gamma }}, \end{aligned}$$
(3)

where

  • \({\varvec{s}}_{i}\) is a vector of chooser-specific covariates, and \({\varvec{z}}_{ij}\) is a vector of alternative-specific covariates. As in the standard MNL model, the variables \({\varvec{s}}_{i}\) have alternative-specific effects and \({\varvec{z}}_{ij}\) global effects.

  • \({\varvec{w}}_i^T=(w_{i1},\dots ,w_{iL})\) is a vector of chooser-specific variables, which can be a subset of \({\varvec{s}}_{i}\). It contains attributes of the decision makers that are supposed to cause heterogeneity in choice behavior. The parameter vector \(\varvec{\gamma }^T=(\gamma _{1}, \dots , \gamma _{L})\) indicates the strength of heterogeneity in choosing one alternative.

The model distinguishes between two types of effects: a location effect and a heterogeneity effect. The term \({\varvec{x}}_{ij}^T\varvec{\delta }\) in Eq. (3) represents the location effect. It is also present in the standard MNL model and determines which alternative the chooser tends to prefer. The novel term \({\varvec{w}}_i^T\varvec{\gamma }\) represents the heterogeneity effect that determines the impact of heterogeneity in choice behavior.

Like the standard MNL model, the GHMNL model has a closed-form solution for evaluating the choice probabilities so that the utility functions \(V_{ij}\) are linked to the choice probabilities through a logistic response function,

$$\begin{aligned} P(Y_i=j\mid \{{\varvec{x}}_{ij}\},{\varvec{w}}_i)=\frac{\exp ({\varvec{x}}_{ij}^T\varvec{\delta }e^{{\varvec{w}}_i^T\varvec{\gamma }})}{\sum \nolimits _{s=1}^J \exp ({\varvec{x}}_{is}^T\varvec{\delta }e^{{\varvec{w}}_i^T\varvec{\gamma }})}, \, j \in \{1, \dots , J\}. \end{aligned}$$
(4)

Alternatively, the relationship between the choice probabilities and the utility functions can be expressed in terms of odds:

$$\begin{aligned} \begin{aligned} \frac{P(Y_i=j \mid \{{\varvec{x}}_{ij}\},{\varvec{w}}_i)}{P(Y_i=s \mid \{{\varvec{x}}_{ij}\}, {\varvec{w}}_i)} =&\exp \{({\varvec{x}}_{ij}- {\varvec{x}}_{is})^T \varvec{\delta }e^{{\varvec{w}}_i^T\varvec{\gamma }}\} \\ =&\exp \{[\beta _{j0}-\beta _{s0}+({\varvec{z}}_{ij}- {\varvec{z}}_{is})^T\varvec{\alpha }+{\varvec{s}}_i^T({\varvec{\beta }}_j - {\varvec{\beta }}_s)]e^{{\varvec{w}}_i^T\varvec{\gamma }}\}. \end{aligned} \end{aligned}$$
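The closed form of Eq. (4) and the odds representation can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the function name `ghmnl_probabilities` is hypothetical, the linear predictors \({\varvec{x}}_{ij}^T\varvec{\delta }\) are passed in directly as the list `eta`, and all numbers are invented. It is not the R routine described in the Supplementary Information.

```python
import math

def ghmnl_probabilities(eta, w, gamma):
    """Choice probabilities of Eq. (4) for one decision maker.

    eta:   J location terms x_ij^T delta
    w:     L chooser-specific heterogeneity covariates
    gamma: L heterogeneity parameters
    """
    # Heterogeneity factor exp(w_i^T gamma) multiplies every location term.
    scale = math.exp(sum(wl * gl for wl, gl in zip(w, gamma)))
    v = [e * scale for e in eta]
    v_max = max(v)  # subtracting the maximum avoids overflow in exp
    expv = [math.exp(vj - v_max) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

# Hypothetical five-alternative example; alternative 1 is the reference (eta = 0).
eta = [0.0, 0.4, 1.2, -0.3, 0.8]
w = [1.0, 0.0]

p_mnl = ghmnl_probabilities(eta, w, [0.0, 0.0])  # gamma = 0 gives the standard MNL
p_het = ghmnl_probabilities(eta, w, [0.7, 0.0])  # positive gamma_1 sharpens the pattern

# Log odds of alternative 3 versus alternative 1, compared with the odds formula.
log_odds = math.log(p_het[2] / p_het[0])
expected_log_odds = (eta[2] - eta[0]) * math.exp(0.7)
```

With \(\varvec{\gamma }=\varvec{0}\), the function reduces to the standard MNL probabilities, so the same code covers both models.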

3.2 Interpretation of the heterogeneity term

The essential novel term in the GHMNL model is the heterogeneity term. It is modeled by the factor \(e^{{\varvec{w}}_i^T\varvec{\gamma }}\) and represents the (inverse) standard deviation of the latent variables. The heterogeneity term can be understood as representing variance heterogeneity. However, it also allows for an interpretation without reference to latent variables, which are always elements used to build a model but cannot be observed. The heterogeneity term represents a specific choice behavior that permits accounting for behavioral tendencies that are not linked to particular alternatives:

  • When \({\varvec{w}}_i^T\varvec{\gamma }\rightarrow -\infty\), one obtains \(P(Y_i=j\mid \{{\varvec{x}}_{ij}\},{\varvec{w}}_i)=1/J\). In this extreme case, all alternatives have the same choice probabilities. It implies that the decision maker chooses an alternative at random because none of the covariates can systematically explain the choice. The chooser shows maximal heterogeneity.

  • When \({\varvec{w}}_i^T\varvec{\gamma }\rightarrow \infty\) and the condition \({\varvec{x}}_{ij}^T\varvec{\delta }\ne 0\) holds at least for one \(j > 1\), the probability for one of the \(j \in \{1,\dots ,J\}\) alternatives approaches 1. In this case, the decision maker has a distinct preference and shows minimal heterogeneity. Therefore, choosers with large \({\varvec{w}}_i^T\varvec{\gamma }\)-values show less variability: they distinctly prefer specific alternatives.

Thus, the heterogeneity term \({\varvec{w}}_i^T\varvec{\gamma }\) can be considered as an indicator of the degree of distinctness of choice or as a measure of heterogeneity in choice behavior. For small values of \({\varvec{w}}_i^T\varvec{\gamma }\), the difference between the choice probabilities becomes small. By contrast, the difference between a specific alternative and the remaining ones gets larger when \({\varvec{w}}_i^T\varvec{\gamma }\) increases. As the heterogeneity term contains attributes of the decision makers, the model systematically accounts for heterogeneity in choice behavior across individuals. It allows examining which explanatory variables cause heterogeneous effects. For example, suppose \(w_{i}\) denotes age and \(\gamma\) is positive. It would suggest that older decision makers have more clear-cut preferences than younger ones. The former tend to prefer specific alternatives, while younger decision makers have less distinct preferences and show more heterogeneity in selecting one alternative.
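The age example and the two limiting cases can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the function name `ghmnl_probs`, the location terms in `eta`, the value of `gamma_age`, and the centering of age at 45 are invented and are not estimates from any data.

```python
import math

def ghmnl_probs(eta, log_scale):
    """Probabilities exp(eta_j * e^{log_scale}) / sum_s exp(eta_s * e^{log_scale}),
    where log_scale plays the role of w_i^T gamma."""
    scale = math.exp(log_scale)
    v = [e * scale for e in eta]
    v_max = max(v)
    expv = [math.exp(vj - v_max) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

eta = [0.0, 0.4, 1.2, -0.3, 0.8]  # hypothetical location terms, alternative 3 is preferred
gamma_age = 0.03                  # hypothetical positive heterogeneity parameter

# Age enters the heterogeneity term (centered at 45 purely for readability).
p_young = ghmnl_probs(eta, gamma_age * (20 - 45))
p_old = ghmnl_probs(eta, gamma_age * (70 - 45))

# Limiting behavior: very negative values give nearly uniform probabilities,
# very large values put nearly all mass on the preferred alternative.
p_random = ghmnl_probs(eta, -20.0)
p_distinct = ghmnl_probs(eta, 5.0)
```

In this sketch, the older chooser assigns a higher probability to the preferred alternative than the younger one, matching the interpretation of a positive \(\gamma\).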

Figure 1 illustrates the behavioral tendencies the GHMNL model can uncover. It depicts the probabilities \(P(Y_{i}=j)\) for a five-choice situation based on a model with two covariates in the heterogeneity term \({\varvec{w}}_i\), one binary and one quantitative normally distributed explanatory variable. For the binary covariate, we consider the effect at value \({\varvec{w}}_i^T =(1,0)\). The two panels depict the probabilities for different parameter values (\(\gamma _1\)) in the heterogeneity term: panel (a) shows the effects for positive \(\gamma _1\)-values, panel (b) for negative \(\gamma _1\)-values. In both panels, the filled circles depict the base probabilities that result when no heterogeneity is present, that is, when \(\gamma _1=0\), yielding the standard MNL model. Inspecting these base probabilities suggests that the chooser prefers alternative 3, and to a lesser extent alternative 5. Panel (a) shows that this pattern becomes more pronounced for increasing \(\gamma _1\)-values. Thus, the decision maker more distinctly prefers alternative 3 in the GHMNL model. By contrast, the pattern flattens for negative \(\gamma _1\)-values, as illustrated in panel (b). This indicates that the decision maker tends to choose an alternative at random and shows substantial heterogeneity in selecting one of the five alternatives.

Fig. 1 Illustration of the Heterogeneity Term in the GHMNL Model. Note: Plots depict probabilities \(P(Y_{i}=j)\) for five alternatives \(j \in \{1, 2,3,4,5\}\). The model is based on one binary and one quantitative normally distributed covariate. Panel (a) shows the effects for positive \(\gamma _1\)-values, panel (b) for negative \(\gamma _1\)-values. Filled circles \(\bullet\) depict the base probabilities when no heterogeneity is present (\(\gamma _1=0\)). The probability curves result by plugging the respective values into Eq. (4).

3.3 Estimation

In the following, we outline how the parameters of the GHMNL model can be estimated. Let the choice outcome be represented as a vector \({\varvec{y}}_i = (y_{i1}, \dots , y_{iJ})^T\) with \(y_{ij}\) taking the value 1 when alternative \(j \in \{1, \dots , J\}\) is chosen and 0 otherwise so that one obtains \({\varvec{y}}_i = (0,\dots ,0,1,0,\dots ,0)^T\) if \(Y_i = j\). Let \(\pi _{ij}=P(Y_i=j\mid \{{\varvec{x}}_{ij}\},{\varvec{w}}_i)\) denote the choice probabilities and \(\varvec{\delta }^T=(\beta _{10},\dots ,\beta _{J0},\varvec{\alpha }^T,{\varvec{\beta }}_1^T,\dots ,{\varvec{\beta }}_J^T,\varvec{\gamma }^T)\) the overall parameter vector that collects all coefficients to be estimated and, in a slight abuse of notation, now also includes \(\varvec{\gamma }\).

Using the first alternative as reference, the kernel of the log-likelihood of the model presented in Eq. (4) is given by

$$\begin{aligned} \begin{aligned} l(\varvec{\delta })=&\sum \nolimits _{i=1}^{n}\left\{ \sum \nolimits _{j=2}^{J} y_{ij} \log \left( \frac{\pi _{ij}}{1-\pi _{i2}-\dots -\pi _{iJ}}\right) + \log (1-\pi _{i2}-\dots -\pi _{iJ})\right\} \\ =&\sum \nolimits _{i=1}^{n}\left\{ \sum \nolimits _{j=2}^{J} y_{ij} ({\varvec{x}}_{ij}^T\varvec{\delta }) e^{{\varvec{w}}_i^T\varvec{\gamma }} - \log \left( 1+ \sum \nolimits _{s=2}^{J}\exp ({\varvec{x}}_{is}^T\varvec{\delta }e^{{\varvec{w}}_i^T\varvec{\gamma }})\right) \right\} . \end{aligned} \end{aligned}$$

For the maximization of the log-likelihood, we make use of the first derivatives, also known as score functions. They take the form

$$\begin{aligned} \partial l(\varvec{\delta }) / \partial \delta _t =&\sum \nolimits _{i=1}^{n}\left\{ \sum \nolimits _{j=2}^{J} y_{ij} x_{ijt} e^{{\varvec{w}}_i^T\varvec{\gamma }} -\frac{\sum \nolimits _{s=2}^{J} x_{ist} e^{{\varvec{w}}_i^T\varvec{\gamma }} \exp ({\varvec{x}}_{is}^T\varvec{\delta }e^{{\varvec{w}}_i^T\varvec{\gamma }})}{1+ \sum \nolimits _{s=2}^{J}\exp ({\varvec{x}}_{is}^T\varvec{\delta }e^{{\varvec{w}}_i^T\varvec{\gamma }})}\right\}, \\ \partial l(\varvec{\delta }) / \partial \gamma _t =&\sum \nolimits _{i=1}^{n}\left\{ \sum \nolimits _{j=2}^{J} y_{ij} ({\varvec{x}}_{ij}^T\varvec{\delta }) e^{{\varvec{w}}_i^T\varvec{\gamma }} w_{it} -\frac{\sum \nolimits _{s=2}^{J} {\varvec{x}}_{is}^T\varvec{\delta }e^{{\varvec{w}}_i^T\varvec{\gamma }} w_{it}\exp ({\varvec{x}}_{is}^T\varvec{\delta }e^{{\varvec{w}}_i^T\varvec{\gamma }})}{1+ \sum \nolimits _{s=2}^{J}\exp ({\varvec{x}}_{is}^T\varvec{\delta }e^{{\varvec{w}}_i^T\varvec{\gamma }})}\right\}. \end{aligned}$$

As an approximation of the covariance \({\text {cov}}({{\hat{\varvec{\delta }}}})\), we use the inverse of the observed information matrix \(-\partial ^2 l({{\hat{\varvec{\delta }}}})/\partial \varvec{\delta }\partial \varvec{\delta }^T\).
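The log-likelihood kernel and the score functions can be sketched in a few lines of Python and verified against finite differences. This is an illustrative sketch, not the R implementation referred to in Section A of the Supplementary Information. It assumes that the covariate rows of alternatives \(2,\dots ,J\) are expressed relative to the reference alternative, so that the reference utility is zero. The names `loglik`, `score`, and `numeric_grad` as well as the toy data are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loglik(delta, gamma, data):
    """Kernel of the log-likelihood. Each observation is (y, x, w):
    y = index of the chosen alternative (0 = reference),
    x = covariate rows for alternatives 2..J, w = heterogeneity covariates."""
    total = 0.0
    for y, x, w in data:
        scale = math.exp(dot(w, gamma))
        v = [scale * dot(row, delta) for row in x]
        total -= math.log(1.0 + sum(math.exp(vj) for vj in v))
        if y > 0:
            total += v[y - 1]
    return total

def score(delta, gamma, data):
    """Analytic first derivatives with respect to delta and gamma."""
    g_delta = [0.0] * len(delta)
    g_gamma = [0.0] * len(gamma)
    for y, x, w in data:
        scale = math.exp(dot(w, gamma))
        eta = [dot(row, delta) for row in x]
        expv = [math.exp(scale * e) for e in eta]
        denom = 1.0 + sum(expv)
        for j, row in enumerate(x):
            # observed indicator minus fitted probability for alternative j + 2
            resid = (1.0 if y == j + 1 else 0.0) - expv[j] / denom
            for t, xt in enumerate(row):
                g_delta[t] += resid * xt * scale
            for t, wt in enumerate(w):
                g_gamma[t] += resid * eta[j] * scale * wt
    return g_delta, g_gamma

def numeric_grad(f, theta, h=1e-6):
    """Central finite differences, used only to check the analytic score."""
    grad = []
    for t in range(len(theta)):
        up = theta[:t] + [theta[t] + h] + theta[t + 1:]
        down = theta[:t] + [theta[t] - h] + theta[t + 1:]
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

# Toy data with J = 3 alternatives, two constants, one proximity variable.
data = [
    (1, [[1.0, 0.0, 0.5], [0.0, 1.0, -1.0]], [0.2]),
    (0, [[1.0, 0.0, -0.3], [0.0, 1.0, 0.8]], [1.0]),
    (2, [[1.0, 0.0, 1.1], [0.0, 1.0, 0.4]], [-0.5]),
]
delta = [0.3, -0.2, 0.7]
gamma = [0.4]

g_delta, g_gamma = score(delta, gamma, data)
num_delta = numeric_grad(lambda d: loglik(d, gamma, data), delta)
num_gamma = numeric_grad(lambda g: loglik(delta, g, data), gamma)
```

In practice, the analytic score would be handed to a general-purpose optimizer, and the observed information could be obtained by differentiating the score numerically.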

4 Alternative approaches

A model that has been used to study heterogeneity in decision behavior is the mixed logit model (MXL) (see Greene et al. 2006; Hensher and Greene 2003; McFadden and Train 2000).Footnote 4 Two related models, which differ in parameterization, are the scale heterogeneity (S-MNL) model and the generalized multinomial logit model (G-MNL) (see Fiebig et al. 2010). A brief review and discussion of the MXL and related models will illustrate the advantages of applying the GHMNL model to account for heterogeneity in choice behavior.

4.1 Mixed logit model

Following Greene et al. (2006), the MXL model can be derived from latent utilities

$$\begin{aligned} U_{ijt} = {\varvec{z}}_{ijt}^T \varvec{\alpha }_i + \varepsilon _{ijt}, \end{aligned}$$

where

  • the additional index t refers to different choice occasions or tasks, and

  • \({\varvec{z}}_{ijt}\) is the full vector of explanatory variables, including attributes of the alternatives, socioeconomic characteristics of the decision makers, and the choice task itself.

Compared to the standard MNL model, the crucial extension in the MXL model is that the parameter vector \(\varvec{\alpha }_i\) is subject-specific so that the effects can vary across decision makers i. Assuming that the subject-specific effects are random and in part determined by an additional vector of covariates \({\varvec{w}}_i\), the model becomes a mixed-effects model. The subject-specific effects are assumed to take the form

$$\begin{aligned} \varvec{\alpha }_i = \varvec{\alpha }+\Delta {\varvec{w}}_i + \varvec{\Sigma }^{1/2}{\varvec{v}}_i, \end{aligned}$$

where

  • \(\Delta\) is a matrix of coefficients associated with the covariate vector \({\varvec{w}}_i\),

  • \({\varvec{v}}_i\) is a random vector of uncorrelated random variables with known variances,

  • \(\varvec{\Sigma }\) is a covariance matrix that determines the variance structure of the random term.

Maximum simulated likelihood estimates are obtained by maximizing the log-likelihood with respect to all the unknown parameters (see also Train 2009).
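To indicate why simulation is needed, the following Python sketch approximates MXL choice probabilities for a single choice occasion by averaging MNL probabilities over random draws of \(\varvec{\alpha }_i\). It rests on simplifying assumptions that are not taken from the cited references: the random coefficients are independent normal (diagonal \(\varvec{\Sigma }\)), the covariate term \(\Delta {\varvec{w}}_i\) is omitted, and the function names and numbers are hypothetical.

```python
import math
import random

def mnl_probs(v):
    v_max = max(v)
    expv = [math.exp(x - v_max) for x in v]
    total = sum(expv)
    return [e / total for e in expv]

def simulated_mxl_probs(z, alpha_mean, alpha_sd, n_draws=2000, seed=1):
    """Monte Carlo approximation of mixed logit probabilities.

    z:          J rows of explanatory variables for one chooser and occasion
    alpha_mean: mean of the random coefficients
    alpha_sd:   standard deviations (assumed independent normal coefficients)
    """
    rng = random.Random(seed)
    n_alt = len(z)
    acc = [0.0] * n_alt
    for _ in range(n_draws):
        # one draw of the subject-specific coefficient vector alpha_i
        a = [m + sd * rng.gauss(0.0, 1.0) for m, sd in zip(alpha_mean, alpha_sd)]
        p = mnl_probs([sum(ak * zk for ak, zk in zip(a, row)) for row in z])
        for j in range(n_alt):
            acc[j] += p[j]
    return [x / n_draws for x in acc]

z = [[1.0], [2.0], [0.5]]      # hypothetical values of one attribute for J = 3
alpha_mean = [-0.8]

p_sim = simulated_mxl_probs(z, alpha_mean, [0.6])
# With zero variance, the simulation collapses to the standard MNL probabilities.
p_no_het = simulated_mxl_probs(z, alpha_mean, [0.0], n_draws=10)
p_mnl = mnl_probs([alpha_mean[0] * row[0] for row in z])
```

The averaging over draws is what the closed-form probabilities of the GHMNL model avoid.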

By allowing parameters to vary randomly over decision makers instead of assuming that they are the same for every chooser, the MXL model is very flexible and can account for a rather general form of heterogeneity. However, this flexibility comes at the cost of a large number of parameters, which might render estimates unstable without careful variable selection. A further drawback is that one has to specify a distribution for the subject-specific random effects. In addition, the model parameters may not be identified without repeated measurements, that is, without having multiple choice observations t for the same chooser.

4.2 S-MNL and G-MNL models

Two models that are related to the MXL approach are the scale heterogeneity model (S-MNL) and the generalized multinomial logit model (G-MNL) (see Fiebig et al. 2010). Table 1 summarizes the definition of the utility functions and the parametrization of these models together with the standard MNL and the MXL model.

Like the MXL model, both the S-MNL and the G-MNL models belong to the family of mixed-effects or mixture models. Compared to the MXL model, where the scale of the error term \(\lambda _i\) is normalized, the S-MNL model assumes that \(\lambda _i\) may vary across decision makers and follows a fixed distribution. The assumed distribution is essential for the estimation of the parameters. Scale heterogeneity is introduced by multiplying the utility weights \(\varvec{\alpha }\) with the subject-specific scaling factor \(\lambda _i\), which scales \(\varvec{\alpha }\) up or down proportionately across choosers. It is also possible that chooser-specific covariates \({\varvec{w}}_i\) enter the scale so that the scale may vary across decision makers according to their characteristics. As the S-MNL involves a simpler parametrization and fewer coefficients, it describes the data more sparsely than the MXL model.

Table 1 Overview on Competing Approaches

The G-MNL model assumes both coefficient and scale heterogeneity. It nests the MXL model and the S-MNL model. The MXL model results when the scale parameter \(\lambda _i=1\). Setting \(Var(\eta _i)=0\) yields the S-MNL model. The parameter \(\kappa\)Footnote 5 in the G-MNL model indicates how the variance of the error term varies with the scale \(\lambda _i\). There are two model variants. When \(\kappa\) approaches 1, the standard deviation of \(\eta _i\) does not depend on the scaling of \(\varvec{\alpha }\) (i.e., \(\lambda _i\)). When \(\kappa\) approaches 0, the standard deviation of \(\eta _i\) is proportional to \(\lambda _i\). Again, covariates can be included to explain the variation.
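The nesting structure described above can be written compactly. As a sketch in the notation of this section, and assuming the coefficient specification of Fiebig et al. (2010) with \(\varvec{\eta }_i\) denoting the random taste deviation, the subject-specific coefficients of the G-MNL model take the form

$$\begin{aligned} \varvec{\alpha }_i = \lambda _i \varvec{\alpha } + \kappa \varvec{\eta }_i + (1-\kappa ) \lambda _i \varvec{\eta }_i . \end{aligned}$$

Under this form, the special cases follow directly:

$$\begin{aligned} \begin{aligned} \lambda _i = 1:&\quad \varvec{\alpha }_i = \varvec{\alpha } + \varvec{\eta }_i \quad \text {(MXL)}, \\ Var(\varvec{\eta }_i) = 0:&\quad \varvec{\alpha }_i = \lambda _i \varvec{\alpha } \quad \text {(S-MNL)}, \\ \kappa = 1:&\quad \varvec{\alpha }_i = \lambda _i \varvec{\alpha } + \varvec{\eta }_i , \\ \kappa = 0:&\quad \varvec{\alpha }_i = \lambda _i (\varvec{\alpha } + \varvec{\eta }_i ). \end{aligned} \end{aligned}$$

For \(\kappa = 1\), the deviation \(\varvec{\eta }_i\) is unaffected by the scale, whereas for \(\kappa = 0\) it is scaled by \(\lambda _i\) together with \(\varvec{\alpha }\).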

4.3 Comparing modeling approaches

The GHMNL and the alternative models (MXL, S-MNL, G-MNL) can be derived from latent utilities. The main difference between the two approaches lies in the motivation of heterogeneity in choice behavior. In the GHMNL model, the variances of the latent utilities are allowed to vary across the characteristics of the decision makers. The alternative models assume that each chooser has their own parameters, which follow a fixed distribution. The GHMNL model also allows parameters to vary across choosers, but it does so in a more systematic and restrictive way. Here, the effect parameters associated with the alternative-specific covariates are \(\varvec{\alpha }e^{{\varvec{w}}_i^T\varvec{\gamma }}\). Under this specification, the covariates contained in \({\varvec{w}}_i\) modify the effects. Depending on the value of \({\varvec{w}}_i\), the effect is strengthened or weakened. In addition, the same effect modification applies to all coefficients, which is a consequence of the derivation from the variances of the latent utilities. By contrast, the competing models allow for all sorts of parameter variation, including random variation and even a possible reversal of the sign of effects.

By allowing the effects to vary across decision makers, both approaches have in common that they assume a specific form of interaction. In the GHMNL model, an interaction between the variables \({\varvec{x}}_{ij}\) and \({\varvec{w}}_i\) is present because the linear term takes the form \({\varvec{x}}_{ij}^T\varvec{\delta }e^{{\varvec{w}}_i^T\varvec{\gamma }}\) (see Eq. 3). In the MXL model, for example, the interaction is included as the linear effect \({\varvec{z}}_{ijt}^T \varvec{\alpha }_i\) contains the term \({\varvec{z}}_{ijt}^T \Delta {\varvec{w}}_i\). In both cases, the interaction can be seen as an interaction generated by effect modification. The effect of \({\varvec{x}}_{ij}\) (or \({\varvec{z}}_{ijt}\)) is modified by \({\varvec{w}}_i\); the latter variable is a so-called effect modifier.

Both approaches can be embedded into the general framework of varying-coefficient models (see, e.g., Fan and Zhang 1999; Hastie and Tibshirani 1993; Park et al. 2015). The varying-coefficients framework helps to see that identifiability problems arise if the variables \({\varvec{z}}_{ijt}\) and \({\varvec{w}}_i\) are not distinct. Guided by theoretical expectations about heterogeneity, the researcher applying the competing models might consider different variables in \({\varvec{z}}_{ijt}\) and \({\varvec{w}}_i\). However, if the underlying theory does not provide such expectations, one faces the challenge of determining which explanatory variables are effect modifiers and which ones represent main effects. By contrast, the inclusion of the same set of variables in the location and the heterogeneity part does not cause any difficulties in the GHMNL model.

In sum, the benefits of the GHMNL model as compared to the competing models are:

  • Whereas the MXL, the S-MNL, and the G-MNL model can account for a rather general and unspecific form of heterogeneity without further motivation, the heterogeneity term in the GHMNL model can uncover specific behavioral tendencies. It provides an indicator of the degree of distinctness of choice and measures the strength of heterogeneity in choice behavior.

  • The GHMNL model is much sparser in terms of the number of parameters involved and therefore avoids the unstable estimates that can arise without careful variable selection.

  • It allows for a closed form of the log-likelihood without the need to use simulation methods to obtain choice probabilities, which makes the GHMNL model computationally straightforward.

  • The researcher does not need to decide on a specific and appropriate distribution for the random parameters to approximate the underlying behavioral process.

  • The GHMNL model avoids identifiability problems and works without repeated measurements.

5 Application: proximity voting and heterogeneous electorates

The empirical application uses individual-level survey data on electoral choices to study heterogeneity in voting behavior. Our voter choice model follows the classical proximity model, where the main source of voter utility is the ideological proximity to the parties (see Davis et al. 1970; Downs 1957). Accordingly, proximity voting approaches expect that voter i casts a ballot for the party j that offers policy platforms closest to the voter’s most preferred positions on K different policy issues.

We draw on the 2017 German parliamentary election study (Roßteutscher et al. 2018) and analyze heterogeneity in voter choice for one of the six major German parties in 2017: the Christian-Democratic Parties (CDU/CSU), the Social-Democratic Party (SPD), the Liberal Party (FDP), the Greens, the Left, and the Alternative for Germany (AfD). The election study contains three policy issues (immigration, taxes, climate change) on which the respondents positioned themselves and the parties on eleven-point scales. Using voter-specific self-placements and perceptions of party placements, the choice-specific variables \(z_{ijk}\) in Eq. (3) contain the absolute proximity between each voter i and party j on each policy issue k. We use a separate data set for each issue, including only respondents with no missing values on the respective issue scales. The three data sets are quite large, with the number of observations ranging from 1251 to 1807.

The empirical application proceeds as follows. Based on previous research on heterogeneity in proximity voting, the first part examines one central source of heterogeneity: platform divergence. In the second part, we present the results of a fully specified voter choice model that also accounts for nonpolicy considerations in the voting calculus. The section closes with an empirical comparison of the GHMNL model and competing approaches. Section B in the Supplementary Information contains a detailed description of the measurement and coding of all considered variables. Section C reports the estimates of competing models.

5.1 Sources of heterogeneity in proximity voting

It has become accepted wisdom that not all voters follow issue considerations in the same way in making electoral decisions. The debate about heterogeneous electorates has a long tradition in the proximity voting literature. One example is the classic article on voter heterogeneity by Rivers (1988), stating that different subgroups of voters apply different choice criteria when voting. Another one is the issue public hypothesis by Converse (1964), postulating that the population can be divided into issue publics, each consisting of voters who intensively care about particular issues. Several concepts, conditions, or sources of heterogeneity have been proposed as to why we should expect systematic individual-level differences in the impact of issue considerations on voting. The concept of issue importance is the most frequently discussed source of heterogeneity (see, e.g., Edwards III et al. 1995; RePass 1971). If issues are considered individually salient to voters, then voters are expected to assign these issues a greater weight in the voting-decision process. A large research body has also argued that heterogeneity in issue voting is the result of differences in political sophistication or awareness (see, e.g., Gerber et al. 2015; Luskin 1987; MacDonald et al. 1995). In the following, we empirically examine another theoretical source of heterogeneity in spatial voting: platform divergence.

5.1.1 Platform divergence

A central condition that must be met so that issues determine voter choice is substantial divergence in offered party positions. Accordingly, voters who see clear differences between parties’ policy proposals are expected to rely more strongly on issue attitudes when casting their ballots than those perceiving similar party stands (e.g., Alvarez and Nagler 2004; Weßels and Schmitt 2008). We employ a subject-specific measure to examine whether platform divergence causes heterogeneity in the impact of issue considerations on party choice. We use the individually perceived range of party positions to identify the degree of platform divergence. The measure is constructed as follows: we first identified the two parties perceived to take the most extreme positions on both ends of the issue scales for each voter and issue. Then, we computed the absolute difference between these party positions. This results in eleven-point scales, where 0 indicates minimum platform divergence (i.e., all parties are perceived to offer the same position) and 10 maximum platform divergence (i.e., voters perceive the party positions to be spread across the entire issue scale).
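The construction of the measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function name `platform_divergence`, the generic party labels `P4` to `P6`, the example placements, and the treatment of missing placements are all assumptions made for the sketch. Positions are assumed to be coded from 0 to 10.

```python
from typing import Dict, Optional


def platform_divergence(perceived: Dict[str, Optional[float]]) -> Optional[float]:
    """Perceived range of party positions for one voter on one issue.

    `perceived` maps a party label to the position at which the voter
    places that party (assumed coding: 0 to 10). None marks a missing
    placement (an assumption of this sketch).
    """
    placed = [p for p in perceived.values() if p is not None]
    if len(placed) < 2:
        # Assumption: the range is undefined with fewer than two placements.
        return None
    # Absolute difference between the two most extreme perceived positions.
    return abs(max(placed) - min(placed))


# Hypothetical voter: perceived positions of six parties on one issue scale.
voter = {"CDU": 7, "SPD": 4, "Left": 1, "P4": 3, "P5": 8, "P6": 10}
print(platform_divergence(voter))  # 9
```

A voter who places all parties at the same position receives the value 0, and a voter who places parties at both endpoints of the scale receives the value 10, matching the interpretation given above.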

5.1.2 Empirical models: platform divergence

We specify a separate model for each of the three policy issues to examine whether voters exhibit heterogeneous reactions to issues due to platform divergence:

$$\begin{aligned} V_{ij}= (\beta _{j0}+ \alpha \cdot z_{ij} )e^{{\varvec{w}}_i^T\varvec{\gamma }}. \end{aligned}$$
(5)

In each model, the location term in Eq. (5) contains the party-specific constants and issue proximity. To identify the constants, we use the CDU as the reference party. In the heterogeneity term, we consider the concept of platform divergence. Since the heterogeneity term affects the complete location term and the considered source of heterogeneity is specific to each issue, the issue-by-issue model specification allows us to assess whether varying levels of platform divergence cause heterogeneous effects. That is, platform divergence on issue k (immigration, tax, climate change), which enters the respective heterogeneity terms, is a chooser-specific variable that affects only the weight of issue k on choosing.
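To make the role of the two terms in Eq. (5) concrete, the following sketch evaluates the utilities for a single voter and converts them into choice probabilities. It assumes the standard closed-form MNL link, in which the probability of choosing party j is \(e^{V_{ij}}\) divided by the sum of \(e^{V_{il}}\) over all parties. The function name and all numerical values are hypothetical and are not estimates from this paper.

```python
import math
from typing import List, Sequence


def ghmnl_probabilities(
    beta0: Sequence[float],  # party-specific constants (0.0 for the reference party)
    alpha: float,            # coefficient on issue proximity
    z: Sequence[float],      # issue proximity of the voter to each party
    w: Sequence[float],      # chooser-specific variables in the heterogeneity term
    gamma: Sequence[float],  # heterogeneity coefficients
) -> List[float]:
    """Choice probabilities for one voter with utilities as in Eq. (5)."""
    # Heterogeneity term: a positive factor that scales the whole location term.
    scale = math.exp(sum(wk * gk for wk, gk in zip(w, gamma)))
    v = [(b + alpha * zj) * scale for b, zj in zip(beta0, z)]
    v_max = max(v)  # subtract the maximum for numerical stability
    expv = [math.exp(vj - v_max) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]


# Hypothetical values: three parties, proximity coded as negative distance.
beta0 = [0.0, 0.2, -0.5]
alpha = 0.3
z = [-1.0, -4.0, -6.0]
gamma = [-0.1]  # illustrative negative heterogeneity coefficient

p_low = ghmnl_probabilities(beta0, alpha, z, w=[0.0], gamma=gamma)
p_high = ghmnl_probabilities(beta0, alpha, z, w=[10.0], gamma=gamma)
print([round(p, 3) for p in p_low])
print([round(p, 3) for p in p_high])
```

With a negative heterogeneity coefficient, a larger value of the chooser-specific variable shrinks all utilities toward zero, so the probabilities in `p_high` are flatter across parties than those in `p_low`. Setting `gamma` to zero makes the scaling factor equal to one and returns the standard MNL probabilities.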

Table 2 Platform Divergence: GHMNL Model Estimates

Table 2 reports the model estimates. The first column gives the log odds, followed by standard errors and t-values. The parameters related to the issue proximities in the location term all take positive values and are statistically different from zero at the 5% significance level. In line with proximity voting approaches, the estimates indicate that the closer voters perceive a party to their own position on an issue, the higher the utility they assign to that party when voting, ceteris paribus. The estimates on platform divergence in the heterogeneity term are also informative. In all three models, the coefficients related to platform divergence are negative and statistically different from zero at the 5% significance level. A negative coefficient pushes the factor \(e^{{\varvec{w}}_i^T\varvec{\gamma }}\) in Eq. (5) below one as perceived divergence increases, which dampens the location term and thus indicates more heterogeneity in choice behavior. The estimates imply that voters who perceive substantial divergence in party positions are more heterogeneous in choosing one party, ceteris paribus.

5.2 Fully-specified voter choice model

Next, we present the results of a fully-specified voter choice model in the sense that the location term includes both types of covariates (choice-specific and chooser-specific). This contrasts with the models presented in Table 2, where we consider only a choice-specific variable in the location term. The chooser-specific variables \(s_{im}\) are socioeconomic voter characteristics. They account for the importance of voters’ nonpolicy motivations in the voting calculus, which represents a central extension of the proximity voting model (e.g., Adams et al. 2005; Mauerer et al. 2015b). As nonpolicy factors \({\varvec{s}}_{i}\), we consider three dummy-coded voter attributes in the location term: religious denomination, gender, and a regional variable indicating whether the respondent resides in former West or East Germany. In the heterogeneity term, we include gender and the regional variable to examine whether there are systematic gender or regional differences in choice behavior. We also examine whether the voter’s education causes heterogeneity in proximity voting. We note that one could maintain the variable platform divergence both in the location and heterogeneity terms (see Eq. 3) without causing identifiability problems. We opted to include three different chooser-specific variables to demonstrate how the model allows accounting for nonpolicy considerations in the voting calculus.

Table 3 Full Voter Choice Model: GHMNL Model Estimates

We focus on the tax issue and again use the CDU as the reference party. The voter choice model uses 24 degrees of freedom: 1 issue proximity parameter for taxes, \(6-1\) constants, \((6-1) \times 3\) parameters related to voter attributes in the location term, and 3 coefficients in the heterogeneity term. Table 3 reports the estimation results. In the location term, the interpretation of the coefficients refers to the CDU, as this party is used as the reference alternative to identify the model. For example, in line with central social cleavage structures in Germany, Catholics tend to prefer the Christian-Democratic Party CDU compared to the left parties SPD and the Left, ceteris paribus.
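The number of parameters in the full voter choice model stated above can be checked term by term, with six parties, three voter attributes in the location term, and three covariates in the heterogeneity term:

$$\begin{aligned} \underbrace{1}_{\text {issue proximity}} + \underbrace{(6-1)}_{\text {constants}} + \underbrace{(6-1) \times 3}_{\text {voter attributes}} + \underbrace{3}_{\text {heterogeneity term}} = 1 + 5 + 15 + 3 = 24. \end{aligned}$$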

Regarding the heterogeneity term, the coefficients are not specific to a particular party. The corresponding effects are global and do not relate to a reference alternative. All three parameters in the heterogeneity term are statistically different from zero at the 5% significance level. The coefficient related to education is negative. This result indicates that voters with a higher level of education tend to react more heterogeneously to the tax issue. The coefficient associated with the variable gender is also negative. The negative value indicates that females show more heterogeneity in voter choice than males, ceteris paribus. By contrast, the coefficient related to the regional variable is positive. This result suggests that voters residing in former West Germany have more distinct party choice preferences than those in East Germany, ceteris paribus.

5.3 Empirical model comparisons

In this section, we compare our empirical GHMNL models with the competing models in two steps. First, we contrast the GHMNL models with the MNL models based on Likelihood Ratio tests. Then, we compare the performance of the GHMNL models with all alternative models.

Table 4 reports the results of the Likelihood Ratio tests across the four empirical models (immigration issue, tax issue, climate change issue, full model: tax issue). The test statistics indicate that the GHMNL models yield significantly better fits to the data than the standard MNL models in all applications.
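A Likelihood Ratio test of this kind can be sketched as follows. The sketch assumes that the MNL model is the restricted version of the GHMNL model obtained by setting all heterogeneity coefficients to zero, so that the degrees of freedom equal the number of coefficients in the heterogeneity term. The log-likelihood values are hypothetical and are not taken from Table 4. The chi-square tail probability is computed with the standard library only, using the closed forms for one and two degrees of freedom and a recurrence in steps of two.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Closed-form base cases for df = 1 and df = 2.
    if df % 2 == 1:
        sf, k = math.erfc(math.sqrt(x / 2.0)), 1
    else:
        sf, k = math.exp(-x / 2.0), 2
    # Recurrence: sf(k + 2) = sf(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < df:
        sf += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return sf


def likelihood_ratio_test(
    loglik_mnl: float, loglik_ghmnl: float, n_extra_params: int
) -> Tuple[float, float]:
    """Return the LR statistic and its p-value for nested MNL vs. GHMNL fits."""
    stat = 2.0 * (loglik_ghmnl - loglik_mnl)
    return stat, chi2_sf(stat, n_extra_params)


# Hypothetical maximized log-likelihoods; one heterogeneity coefficient.
stat, p_value = likelihood_ratio_test(-1520.0, -1512.5, n_extra_params=1)
print(stat, p_value < 0.05)  # 15.0 True
```

For the full voter choice model, `n_extra_params` would be 3 under the same assumption, because the heterogeneity term contains three coefficients.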

Table 4 Likelihood Ratio Tests: Standard MNL vs. GHMNL

Next, we compare the fit of the GHMNL models with the four alternative models (MNL, MXL, S-MNL, G-MNL). We use the AIC and BIC criteria to measure model performance (see Footnote 6). For each of the last three models, we estimated two variants. The first variant accounts only for subject-specific heterogeneity according to the respective parameterization (see Table 1). The second variant additionally includes covariates to explain the heterogeneity. The covariates are the same as in Tables 2 and 3. Table 5 summarizes the performance measures. In all settings, the AIC and BIC values for the alternative models are larger than those for the GHMNL models. Since smaller values of both criteria indicate a better trade-off between fit and complexity, the GHMNL models yield better model fits.
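For reference, both information criteria can be computed from a maximized log-likelihood, the number of estimated parameters, and the sample size. The sketch below uses the standard definitions AIC = 2k - 2 log L and BIC = k log(n) - 2 log L. The log-likelihoods, parameter counts, and sample size are hypothetical and do not reproduce Table 5.

```python
import math


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 log L."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k log(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# Hypothetical fits: model name -> (maximized log-likelihood, number of parameters).
fits = {"MNL": (-1520.0, 6), "GHMNL": (-1512.5, 7)}
n_obs = 1000  # hypothetical sample size

for name, (ll, k) in fits.items():
    print(name, round(aic(ll, k), 1), round(bic(ll, k, n_obs), 1))

# Smaller values are preferred under both criteria.
best_by_aic = min(fits, key=lambda m: aic(*fits[m]))
best_by_bic = min(fits, key=lambda m: bic(*fits[m], n_obs))
```

The BIC penalizes each additional parameter by log(n) rather than 2, so it favors the more parsimonious model more strongly than the AIC whenever n exceeds about 7.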

Table 5 Empirical Model Comparisons

6 Conclusion

Categorical dependent variables are widespread in the social sciences. Applied social scientists studying nominal dependent variables as a choice among discrete alternatives frequently hypothesize heterogeneous effects. We presented, discussed, and applied the General Heterogeneous Multinomial Logit Model (GHMNL) to systematically account for heterogeneity in choice behavior. The statistical theory and empirical applications provided in this paper suggest that the GHMNL model offers new insights into heterogeneous effects and better captures differences in choice behavior than competing approaches.

The GHMNL model integrates a heterogeneity term into the systematic part of the utility function and accounts for behavioral choice tendencies without referring to latent variables. The heterogeneity term is linked to explanatory variables and indicates the degree of distinctiveness of choice or the impact of heterogeneity in choice behavior. As demonstrated, alternative approaches come with several drawbacks, such as a high number of parameters to be estimated, identifiability problems, or the need to specify an appropriate distribution for the random effects. The GHMNL model avoids these difficulties, is computationally straightforward, and has convenient properties.

We illustrated the approach by analyzing electoral choices, highlighting the important insights possible from systematically modeling heterogeneity in voting behavior. We also provided empirical comparisons with alternative models, demonstrating that the GHMNL models outperform the competing models in all applications. As many research questions in the social sciences involve theoretical expectations about heterogeneity, we see a wide range of applications. We hope this contribution fosters the application of this type of model in applied social science work.