Heterogeneity in general multinomial choice models

Different voters behave differently at the polls, different students make different university choices, or different countries choose different health care systems. Many research questions important to social scientists concern choice behavior, which involves dealing with nominal dependent variables. Drawing on the principle of maximum random utility, we propose applying a flexible and general heterogeneous multinomial logit model to study differences in choice behavior. The model systematically accounts for heterogeneity that classical models do not capture, indicates the strength of heterogeneity, and permits examining which explanatory variables cause heterogeneity. As the proposed approach allows incorporating theoretical expectations about heterogeneity into the analysis of nominal dependent variables, it can be applied to a wide range of research problems. Our empirical example uses individual-level survey data to demonstrate the benefits of the model in studying heterogeneity in electoral decisions.


Introduction
Many research questions in political science are categorical in nature. Regression models for categorical dependent variables are well-established and widely applied in the discipline to analyze research problems that involve two or more categories without an ordering structure (see, e.g., Agresti 2007; Long 1997; Tutz 2012). Statistical techniques belonging to this model class form an established methodological subfield in the discipline. Various aspects, features, and key methodological problems that arise when dealing with categorical dependent variables have been discussed to enhance and simplify their application. The methodological contributions comprise several approaches to measure and visualize the goodness of fit (e.g., Esarey and Pierce 2012; Greenhill, Ward, and Sacks 2011; Hagle and Mitchell 1992; Herron 1999), address the separation problem (Cook, Niehaus, and Zuhlke 2018; Rainey 2016; Zorn 2005), or discuss the evaluation of interactive hypotheses (Berry, DeMeritt, and Esarey 2010) in such models.
Although many research questions in political science involve theoretical expectations about heterogeneous effects, little effort has been made to allow for heterogeneity in models for categorical dependent variables. The most prominent way to relax the homogeneity assumption for nominal-scaled dependent variables is the mixed logit model (MXL) (see, e.g., Greene, Hensher, and Rose 2006; McFadden and Train 2000), which has been applied to study heterogeneity in government choice (Glasgow, Golder, and Golder 2012; Glasgow and Golder 2015) or voting behavior (Glasgow 2001). However, the MXL model can be quite demanding to apply. For example, the researcher needs to decide on a distribution for the subject-specific heterogeneity to approximate the underlying behavioral process, and repeated measurements are necessary to identify the model.
In this paper, we propose a methodological approach that is very flexible and general in accounting for heterogeneity in nominal-scale dependent variables. Relying on the random utility maximization framework, we derive a multinomial logit model, called the General Heterogeneous Multinomial Logit Model (GHMNL), which allows for systematically studying heterogeneity in choice behavior. The proposed model builds on the standard multinomial logit model (MNL), also known as the conditional logit model (McFadden 1974; Yellott 1977), which is the most frequently applied statistical tool to study choices among discrete alternatives. Like the MNL model, the GHMNL model is a classical discrete choice model that can handle both choice-specific and chooser-specific explanatory variables. In contrast to the MNL model, which ignores that the variance of the underlying latent traits can be chooser-specific, the GHMNL model accounts for such heterogeneous effects. The extension integrates a heterogeneity term into the systematic part of the utility function. The heterogeneity term is linked to explanatory variables and permits accounting for behavioral tendencies in choice behavior without referring to latent variables. It provides an indicator of the degree of distinctness of choice, indicates the strength of heterogeneity in choice behavior, and allows examining which explanatory variables cause heterogeneity. Therefore, the proposed model enables incorporating theoretical expectations about heterogeneity into the analysis of nominal dependent variables. Compared to the MXL model, the GHMNL model also comes with convenient properties and assumptions, such as its closed-form solution for evaluating the outcome probabilities. In addition, the GHMNL model frees the researcher from making distributional assumptions for the random parameters and is computationally straightforward.
We apply the model to electoral choices in multiparty elections and demonstrate its benefits in the study of heterogeneity in spatial voting. This empirical application has several merits. First, spatial voting models typically contain both types of explanatory variables: choice-specific ones (voter-party issue proximities) and chooser-specific ones (socioeconomic voter attributes). Second, the literature on voter heterogeneity provides several theoretical concepts explaining why not all voters assign the same importance to issue considerations, including, for instance, platform divergence or political sophistication (e.g., Campbell et al. 1960; Luskin 1987; RePass 1971). We will demonstrate how the proposed model allows incorporating such theoretical expectations into the empirical modeling. Although we focus on electoral choices and voter heterogeneity in our empirical application, we see great potential for applying the model to explore heterogeneous effects in all sub-disciplines, such as in the study of legislative behavior, public opinion and attitudes, international relations, or comparative politics.
Based on a brief review of the classical discrete choice model, we first derive our general heterogeneous multinomial choice model and outline how it extends the standard MNL model. Next, we investigate the differences between the general heterogeneous multinomial choice model and the MXL model. Then, we demonstrate the usefulness of our model by examining heterogeneity in spatial issue voting.

The Standard Multinomial Choice Model
The multinomial logit model (MNL) is the most common model to study choice behavior (see, e.g., Hensher, Rose, and Greene 2015; Louviere, Hensher, and Swait 2009; Train 2009). One key feature of the MNL model limits our insights into heterogeneity in choice behavior. It ignores that the variances of the underlying latent traits can vary across decision makers. A brief review of the MNL model will help to motivate the model we propose and its advantages.
In the following, $Y_i \in \{1, \dots, J\}$ will denote the dependent variable that consists of $J$ unordered categories for $i \in \{1, \dots, n\}$ observations. Within the discrete choice framework, the categories represent $J$ discrete, mutually exclusive, and finite alternatives of which decision makers choose one. The choice outcome can be a function of two types of explanatory variables: choice-specific and chooser-specific variables. The former are variables that are specific for each category and therefore take different values across both alternatives and choosers. They characterize the choice alternatives, such as price or distance in a classical mode choice situation. Let the choice-specific variables be denoted by $z_{ijk}$, $j \in \{1, \dots, J\}$, $k \in \{1, \dots, K\}$. Chooser-specific variables contain characteristics of the decision makers, which vary over decision makers but are constant across the alternatives, such as age or gender. Let $s_{im}$, $m \in \{1, \dots, M\}$, denote the chooser-specific covariates.
A common way to motivate a choice model is to consider the utilities associated with the alternatives as latent variables. Let $U_{ij}$ denote an unobservable random utility that represents how attractive or appealing each alternative $j \in \{1, \dots, J\}$ is for chooser $i \in \{1, \dots, n\}$. The decision makers are assumed to assess and compare each alternative and select the one that maximizes the random utility so that $Y_i$ is linked to the latent variables by the principle of maximum random utility,
$$Y_i = j \quad \Leftrightarrow \quad U_{ij} = \max_{r \in \{1, \dots, J\}} U_{ir}.$$
In a random utility framework, the utility is determined by $U_{ij} = V_{ij} + \varepsilon_{ij}$, where $V_{ij}$ represents the systematic part of the utility, specified by explanatory variables and unknown parameters, whereas $\varepsilon_{i1}, \dots, \varepsilon_{iJ}$ are independent and identically distributed (i.i.d.) random variables with distribution function $F(\cdot)$.
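The systematic part of the utility function is specified as a linear predictor
$$V_{ij} = \beta_{j0} + z_{ij}^T \alpha + s_i^T \beta_j, \tag{1}$$
where $\beta_{10}, \dots, \beta_{J0}$ are the alternative-specific constants.
$\alpha^T = (\alpha_1, \dots, \alpha_K)$ are the parameters associated with the vector of choice-specific variables $z_{ij}^T = (z_{ij1}, \dots, z_{ijK})$, which indicate the weight decision makers attach to each attribute $k$ of the alternatives. $\beta_j^T = (\beta_{j1}, \dots, \beta_{jM})$ are the alternative-specific parameters associated with the vector of chooser-specific variables $s_i^T = (s_{i1}, \dots, s_{iM})$.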
By assuming that $\varepsilon_{i1}, \dots, \varepsilon_{iJ}$ are i.i.d. variables with distribution function $F(x) = \exp(-\exp(-x))$, which is known as the Gumbel or maximum extreme value distribution, one obtains the classical standard multinomial logit model (see McFadden 1974; Yellott 1977)
$$P(Y_i = j) = \frac{\exp(V_{ij})}{\sum_{r=1}^{J} \exp(V_{ir})} = \frac{\exp(\beta_{j0} + z_{ij}^T \alpha + s_i^T \beta_j)}{\sum_{r=1}^{J} \exp(\beta_{r0} + z_{ir}^T \alpha + s_i^T \beta_r)}, \tag{2}$$
$j \in \{1, \dots, J\}$. Since the chooser-specific variables $s_i$ are constant over the alternatives, not all of the corresponding coefficients are identifiable. The same applies to the constants. To identify the model, side constraints are needed. We will use the standard side constraint based on a reference alternative, whose coefficients are set to zero. We select the first alternative as reference and set $\beta_{10} = 0$ and $\beta_1^T = (0, \dots, 0)$. The standard MNL model presented in Equations (1) and (2) ignores that the variance of the underlying latent traits can be subject-specific so that the variances are not allowed to differ across decision makers. Previous research has shown that ignoring variance heterogeneity can yield biased estimates (see, e.g., Tutz 2020).
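The link between Gumbel-distributed errors and the logit form of the choice probabilities can be checked numerically. The following Python sketch is only an illustration and not part of the original analysis: the systematic utilities, the number of draws, the seed, and the function names are assumptions chosen for the example.

```python
import math
import random

def mnl_probabilities(utilities):
    """Closed-form MNL probabilities for one chooser (softmax of V_i1, ..., V_iJ)."""
    top = max(utilities)  # subtracting the maximum avoids overflow
    weights = [math.exp(v - top) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def simulate_choice_shares(utilities, draws=20000, seed=1):
    """Share of draws in which each alternative has the largest U_ij = V_ij + eps_ij."""
    rng = random.Random(seed)
    counts = [0] * len(utilities)
    for _ in range(draws):
        # If E ~ Exp(1), then -log(E) has distribution function exp(-exp(-x)) (Gumbel).
        noisy = [v - math.log(max(rng.expovariate(1.0), 1e-300)) for v in utilities]
        counts[noisy.index(max(noisy))] += 1
    return [c / draws for c in counts]

# Hypothetical systematic utilities for J = 3 alternatives.
V = [0.0, 1.0, -0.5]
closed_form = mnl_probabilities(V)
simulated = simulate_choice_shares(V)
for j, (p, s) in enumerate(zip(closed_form, simulated), start=1):
    print(f"alternative {j}: closed form {p:.3f}, simulated {s:.3f}")
```

With enough draws, the simulated shares of utility-maximizing choices should lie close to the closed-form probabilities.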

A General Heterogeneous Multinomial Choice Model
In this section, we derive a general multinomial choice model, called General Heterogeneous Multinomial Logit Model, in short GHMNL, that accounts for variance heterogeneity in choice behavior. The GHMNL model builds on the model in Tutz (2020), which is restricted to global covariates that do not depend on the outcome categories. By contrast, the approach we propose explicitly incorporates choice-specific explanatory variables, which lie at the heart of discrete choice models as attributes of the choice alternatives are the source of utility in discrete choice models. In addition, we outline in detail the interpretation of the novel heterogeneity term we incorporate into the utility function and the estimation methods. In the following, we begin by describing the specification of the utility functions and the choice probabilities in the GHMNL model. [...] a parameter selection procedure to systematically reduce the resulting model complexity.

Utility Functions and Choice Probabilities
The GHMNL model extends the standard MNL model by adding a heterogeneity term to the systematic part of the utility function. For simplicity, let all the explanatory variables and the constants be collected in the alternative-specific vector $x_{ij}^T = (1_j^T, z_{ij}^T, 0^T, \dots, s_i^T, \dots, 0^T)$, where $1_j$ is the $j$th unit vector and $0$ is a vector of zeros. Then, the utility functions take the form
$$V_{ij} = x_{ij}^T \delta,$$
where $\delta^T = (\beta_{10}, \dots, \beta_{J0}, \alpha^T, \beta_1^T, \dots, \beta_J^T)$. To derive the GHMNL model, we assume that the latent utilities are given more generally by
$$U_{ij} = x_{ij}^T \delta + \sigma_i \varepsilon_{ij},$$
where $\sigma_i$ is the standard deviation associated with decision maker $i$.
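In the GHMNL model, the standard deviation is linked to explanatory variables by assuming $\sigma_i = e^{-w_i^T \gamma}$, where $w_i$ is a vector of chooser-specific covariates and $\gamma$ is a vector of parameters. Dividing all latent utilities of a decision maker by $\sigma_i$ does not change which alternative has maximal utility, but it rescales the systematic part by $1/\sigma_i = e^{w_i^T \gamma}$. As a result, the utility $V_{ij}$ is specified as
$$V_{ij} = x_{ij}^T \delta \, e^{w_i^T \gamma} = (\beta_{j0} + z_{ij}^T \alpha + s_i^T \beta_j) \, e^{w_i^T \gamma}, \tag{3}$$
where $s_i$ is a vector of chooser-specific covariates, and $z_{ij}$ is a vector of alternative-specific covariates. As in the standard MNL model, the variables $s_i$ have alternative-specific effects and $z_{ij}$ global effects.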
$w_i^T = (w_{i1}, \dots, w_{iL})$ is a vector of chooser-specific variables, which can be a subset of $s_i$. It contains attributes of the decision makers that are supposed to cause heterogeneity in choice behavior. The corresponding parameter vector $\gamma^T = (\gamma_1, \dots, \gamma_L)$ indicates the strength of heterogeneity in choosing one alternative.
The model distinguishes between two types of effects: a location effect and a heterogeneity effect. The term $x_{ij}^T \delta$ in Equation (3) represents the location effect. It is also present in the standard MNL model and determines which alternative the chooser tends to prefer. The novel term $w_i^T \gamma$ represents the heterogeneity effect that determines the impact of heterogeneity in choice behavior.
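Like the standard MNL model, the GHMNL model has a closed-form solution for evaluating the choice probabilities so that the utility functions $V_{ij}$ are linked to the choice probabilities through a logistic response function,
$$P(Y_i = j \mid \{x_{ij}\}, w_i) = \frac{\exp(x_{ij}^T \delta \, e^{w_i^T \gamma})}{\sum_{r=1}^{J} \exp(x_{ir}^T \delta \, e^{w_i^T \gamma})}. \tag{4}$$
Alternatively, the relationship between the choice probabilities and the utility functions can be expressed in terms of odds:
$$\frac{P(Y_i = j \mid \{x_{ij}\}, w_i)}{P(Y_i = 1 \mid \{x_{ij}\}, w_i)} = \exp\left((x_{ij} - x_{i1})^T \delta \, e^{w_i^T \gamma}\right), \quad j \in \{2, \dots, J\}.$$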
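The closed-form probabilities and the odds representation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' R implementation: the function name `ghmnl_probabilities`, the location terms, and the parameter values are hypothetical.

```python
import math

def ghmnl_probabilities(location, w, gamma):
    """GHMNL choice probabilities for one chooser.

    location: the J location terms x_ij' delta
    w, gamma: heterogeneity covariates and their parameters (same length)
    Returns the probabilities and the scaling factor e^{w_i' gamma}.
    """
    scale = math.exp(sum(w_l * g_l for w_l, g_l in zip(w, gamma)))
    utilities = [scale * v for v in location]
    top = max(utilities)  # subtracting the maximum avoids overflow
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [x / total for x in weights], scale

# Hypothetical values: J = 3 alternatives, one heterogeneity covariate.
location = [-0.4, 0.6, 0.1]
probs, scale = ghmnl_probabilities(location, w=[1.0], gamma=[0.5])

# Odds of alternative j against alternative 1, compared with their closed form.
for j in (1, 2):
    odds = probs[j] / probs[0]
    closed_form = math.exp((location[j] - location[0]) * scale)
    print(f"alternative {j + 1} vs 1: odds {odds:.4f}, closed form {closed_form:.4f}")
```

Setting `gamma` to zero gives a scaling factor of one, so the function then returns the standard MNL probabilities.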

Interpretation of the Heterogeneity Term
The essential novel term in the GHMNL model is the heterogeneity term. It is modeled by the factor $e^{w_i^T \gamma}$ and represents the (inverse) standard deviation of the latent variables. The heterogeneity term can be understood as representing variance heterogeneity. However, it also allows for an interpretation without reference to latent variables, which are always elements used to build a model but cannot be observed. The heterogeneity term represents a specific choice behavior that permits accounting for behavioral tendencies that are not linked to particular alternatives: When $w_i^T \gamma \to -\infty$, one obtains $P(Y_i = j \mid \{x_{ij}\}, w_i) = 1/J$. In this extreme case, all alternatives have the same choice probabilities. It implies that the decision maker chooses an alternative at random because none of the covariates can systematically explain the choice. The chooser shows maximal heterogeneity.
When $w_i^T \gamma \to \infty$ and the condition $x_{ij}^T \delta \neq 0$ holds at least for one $j > 1$, the probability for one of the $j \in \{1, \dots, J\}$ alternatives approaches 1. In this case, the decision maker has a distinct preference and shows minimal heterogeneity. Therefore, choosers with large $w_i^T \gamma$-values show less variability; they distinctly prefer specific alternatives.
Thus, the heterogeneity term $w_i^T \gamma$ can be considered as an indicator of the degree of distinctness of choice or as a measure of heterogeneity in choice behavior. For small values of $w_i^T \gamma$, the difference between the choice probabilities becomes small. By contrast, the difference between a specific alternative and the remaining ones gets larger when $w_i^T \gamma$ increases. As the heterogeneity term contains attributes of the decision makers, the model systematically accounts for heterogeneity in choice behavior across individuals. It allows examining which explanatory variables cause heterogeneous effects. For example, suppose $w_i$ denotes age and $\gamma$ is positive. It would suggest that older decision makers have more clear-cut preferences than younger ones. The former tend to prefer specific alternatives, while younger decision makers have less distinct preferences and show more heterogeneity in selecting one alternative.
Figure 1 illustrates the behavioral tendencies the GHMNL model can uncover. For a five-choice situation $j \in \{1, 2, 3, 4, 5\}$, it depicts the probabilities $P(Y_i = j)$ for a model with two covariates contained in the heterogeneity term $w_i$, one binary and one quantitative normally distributed explanatory variable. For the binary covariate, we consider the effect at value $w_i^T = (1, 0)$. The two panels show the probabilities for different parameter values ($\gamma_1$) in the heterogeneity term: panel (a) shows the effects for positive $\gamma_1$-values, panel (b) for negative $\gamma_1$-values. In both panels, the filled circles depict the probabilities that result when no heterogeneity is present, that is, when $\gamma_1 = 0$, resulting in the standard MNL model.
When inspecting the base probabilities obtained from the standard MNL model, we see that the decision maker prefers alternative 3, and to a lesser extent alternative 5. Panel (a) shows that this pattern becomes more pronounced for increasing $\gamma_1$-values. Thus, the decision maker more distinctly prefers alternative 3 in the GHMNL model. By contrast, the pattern flattens for negative $\gamma_1$-values, as illustrated in panel (b). This indicates that the decision maker tends to choose an alternative at random and shows substantial heterogeneity in selecting one of the five alternatives.
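The sharpening and flattening of the probability profile can be reproduced with a small numerical sketch. The Python code below does not use the values behind Figure 1: the five location terms are hypothetical and merely chosen so that alternative 3 is preferred and alternative 5 comes second, and the grid of $\gamma_1$-values is an assumption.

```python
import math

def ghmnl_probabilities(location, w, gamma):
    """Choice probabilities with location terms scaled by e^{w' gamma}."""
    scale = math.exp(sum(w_l * g_l for w_l, g_l in zip(w, gamma)))
    utilities = [scale * v for v in location]
    top = max(utilities)  # subtracting the maximum avoids overflow
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [x / total for x in weights]

# Hypothetical location terms x_ij' delta for five alternatives.
location = [0.0, -0.3, 1.2, -0.6, 0.7]
w = [1.0, 0.0]  # binary covariate equal to 1, quantitative covariate equal to 0

profiles = {}
for gamma_1 in (-20.0, -1.0, 0.0, 1.0, 3.0):
    profiles[gamma_1] = ghmnl_probabilities(location, w, [gamma_1, 0.0])
    row = "  ".join(f"{p:.3f}" for p in profiles[gamma_1])
    print(f"gamma_1 = {gamma_1:6.1f}: {row}")
```

The row for $\gamma_1 = 0$ corresponds to the standard MNL model; strongly negative values push all probabilities toward $1/5$, whereas large positive values concentrate the probability mass on alternative 3.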
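Using the first alternative as reference, the kernel of the log-likelihood of the model presented in Equation (4) is given by
$$l(\delta, \gamma) = \sum_{i=1}^{n} \sum_{j=1}^{J} y_{ij} \log \pi_{ij} = \sum_{i=1}^{n} \left[ \sum_{j=1}^{J} y_{ij} V_{ij} - \log \sum_{r=1}^{J} \exp(V_{ir}) \right],$$
where $y_{ij} = 1$ if $Y_i = j$ and $y_{ij} = 0$ otherwise, $\pi_{ij} = P(Y_i = j \mid \{x_{ij}\}, w_i)$, and $V_{ij} = x_{ij}^T \delta \, e^{w_i^T \gamma}$. For the maximization of the log-likelihood, we make use of the first derivatives, also known as score functions. They take the form
$$\frac{\partial l}{\partial \delta} = \sum_{i=1}^{n} \sum_{j=1}^{J} (y_{ij} - \pi_{ij}) \, e^{w_i^T \gamma} x_{ij}, \qquad \frac{\partial l}{\partial \gamma} = \sum_{i=1}^{n} \sum_{j=1}^{J} (y_{ij} - \pi_{ij}) \, V_{ij} \, w_i.$$
As approximation of the covariance $\operatorname{cov}(\hat{\delta})$, we use the inverse of the observed information $-\partial^2 l(\hat{\delta})/\partial \delta \partial \delta^T$.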
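A quick way to check analytic score functions of this kind is to compare them with numerical derivatives of the log-likelihood. The Python sketch below does so for a tiny, made-up data set; it is not the authors' R routine, and the data, parameter values, and function names are assumptions. The scores are those implied by $V_{ij} = x_{ij}^T \delta \, e^{w_i^T \gamma}$ under the multinomial log-likelihood.

```python
import math

def choice_probs(x_i, w_i, delta, gamma):
    """Return probabilities, utilities V_ij, and the scale e^{w_i' gamma} for one chooser."""
    scale = math.exp(sum(a * b for a, b in zip(w_i, gamma)))
    v = [scale * sum(a * b for a, b in zip(x_ij, delta)) for x_ij in x_i]
    top = max(v)
    e = [math.exp(u - top) for u in v]
    total = sum(e)
    return [u / total for u in e], v, scale

def log_likelihood(data, delta, gamma):
    """Sum over choosers of the log probability of the chosen alternative."""
    return sum(math.log(choice_probs(x_i, w_i, delta, gamma)[0][y_i])
               for x_i, w_i, y_i in data)

def score(data, delta, gamma):
    """Analytic first derivatives with respect to delta and gamma."""
    g_delta, g_gamma = [0.0] * len(delta), [0.0] * len(gamma)
    for x_i, w_i, y_i in data:
        p, v, scale = choice_probs(x_i, w_i, delta, gamma)
        for j, x_ij in enumerate(x_i):
            resid = (1.0 if j == y_i else 0.0) - p[j]  # y_ij - pi_ij
            for k, x in enumerate(x_ij):
                g_delta[k] += resid * scale * x
            for l, w in enumerate(w_i):
                g_gamma[l] += resid * v[j] * w
    return g_delta, g_gamma

def numeric_gradient(f, params, h=1e-6):
    """Central finite differences, one parameter at a time."""
    grad = []
    for k in range(len(params)):
        up = params[:k] + [params[k] + h] + params[k + 1:]
        down = params[:k] + [params[k] - h] + params[k + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

# Made-up data: J = 3; each row of x_i is (constant alt. 2, constant alt. 3, proximity).
data = [
    ([[0, 0, -1.0], [1, 0, -2.0], [0, 1, -0.5]], [1.0], 0),
    ([[0, 0, -3.0], [1, 0, -0.5], [0, 1, -1.5]], [0.0], 1),
    ([[0, 0, -2.0], [1, 0, -1.0], [0, 1, -0.2]], [1.0], 2),
    ([[0, 0, -0.5], [1, 0, -2.5], [0, 1, -1.0]], [0.0], 0),
]
delta, gamma = [0.3, -0.2, 0.8], [0.4]

g_delta, g_gamma = score(data, delta, gamma)
g_delta_num = numeric_gradient(lambda d: log_likelihood(data, d, gamma), delta)
g_gamma_num = numeric_gradient(lambda g: log_likelihood(data, delta, g), gamma)
print("analytic:", [round(g, 6) for g in g_delta + g_gamma])
print("numeric: ", [round(g, 6) for g in g_delta_num + g_gamma_num])
```

In practice, the score functions would be passed to a gradient-based optimizer; the finite-difference comparison only serves as a sanity check of the derivatives.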

Implementation in R
We have written an R function that allows the user to fit the GHMNL model. Section A in the Supporting Information describes the routines to implement the model.

The GHMNL Model Contrasted with the Mixed Logit Model
A model that has been used to study heterogeneity in decision behavior is the mixed logit model (MXL) (see Greene, Hensher, and Rose 2006; Hensher and Greene 2003; McFadden and Train 2000). The MXL model has been applied in transportation economics and econometrics, and political science has also adopted the model to examine heterogeneity in government choice (Glasgow, Golder, and Golder 2012; Glasgow and Golder 2015) or voting behavior (Glasgow 2001). A brief review and discussion of the MXL model illustrates the limitations of this approach and the advantages of the model we propose to account for heterogeneity in choice behavior.

Mixed Logit Model Formulation
Following Greene, Hensher, and Rose (2006), the MXL model can be derived from latent utilities
$$U_{ijt} = z_{ijt}^T \alpha_i + \varepsilon_{ijt},$$
where the additional index $t$ refers to the choice situation, and $z_{ijt}$ is the full vector of explanatory variables, including attributes of the alternatives, socioeconomic characteristics of the decision makers, and the choice task itself.
As compared to the standard MNL model, the crucial extension in the MXL model is that the parameter vector $\alpha_i$ is subject-specific so that the effects are allowed to vary across decision makers $i$. By assuming that the subject-specific effects are random and in part determined by an additional vector of covariates $w_i$, the model becomes a mixed-effects model. The subject-specific effects are assumed to take the form
$$\alpha_i = \alpha + \Delta w_i + \Sigma^{1/2} v_i,$$
where $\Delta$ is a matrix of coefficients associated with the covariate vector $w_i$, $v_i$ is a random vector of uncorrelated random variables with known variances, and $\Sigma^{1/2}$ is a covariance matrix that determines the variance structure of the random term.
Maximum simulated likelihood estimates are obtained by maximizing the log-likelihood with respect to all the unknown parameters (see also Train 2009).
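To convey why the MXL model requires simulation, the following Python sketch approximates the choice probabilities of one decision maker by averaging logit probabilities over random draws of $\alpha_i$. It rests on simplifying assumptions that are not taken from the text: a single choice situation (the index $t$ is dropped), standard normal $v_i$, a diagonal $\Sigma^{1/2}$ passed as a list of standard deviations `sd`, and hypothetical covariate and parameter values.

```python
import math
import random

def logit_probabilities(z, alpha):
    """Logit probabilities for utilities z_j' alpha, j = 1, ..., J."""
    v = [sum(a * x for a, x in zip(alpha, z_j)) for z_j in z]
    top = max(v)
    e = [math.exp(u - top) for u in v]
    total = sum(e)
    return [u / total for u in e]

def simulated_mxl_probabilities(z, w, alpha, delta_matrix, sd, draws=500, seed=1):
    """Average logit probabilities over draws of alpha_i = alpha + Delta w_i + sd * v_i."""
    rng = random.Random(seed)
    shift = [sum(d * w_l for d, w_l in zip(row, w)) for row in delta_matrix]  # Delta w_i
    average = [0.0] * len(z)
    for _ in range(draws):
        alpha_i = [a + s + sd_k * rng.gauss(0.0, 1.0)
                   for a, s, sd_k in zip(alpha, shift, sd)]
        p = logit_probabilities(z, alpha_i)
        average = [m + p_j / draws for m, p_j in zip(average, p)]
    return average

# Hypothetical example: J = 3 alternatives, two covariates, one effect modifier.
z = [[-1.0, 0.2], [-2.0, 0.9], [-0.5, 0.4]]
w = [1.0]
alpha = [0.8, 0.5]
delta_matrix = [[0.3], [-0.1]]  # one row per element of alpha_i

with_randomness = simulated_mxl_probabilities(z, w, alpha, delta_matrix, sd=[0.5, 0.2])
without_randomness = simulated_mxl_probabilities(z, w, alpha, delta_matrix, sd=[0.0, 0.0])
print([round(p, 3) for p in with_randomness])
print([round(p, 3) for p in without_randomness])
```

With all standard deviations set to zero, the simulation collapses to an ordinary logit model with coefficients $\alpha + \Delta w_i$, which illustrates that the random term is what makes the simulation step necessary.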
By allowing parameters to vary randomly over decision makers instead of assuming that they are the same for every chooser, the MXL model is very flexible and can account for a rather general form of heterogeneity. However, this flexibility comes at the cost of a large number of parameters, which might render estimates unstable without careful variable selection. Further drawbacks of the model are that one has to specify a distribution for the subject-specific random effects and that the model parameters may not be identified without repeated measurements, that is, without having varying choice situations for the same chooser.

Comparing Modeling Approaches
Both the GHMNL model and the MXL model can be derived from latent utilities. The main difference between both approaches lies in the motivation of heterogeneity in choice behavior. In the proposed GHMNL model, the variances of the latent utilities are allowed to vary across decision makers. By contrast, the MXL model permits the parameters to vary across individual choosers, however, without further motivation. While the GHMNL model also allows parameters to vary across choosers, it does so in a more restrictive and systematic way. Here, the effect parameters associated with the alternative-specific covariates are $\alpha e^{w_i^T \gamma}$. Under this specification, the covariates contained in $w_i$ modify the effects. Depending on the value of $w_i$, the effect is strengthened or weakened. In addition, the same effect modification applies to all coefficients, which is a consequence of the derivation from the variances of the latent utilities. By contrast, the MXL model allows for all sorts of parameter variation, including random variation and even a possible reversal of the sign of effects.
By allowing the effects to vary across decision makers, both models have in common that they assume a specific form of interaction. In the GHMNL model, an interaction between the variables $x_{ij}$ and $w_i$ is present because the linear term takes the form $x_{ij}^T \delta \, e^{w_i^T \gamma}$ (see Equation 3). In the MXL model, the interaction is included as the linear effect $z_{ijt}^T \alpha_i$ contains the term $z_{ijt}^T \Delta w_i$. In both cases, the interaction can be seen as an interaction generated by effect modification. The effect of $x_{ij}$ (or $z_{ijt}$) is modified by $w_i$; the latter variable is a so-called effect modifier.
Both models can be embedded into the general framework of varying-coefficient models (see, e.g., Fan and Zhang 1999; Hastie and Tibshirani 1993; Park et al. 2015). Although the connection between the MXL model and varying-coefficient models seems not to have been used before, the varying-coefficients framework helps to see that identifiability problems arise if the variables $z_{ijt}$ and $w_i$ are not distinct. Guided by theoretical expectations about heterogeneity, the researcher applying the MXL model might consider different variables in $z_{ijt}$ and $w_i$. However, if the underlying theory does not allow deriving such expectations, one faces the challenge of determining which explanatory variables are effect modifiers and which ones represent main effects. By contrast, the inclusion of the same set of variables in the location and the heterogeneity part of the model does not cause any difficulties in the proposed GHMNL model.
In sum, the benefits of the GHMNL model as compared to the MXL model are:
• Whereas the MXL model can account for a rather general and unspecific form of heterogeneity without further motivation, the heterogeneity term in the GHMNL model can uncover specific behavioral tendencies. It provides an indicator of the degree of distinctness of choice and measures the strength of heterogeneity in choice behavior.
• The GHMNL model is much sparser in terms of the number of parameters involved and therefore avoids unstable estimates that would otherwise require careful variable selection.
• It allows deriving a closed form of the log-likelihood without the need to use simulation methods to obtain choice probabilities, which makes the GHMNL model computationally straightforward.
• The researcher does not need to decide on a specific and appropriate distribution for the random parameters to approximate the underlying behavioral process.
• The GHMNL model avoids identifiability problems and works without repeated measurements.

Application: Spatial Voting and Heterogeneous Electorates
The empirical application uses survey data on electoral choices in multiparty elections to study heterogeneity in spatial voting behavior (Davis, Hinich, and Ordeshook 1970; Downs 1957; Enelow and Hinich 1984). Numerous studies have demonstrated that voters evaluate where parties or candidates stand on controversial issues when casting their ballots (see recently, e.g., Ansolabehere and Puy 2018; Jessee 2010; Mauerer, Thurner, and Debus 2015). The expectation that not all voters behave in the same way but instead differ in their reliance on issues also has a long tradition in the voting literature. One example is the classic article on voter heterogeneity by Rivers (1988), stating that different subgroups of voters apply different choice criteria when voting. Another one is the issue public hypothesis by Converse (1964), postulating that the population can be divided into issue publics, each consisting of voters who intensively care about particular issues.

Data
We draw on the 2017 German parliamentary election study (Roßteutscher et al. 2018) and analyze heterogeneity in voter choice for one of the six major German parties in 2017: the Christian-Democratic Parties (CDU/CSU), the Social-Democratic Party (SPD), the Liberal Party (FDP), the Greens, the Left, and the Alternative for Germany (AfD). Section B in the Supporting Information contains a detailed description of the measurement and coding of all variables considered in the empirical application.

Operationalization of Spatial Proximities
In the tradition of spatial voting approaches, our voter choice model follows the classical proximity model, where the main source of voter utility is the ideological proximity to the parties. Based on a simple linear voter-party proximity specification, we expect that voter $i$ casts a ballot for the party $j$ that offers policy platforms closest to the voter's most preferred positions on $K$ different policy issues. The 2017 German national election study contains three policy issues (immigration, taxes, climate change) on which the respondents positioned themselves and the parties on eleven-point scales. Using voter-specific self-placements and perceptions of party placements, the choice-specific variables $z_{ijk}$ in Equation (3) contain the absolute proximity between each voter $i$ and party $j$ on each policy issue $k$.
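The construction of the proximity variables can be sketched as follows. The Python code assumes that proximity is coded as the negative absolute distance between self-placement and perceived party position, so that larger values mean greater closeness; this sign convention, the 1 to 11 coding of the scales, and all placements are assumptions for illustration, and the authoritative coding is described in Section B of the Supporting Information.

```python
def issue_proximities(self_placement, party_placements):
    """Return z_ijk = -|voter position - perceived party position| per party and issue."""
    return {
        party: [-abs(v - p) for v, p in zip(self_placement, perceived)]
        for party, perceived in party_placements.items()
    }

# Hypothetical respondent; issue order: immigration, taxes, climate change.
voter = [4, 6, 9]
perceived = {
    "CDU/CSU": [6, 5, 6],
    "SPD": [4, 7, 7],
    "Greens": [2, 8, 10],
}

z = issue_proximities(voter, perceived)
for party, values in z.items():
    print(party, values)
```

Each respondent thus contributes one proximity value per party and issue, which is what makes these variables choice-specific.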
The empirical application proceeds as follows. Based on previous research on heterogeneity in spatial voting, the first part examines three sources of heterogeneity: issue importance, platform divergence, and political sophistication. In the second part, we present the results of a fully-specified voter choice model that also accounts for nonpolicy considerations in the voting calculus.

Sources of Heterogeneity in Spatial Voting
It has become accepted wisdom that not all voters follow spatial considerations in the same way in making electoral decisions. The debate about heterogeneous electorates has a long tradition in the spatial voting literature. The homogeneity assumption, implying that voters with identical observed characteristics and issue preferences care equally about issues, has already been questioned in early studies of electoral behavior (see, e.g., Campbell et al. 1960; Luskin 1987; Meier and Campbell 1979; Popkin 1991; RePass 1971). Several concepts, conditions, or sources of heterogeneity have been proposed as to why we should expect systematic individual-level differences in the impact of issue considerations on voting. We empirically examine three theoretical sources of heterogeneity in spatial voting.

Issue Importance
The first one is the concept of issue importance. It is the most frequently discussed source of heterogeneity in spatial voting (see, e.g., Edwards III, Mitchell, and Welch 1995; Epstein and Segal 2000; Gomez and Wilson 2001; Rabinowitz, Prothro, and Jacoby 1982). If issues are individually salient to voters, then voters are expected to assign these issues a greater weight in the voting-decision process. We employ a typical measure to assess whether the concept of issue importance explains why voters differ in their reliance on issues when voting: the self-reported importance of the three policy issues on five-point scales.

Platform Divergence
Another central condition that must be met so that issues determine voter choice is substantial divergence in offered party positions. Accordingly, voters who see clear differences between parties' policy proposals are expected to rely more strongly on issue attitudes when casting their ballots than those perceiving similar party stands (e.g., Alvarez and Nagler 2004; Weßels and Schmitt 2008). To examine whether platform divergence on an issue causes heterogeneity in the impact of issue considerations on party choice, we employ a subjective measure. We use the individually perceived range of party positions to identify the degree of platform divergence. The measure is constructed as follows: For each voter and issue, we first identified the two parties that are perceived to take the most extreme positions on both ends of the issue scales. Then, we computed the absolute difference between these party positions. This results in eleven-point scales, where 0 indicates minimum platform divergence (i.e., all parties are perceived to offer the same position) and 10 maximum platform divergence (i.e., voters perceive the party positions to be spread over the entire original eleven-point issue scale).
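The perceived-range measure described above amounts to the difference between the largest and the smallest perceived party position on an issue. A minimal Python sketch, with hypothetical placements and an assumed 1 to 11 coding of the issue scale:

```python
def platform_divergence(perceived_positions):
    """Perceived range of party positions on one issue (maximum minus minimum)."""
    return max(perceived_positions) - min(perceived_positions)

# Hypothetical perceptions of six parties on one issue by three respondents.
no_divergence = platform_divergence([6, 6, 6, 6, 6, 6])
moderate_divergence = platform_divergence([4, 5, 7, 3, 6, 8])
maximum_divergence = platform_divergence([1, 4, 6, 3, 9, 11])
print(no_divergence, moderate_divergence, maximum_divergence)
```

Because the measure is computed separately for each voter and issue, two respondents can report very different divergence values for the same party system.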
We specify a separate model for each of the three policy issues to examine whether voters exhibit heterogeneous reactions to issues due to issue importance and platform divergence. In each model, the location term in Equation (3) contains the party-specific constants and spatial proximity. To identify the constants, we use the CDU as the reference party. In the heterogeneity term, we consider the concepts of issue importance and platform divergence. Since the heterogeneity term affects the complete location term, and both sources of heterogeneity in spatial voting are specific to each policy issue, the issue-by-issue model specification allows us to assess whether varying levels of issue importance and platform divergence cause heterogeneous effects.
Table 1 reports the results. The first column gives the log odds, followed by standard errors and t-values. The parameters related to the issue proximities in the location term all take positive values and are statistically different from zero at the 5% significance level. In line with spatial voting approaches, the estimates indicate that the closer voters perceive the parties to their ideal points on the issues, the higher the weight they assign to them when voting, ceteris paribus. The issues of immigration and climate change exhibit the most substantial impact on party choice.
Inspecting the estimates on issue importance and platform divergence in the heterogeneity term reveals interesting choice behavior. In all three models, the coefficients related to the concept of issue importance are positive. Whereas the parameter in the immigration-issue model does not reach conventional statistical significance levels, the parameters in the remaining models do (10% significance level). The positive estimates suggest that those voters who consider the tax or the climate change issue individually salient have more distinct party choice preferences, ceteris paribus. In line with previous research (e.g., Edwards III, Mitchell, and Welch 1995; Rabinowitz, Prothro, and Jacoby 1982), our model estimates indicate that voters for whom the issues are personally important distinctly prefer specific parties and assign the issues a greater weight in the voting-decision process. By contrast, all coefficients related to the concept of platform divergence take negative values. The negative parameters, which are all statistically different from zero at the 5% significance level, indicate heterogeneity in choice behavior. In accordance with previous studies (e.g., Alvarez and Nagler 2004; Weßels and Schmitt 2008), the estimates imply that voters who perceive substantial divergence in party positions are more heterogeneous in choosing one party.

Political Sophistication
A large body of research has also argued that heterogeneity in issue voting is the result of differences in political sophistication or awareness (see, e.g., Carmines and Stimson 1980; Delli Carpini and Keeter 1993; Gerber, Nicolet, and Sciarini 2015; Luskin 1987; MacDonald, and Listhaug 1995; Palfrey and Poole 1987). To identify voter segments that might be more sensitive toward issues due to political sophistication, we consider three typical operationalizations of this concept: the stated strength of political interest, objective political knowledge, and education. The level of political interest is measured by relying on voters' self-reports on a five-point scale. Political knowledge is measured using factual knowledge questions with right or wrong answers. Based on the respondents' replies to seven questions, we generated an additive index. We assigned a value of one for each correct answer; wrong and "don't know" responses give a value of zero. Education is a binary variable that takes the value of 1 when the respondent has a higher education entrance qualification and 0 otherwise.
Table 2 reports the estimation results. Issue by issue, we specify a model that includes the three measures of political sophistication in the heterogeneity term. The location term again contains constants and spatial proximity. For the immigration-issue model, the coefficient related to political knowledge is negative and statistically different from zero at the 5% significance level. This result indicates that those voters who have a higher level of political knowledge tend to react more heterogeneously to the immigration issue. Whereas none of the political sophistication measures explain heterogeneous reactions on the climate change issue, the parameter related to education in the tax-issue model is different from zero at the 5% significance level. Again, the parameter is negative, suggesting that voters with higher education show heterogeneity in voter choice.
Source: 2017 German election study (Roßteutscher et al. 2018). N = 910. Note: Dependent variable is voter choice. CDU is used as reference party to identify the constants in the location term.

Fully-Specified Voter Choice Model
Next, we present the results of a fully-specified voter choice model. In addition to spatial proximities, the model also contains chooser-specific variables $s_{im}$ in the location term. These are socioeconomic voter characteristics. They account for the importance of voters' nonpolicy motivations in the voting calculus, which presents a central extension of the spatial voting model (see, e.g., Adams, Merrill III, and Grofman 2005; Merrill III and Adams 2001). As nonpolicy factors $s_i$, we consider four dummy-coded voter attributes in the location term: worker, religious denomination, gender, and a regional variable indicating whether the respondent resides in former West or East Germany. In the heterogeneity term, we include gender and the regional variable to examine whether there are systematic gender or regional differences in choice behavior.
Again, we use the CDU as the reference party. The voter choice model is based on 30 degrees of freedom: 3 issue proximities, 6 − 1 constants, and (6 − 1) × 4 parameters related to voter attributes in the location term, and 2 coefficients in the heterogeneity term. The maximum likelihood point estimates and associated standard errors for the issue proximities (immigration, tax, climate change) are as follows: $\alpha^{T} = (\alpha_1, \alpha_2, \alpha_3) = (0.196, 0.202, 0.245)$; $\Sigma_{\alpha}^{1/2\,T} = (\sigma_1, \sigma_2, \sigma_3) = (0.026, 0.031, 0.034)$. Table 3 reports the estimates for the voter attributes in the location and heterogeneity term. In the location term, the interpretation of the coefficients refers to the CDU as this party is used as the reference alternative to identify the model. For example, in line with central social cleavage structures in Germany, Catholics tend to prefer the Christian-Democratic Party CDU compared to the left parties SPD and the Left, ceteris paribus.
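The parameter count and the precision of the issue-proximity estimates can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and comparing the ratio of estimate to standard error against 1.96 assumes the usual large-sample normal approximation.

```python
# Sketch only: reproduces the parameter count and estimate/SE ratios
# from the numbers reported in the text. Function names are hypothetical.

def count_parameters(n_parties, n_issues, n_attributes, n_heterogeneity):
    """Number of free parameters with one party used as reference."""
    constants = n_parties - 1                      # 6 - 1 constants
    attribute_effects = (n_parties - 1) * n_attributes  # (6 - 1) x 4
    return n_issues + constants + attribute_effects + n_heterogeneity

def wald_ratios(estimates, std_errors):
    """Ratio of each point estimate to its standard error."""
    return {name: estimates[name] / std_errors[name] for name in estimates}

# Six parties, three issues, four voter attributes, two heterogeneity terms.
n_params = count_parameters(n_parties=6, n_issues=3,
                            n_attributes=4, n_heterogeneity=2)

# Reported estimates and standard errors (immigration, tax, climate change).
estimates = {"immigration": 0.196, "tax": 0.202, "climate change": 0.245}
std_errors = {"immigration": 0.026, "tax": 0.031, "climate change": 0.034}
ratios = wald_ratios(estimates, std_errors)

print("number of parameters:", n_params)
for issue, ratio in ratios.items():
    # Assumption: |ratio| > 1.96 corresponds to the 5% level (normal approx.).
    print(f"{issue}: estimate / SE = {ratio:.2f}")
```

Under these assumptions, all three ratios lie well above 1.96, in line with the positive and precisely estimated proximity effects described in the text.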
Regarding the heterogeneity term, the coefficients are not specific to a particular party. The corresponding effects are global and do not relate to a reference alternative. The coefficient associated with the gender variable is negative and statistically different from zero at the 5% significance level. The negative value indicates that females show more heterogeneity in voter choice than males, ceteris paribus. By contrast, the coefficient related to the regional variable is positive and statistically different from zero at the 10% significance level. This result suggests voters living in former West Germany have more distinct party choice preferences than those residing in East Germany, ceteris paribus.
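To illustrate why the sign of a heterogeneity coefficient is read as "more distinct" versus "more heterogeneous" choice, the sketch below assumes that the heterogeneity term enters as a multiplicative factor exp(z'γ) on the location term of a multinomial logit. This functional form is an assumption made for illustration; the location values, the covariate z, and the coefficients γ are hypothetical and are not estimates from the tables.

```python
import math

def choice_probabilities(location, z, gamma):
    """Multinomial logit probabilities with a scaled location term.

    Assumption: the location term of every alternative is multiplied by
    exp(z'gamma), so gamma acts globally rather than per alternative.
    """
    scale = math.exp(sum(g * v for g, v in zip(gamma, z)))
    scaled = [scale * eta for eta in location]
    m = max(scaled)                      # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical location-term values for three alternatives
# (the first is the reference alternative, fixed at zero).
location = [0.0, 0.4, -0.3]

p_neg = choice_probabilities(location, z=[1.0], gamma=[-0.5])
p_zero = choice_probabilities(location, z=[1.0], gamma=[0.0])
p_pos = choice_probabilities(location, z=[1.0], gamma=[0.5])

for label, p in [("gamma < 0", p_neg), ("gamma = 0", p_zero), ("gamma > 0", p_pos)]:
    print(label, [round(x, 3) for x in p])
```

In this sketch, a negative coefficient pulls the probabilities toward equal shares across alternatives (less distinct choice), whereas a positive coefficient concentrates probability on the alternative with the largest location value. Because the factor multiplies all alternatives alike, the effect does not depend on which alternative serves as reference.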

Conclusion
Categorical dependent variables are widespread in political science, and the discipline has contributed enormously to methods for the analysis of nominal responses. Political scientists studying nominal-scaled dependent variables as a choice among discrete alternatives frequently hypothesize heterogeneous effects. Being guided by recommendations from the political methodology literature, current practice to study heterogeneity in choice behavior is to allow the parameters associated with choice-specific attributes to vary randomly across decision makers. In particular, political science settled on the mixed logit model. As we have demonstrated, the mixed logit model, however, comes with several drawbacks, such as a high number of parameters to be estimated, identifiability problems, or the need to specify a specific and appropriate distribution for the random effects.
Building on the standard MNL model, a general multinomial logit model for the systematic study of heterogeneity in choice behavior is proposed, which avoids these difficulties, is computationally straightforward, and comes with convenient properties. The proposed GHMNL model integrates a heterogeneity term into the systematic part of the utility function and accounts for behavioral choice tendencies without referring to latent variables. The heterogeneity term is linked to explanatory variables and indicates the degree of distinctiveness of choice or the impact of heterogeneity in choice behavior.
Drawing on theoretical sources of heterogeneity in spatial voting (issue importance, platform divergence, and political sophistication), we have demonstrated how the GHMNL model allows incorporating theoretical expectations into the empirical modeling and how it can improve our understanding of heterogeneous electorates, which remains an important topic in electoral research (e.g., Basinger and Lavine 2005; Federico and Hunt 2013; Gerber, Nicolet, and Sciarini 2015; Peterson 2005; Singh and Roy 2014). For example, our empirical estimates suggest that voters who consider the issues personally important distinctly prefer specific parties and assign the issues a greater weight in the voting-decision process. By contrast, platform divergence induces heterogeneity in spatial voting behavior. Depending on the measure and the issue under consideration, our results also indicate that the higher the level of political sophistication, the more voters tend to exhibit heterogeneous reactions to issues. As many research questions in political science involve theoretical expectations about heterogeneity, we see a range of applications in all political science sub-disciplines.
Political knowledge is measured using factual knowledge questions with right or wrong answers. Based on the respondents' replies to seven questions, we generated an additive index in which each correct answer is assigned a value of one, whereas wrong and "don't know" answers give a value of zero. The index is based on two questions about the German electoral system (survey questions: "Which one of the two votes is decisive for the relative strengths of the parties in the Bundestag?"; "What is the percentage of the second vote a party needs to be able to send delegates to the Bundestag definitely?") and two questions regarding the budget deficit and the unemployment rate. In addition, the respondents were confronted with pictures showing three politicians and were asked to state the party each politician belongs to. These politicians are Martin Schulz (SPD), Katrin Goering-Eckardt (Greens), and Christian Lindner (FDP). The answers are aggregated by counting the correct responses, yielding an eight-categorical variable (0 none correct, 7 all correct). Education is a dichotomous variable that takes the value of 1 when the respondent has a higher education entrance qualification (i.e., a higher-school certificate with university admission) and 0 otherwise.
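The construction of the knowledge index and the education dummy can be sketched as follows. This is a minimal illustration rather than the coding used for the analysis: the item names and the response labels ("correct", "wrong", "dont_know") are hypothetical, and treating a missing item like a wrong answer is an assumption.

```python
# Hypothetical item names for the seven knowledge questions described above.
ITEMS = [
    "decisive_vote",          # which vote determines party strength
    "electoral_threshold",    # share of second votes needed
    "budget_deficit",
    "unemployment_rate",
    "party_schulz",           # picture items: party of each politician
    "party_goering_eckardt",
    "party_lindner",
]

def knowledge_index(responses):
    """Additive index: 1 per correct answer, 0 for wrong or "don't know".

    Assumption: items missing from `responses` also count as 0.
    """
    return sum(1 for item in ITEMS if responses.get(item) == "correct")

def education_dummy(has_entrance_qualification):
    """1 if the respondent holds a higher education entrance qualification."""
    return 1 if has_entrance_qualification else 0

# Hypothetical respondent: four correct, two wrong, one "don't know".
respondent = {
    "decisive_vote": "correct",
    "electoral_threshold": "correct",
    "budget_deficit": "wrong",
    "unemployment_rate": "dont_know",
    "party_schulz": "correct",
    "party_goering_eckardt": "wrong",
    "party_lindner": "correct",
}

score = knowledge_index(respondent)
print("knowledge index:", score)
print("education:", education_dummy(True))
```

The index ranges from 0 to 7, which gives the eight categories mentioned in the text.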

FIGURE 1: Illustration of the Heterogeneity Term in the GHMNL Model

TABLE 1: GHMNL Model Estimates, Issue Importance and Platform Divergence

TABLE 2: GHMNL Model Estimates, Political Sophistication
Source: 2017 German election study (Roßteutscher et al. 2018). N = 910. Note: Dependent variable is voter choice. CDU is used as reference party to identify the constants in the location term.