Comparative study on mode split discrete choice models

Discrete choice models are among the most important tools for mode split analysis in transport demand forecasting. Because different types of discrete choice models have different merits and restrictions, selecting the appropriate type for a practical application remains a difficult problem. In this article, five typical discrete choice models for transport mode split are discussed: the multinomial logit model (MNL), the nested logit model (NL), the heteroscedastic extreme value model (HEV), the multinomial probit model (MNP) and the mixed multinomial logit model (MMNL). The theoretical basis and application attributes of these five models are analysed, and the models are applied to a realistic intercity mode split forecast. The results indicate that the NL model does well in accommodating similarity and heterogeneity across alternatives, while the MMNL model is the most effective method for mode choice prediction, since it shows the highest reliability with the smallest prediction errors and even outperforms the other four models in handling the heterogeneity and similarity problems. This study indicates that conclusions derived from a single discrete choice model are not reliable, and that it is better to choose the model according to its characteristics.


Introduction
A good understanding of travellers' mode choice behaviour is one of the prerequisites for passenger transport policy-making. As important tools for studying travellers' mode choice behaviour, discrete choice models are widely used in both the theory and the practice of transportation planning.
In the context of transportation, travellers tend to choose the transport mode that fits them best, while at the same time transport modes tend to 'choose' the travellers most able to use them. As a result, the mode choice decision is influenced by the attributes of the transport modes as well as by the internal factors (individual attributes) of the travellers themselves, such as economic capability and personal preference. Since several factors affect how accurately mode choice behaviour can be described and predicted, selecting a suitable discrete choice model with good interpretation ability is critical.
The earliest discrete choice model in use is the multinomial logit model (MNL). The rigid assumption that the random utility terms of the alternatives are independent and identically distributed (IID) makes the MNL model simple to compute, but it also gives the model the independence of irrelevant alternatives (IIA) property, which weakens its ability to reproduce actual choice behaviour [1].
The IIA property of the MNL model stems from the rigid assumption that the random utility terms of the alternatives are completely independent. To relax the IIA property while keeping the computational convenience of the MNL model, researchers have gradually relaxed the assumptions on the structure of the random utility terms and developed several MNL-based models that are better able to reproduce decision-makers' choice behaviour, such as the nested logit model (NL) [2], the generalised extreme value model [3,4], the heteroscedastic extreme value model (HEV) [5] and the mixed multinomial logit model (MMNL) [6]. Many scholars [7][8][9][10] have analysed the travel mode of urban commuters. Schmidt and Strauss [11] and Boskin [12] analysed occupational choice among multiple alternatives. Rossi and Allenby [13] studied consumer brand choices in a repeated choice (panel data) model. Train [14] studied the choice of electricity supplier by a sample of California electricity customers. Hensher et al. [15] analysed choices of automobile models by a sample of consumers who were offered a hypothetical menu of features. In each of these cases, there is a single decision among two or more alternatives. Faced with so many discrete choice models, choosing an appropriate one to simulate travel behaviour is still a rather difficult problem. In this article, we focus on modelling transport mode choice behaviour, compare five typical discrete choice models and discuss the rules for choosing the most suitable one.

MNL model and its application restrictions
Stemming from psychology and economics, discrete choice theory has been a mainstream approach since the 1980s. Most discrete choice studies are grounded on a utility function, which is expressed as
$$U_{nj} = V_{nj} + \varepsilon_{nj} = \gamma_n' Z_{nj} + \varepsilon_{nj} \tag{1}$$
where $U_{nj}$ denotes the utility that decision-maker $n$ associates with alternative $j$; $V_{nj}$ denotes the measurable utility; $\varepsilon_{nj}$ is the error term (immeasurable utility); $\gamma_n'$ is the parameter vector of decision-maker $n$; $Z_{nj}$ is the vector of observed variables; $x_{nj}$ is the vector of individual attributes of decision-maker $n$; $y_{nj}$ is the vector of attributes of alternative $j$; $n = 1, \dots, N$, where $N$ denotes the number of decision-makers; and $j = 1, \dots, J$, where $J$ denotes the number of alternatives.
In modelling individual choices, decision-makers are assumed to be rational; therefore, the probability that individual $n$ selects alternative $i$ is
$$P_{ni} = \Pr(U_{ni} > U_{nj},\ \forall j \ne i) = \Pr(V_{ni} + \varepsilon_{ni} > V_{nj} + \varepsilon_{nj},\ \forall j \ne i) \tag{2}$$
In Eq. (2), the error term $\varepsilon_{nj}$ is assumed to be independent and identically distributed (IID) and to follow a type I extreme value distribution:
$$F(\varepsilon_{nj}) = \exp\{-\exp[-\lambda(\varepsilon_{nj} - \eta_{nj})]\} \tag{3}$$
where $\eta_{nj}$ and $\lambda$ denote the location parameter and the dispersion parameter, respectively. The variance of this distribution is $\pi^2/(6\lambda^2)$. Letting $\lambda = 1$ and $\eta_{nj} = 0$, the MNL choice probability takes the form
$$P_{ni} = \frac{\exp(V_{ni})}{\sum_{j=1}^{J} \exp(V_{nj})} \tag{4}$$
The measurable utility is usually given a linear-in-parameters specification, i.e. $V_{ni} = \beta_n x_{ni} + \alpha y_{ni}$. Thus, the probability that individual $n$ chooses alternative $i$ can be expressed as
$$P_{ni} = \frac{\exp(\beta_n x_{ni} + \alpha y_{ni})}{\sum_{j=1}^{J} \exp(\beta_n x_{nj} + \alpha y_{nj})} \tag{5}$$
The following equation follows from Eq. (5) by a simple derivation and shows the IIA property of the multinomial logit model for any two alternatives $i$ and $k$:
$$\frac{P_{ni}}{P_{nk}} = \frac{\exp(V_{ni})}{\exp(V_{nk})} = \exp(V_{ni} - V_{nk}) \tag{6}$$
Equation (6) means that, within any choice set, the ratio of the choice probabilities of any two alternatives depends only on the utilities of these two alternatives and has nothing to do with the utilities of any other alternatives.
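The IIA property can be checked numerically. The following is a minimal sketch, assuming hypothetical systematic utilities for the four modes (the values are illustrative and are not taken from the survey data); the function name `mnl_probabilities` is likewise only an illustration. It computes the logit probabilities with and without the air alternative and compares the car/train odds.

```python
import math

def mnl_probabilities(utilities):
    """Return MNL choice probabilities for a dict {alternative: V}."""
    # Subtracting the maximum utility avoids overflow and leaves the ratios unchanged.
    v_max = max(utilities.values())
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}

# Hypothetical systematic utilities, chosen only for illustration.
full_set = {"car": 0.0, "train": -0.4, "bus": -1.1, "air": 0.3}
restricted_set = {alt: v for alt, v in full_set.items() if alt != "air"}

p_full = mnl_probabilities(full_set)
p_restricted = mnl_probabilities(restricted_set)

# IIA: the car/train odds equal exp(V_car - V_train) in both choice sets.
odds_full = p_full["car"] / p_full["train"]
odds_restricted = p_restricted["car"] / p_restricted["train"]
print(odds_full, odds_restricted, math.exp(0.4))
```

Removing an alternative rescales all remaining probabilities by the same factor, which is why the odds between any two remaining alternatives do not change.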
On the other hand, if we assume that the decision of individual $n$ is not affected by his or her personal attributes $x_{ni}$, then the MNL model in Eq. (5) can be rewritten as
$$P_{ni} = \frac{\exp(\alpha y_{i})}{\sum_{j=1}^{J} \exp(\alpha y_{j})} \tag{7}$$
Equation (7) is the conditional logit model. The feature of this model is that all decisions depend only on the attributes of the alternatives $(y_j)$ and are independent of the decision-makers' attributes $(X_n)$. No matter how many alternatives exist, only one set of parameters needs to be estimated, because the effect of each attribute on individual utility is assumed to be identical across alternatives. If there are many alternatives, the conditional logit model can be a better choice for modelling and is also simpler to compare with other models.
In practice, the multinomial logit model (MNL) has to satisfy the IIA property, which means that all the alternatives are independent of each other and that the ratio of choice probabilities depends only on the utilities of the two alternatives concerned and not on the utilities of any other alternatives. The IIA property follows from the IID assumption on the error term (the error terms follow independent and identical type I extreme value distributions). The IID constraints cannot be guaranteed if heterogeneity and similarity problems are present, which may result in wrong statistical inference.
Apart from the IIA restriction, the MNL model has two shortcomings in application. One is its inability to handle random preference differences; the other is its inability to account for correlated factors in panel data. The two merits of the MNL model are that it has a closed-form structure and that its parameters can be easily estimated.
Some scholars hold that the IIA property is entirely reasonable in terms of model theory. McFadden and Domencich [16] found that, although the IIA restriction reduces the value of studies based on the MNL model, the deviations caused by IIA are owing to the objects of study, not to the theory itself. They believed that the IIA property is tenable in homogeneous populations. Ben-Akiva and Lerman [17] further pointed out that, although IIA does not hold for the whole population, it does hold in homogeneous populations, so that the reliability of the IIA property depends on whether the population shows significant heterogeneity. The MNL model performs best in explaining discrete choice behaviour if the heterogeneity of the population is not significant.

Improvement and development of discrete choice models
Heterogeneity and similarity problems are directly related to the assumptions made about the error terms in the model. The error terms arise from both the observed samples and the alternatives. Therefore, heterogeneity and similarity problems can be considered from the perspective of these two factors.
Viewing heterogeneity from the perspective of samples means that decision-makers hold different views of specific transport modes in their mode choice behaviour, which can be called divergent individual tastes, or individual heterogeneity. Individual heterogeneity mainly comes from preference heterogeneity and response heterogeneity. The former covers the observed and unobserved effects of individual socioeconomic characteristics on transport mode choice; the latter refers to differences in how individuals evaluate the level of service across transport modes, which also has observed and unobserved effects.
Similarity refers to the situation in which observations are correlated because of spatial or temporal autocorrelation in the survey process (e.g. repeated surveys of the same respondent, or samples that are correlated across sampling units because of adjacent-zone effects) [5]. Heterogeneity and similarity tend to produce biased parameter estimates, or even overestimation of the effects of some specific factors.
When heterogeneity and similarity are examined from the perspective of alternatives, we need to consider whether the alternatives satisfy the IIA property. If the alternatives are dependent or heterogeneous, there may be similarity and heterogeneity problems among alternatives, which are called alternative similarity and alternative heterogeneity, respectively.

NL model
The NL model introduces the concept of nests, in which similar alternatives are put in the same nest. It is assumed that, within a nest, the error terms of the alternatives are independently and identically type I extreme value distributed, while the error terms of alternatives that belong to different nests differ. Here, we take a two-level nest structure as an example. Suppose that there are $M$ nests in the model and $J_m$ alternatives in the $m$th nest. Alternative $i$ is one of the alternatives in the $m$th nest, and thus the probability $P_i$ that the decision-maker chooses alternative $i$ is
$$P_i = P_{i|m} P_m, \quad P_m = \frac{\exp(\mu_m I_m)}{\sum_{l=1}^{M} \exp(\mu_l I_l)}, \quad P_{i|m} = \frac{\exp(V_i/\mu_m)}{\sum_{j=1}^{J_m} \exp(V_j/\mu_m)}, \quad I_m = \ln \sum_{j=1}^{J_m} \exp(V_j/\mu_m) \tag{8}$$
where $P_m$ represents the marginal probability that the decision-maker chooses nest $m$; $P_{i|m}$ denotes the conditional probability that alternative $i$ of nest $m$ is chosen; $I_m$ is the inclusive value, i.e. the composite utility of nest $m$; and $\mu_m$ is the inclusive value parameter, which describes the degree of similarity of the alternatives within the nest. The estimated inclusive value parameter must satisfy $0 < \mu_m \le 1$ to be consistent with the principle of utility maximisation. When $\mu_m = 1$, the NL model reduces to the MNL model. The closer $\mu_m$ is to zero, the higher the correlation among the alternatives.
In order to make all the alternatives independent of each other, the NL model places the correlated alternatives in the same nest, uses the inclusive value to represent the common utility of these alternatives, and then models them together with the other independent alternatives. The NL model is good at solving similarity problems among alternatives. However, its disadvantages are also evident. First, a fixed nest structure has to be specified in advance; second, it cannot accommodate the situation in which all the error terms are correlated with each other at the same time; third, the decision procedure is assumed to satisfy the continuity condition; fourth, each alternative is restricted to appear in only one nest.
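The two-level structure can be illustrated with a short computation. This is a minimal sketch under stated assumptions: the standard two-level nested logit form is used, with the inclusive value of a nest defined as the log of the summed exponentials of V/mu over its alternatives; the utilities, the nest assignment and the inclusive value parameters are hypothetical and are not the estimates of this study.

```python
import math

def nested_logit_probabilities(utilities, nests, mu):
    """utilities: {alt: V}; nests: {nest: [alts]}; mu: {nest: inclusive value parameter}."""
    # Inclusive value of each nest: I_m = ln(sum_j exp(V_j / mu_m)).
    inclusive = {
        m: math.log(sum(math.exp(utilities[a] / mu[m]) for a in alts))
        for m, alts in nests.items()
    }
    denom = sum(math.exp(mu[m] * inclusive[m]) for m in nests)
    probs = {}
    for m, alts in nests.items():
        p_nest = math.exp(mu[m] * inclusive[m]) / denom          # marginal probability of nest m
        for a in alts:
            p_cond = math.exp(utilities[a] / mu[m] - inclusive[m])  # probability of a given nest m
            probs[a] = p_nest * p_cond
    return probs

# Hypothetical utilities and nest structure, for illustration only.
utilities = {"car": 0.0, "train": -0.4, "bus": -1.1, "air": 0.3}
nests = {"fly": ["air"], "ground": ["car", "train", "bus"]}

p_nl = nested_logit_probabilities(utilities, nests, {"fly": 1.0, "ground": 0.5})
p_mnl = nested_logit_probabilities(utilities, nests, {"fly": 1.0, "ground": 1.0})
print(p_nl)
print(p_mnl)
```

With every inclusive value parameter set to 1, the result coincides with the plain logit probabilities, which mirrors the statement that the NL model then reduces to the MNL model.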

HEV model
The HEV model was put forward by Bhat [18]. This model allows the error terms $\varepsilon_{nj}$ to follow independent but non-identical type I extreme value distributions, which means that each alternative has its own variance (the variances may or may not be equal), while the covariance between different alternatives is zero. The probability that individual $n$ chooses alternative $i$ is
$$P_{ni} = \int_{-\infty}^{+\infty} \prod_{j \in C,\ j \ne i} \Psi\!\left(\frac{V_{ni} - V_{nj} + \theta_i w}{\theta_j}\right) \psi(w)\, dw \tag{9}$$
where $\Psi(\cdot)$ and $\psi(\cdot)$ are the cumulative distribution function and the probability density function of the type I extreme value distribution, respectively; $C$ is the choice set; $\mathrm{var}(\varepsilon_i) = \pi^2 \theta_i^2 / 6$; $w = \varepsilon_{ni}/\theta_i$; and $\theta_i$ is the heterogeneity parameter of alternative $i$, which reflects the weight of the unobserved factors. Different alternatives have different effects on the overall utility. Increasing $\theta_i$ reduces the change in choice probability produced by a unit change in observed utility.
The HEV model allows the variances of the error terms to differ across alternatives (the variances may or may not be identical) by introducing a scale factor into the error terms, while the covariance between different alternatives is zero. The HEV model can only handle heterogeneity problems among alternatives. In application, large deviations may occur if similarity problems among alternatives exist at the same time.
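Because the HEV choice probability is a one-dimensional integral, it can be approximated by simple numerical quadrature. The sketch below assumes the integral form of Bhat's HEV model with standard type I extreme value CDF and PDF, uses a midpoint rule on a truncated range (the bounds and step count are arbitrary choices), and takes hypothetical utilities; the scale values 0.2485, 0.2595 and 0.6065 are the estimates reported in the illustration of this article, with car fixed at 1.

```python
import math

def gumbel_cdf(z):
    return math.exp(-math.exp(-z))

def gumbel_pdf(w):
    return math.exp(-w) * math.exp(-math.exp(-w))

def hev_probability(i, utilities, theta, lo=-10.0, hi=40.0, steps=10000):
    """Midpoint-rule approximation of the HEV probability of alternative i."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        w = lo + (k + 0.5) * h
        term = gumbel_pdf(w)
        for j in utilities:
            if j != i:
                z = (utilities[i] - utilities[j] + theta[i] * w) / theta[j]
                term *= gumbel_cdf(z)
        total += term * h
    return total

# Hypothetical systematic utilities, for illustration only.
utilities = {"car": 0.0, "train": -0.4, "bus": -1.1, "air": 0.3}

# Equal scales: the HEV model should reproduce the MNL probabilities.
unit_scale = {alt: 1.0 for alt in utilities}
p_equal = {alt: hev_probability(alt, utilities, unit_scale) for alt in utilities}

# Scale parameters reported in the illustration (car fixed at 1).
scales = {"car": 1.0, "air": 0.2485, "train": 0.2595, "bus": 0.6065}
p_hev = {alt: hev_probability(alt, utilities, scales) for alt in utilities}
print(p_equal)
print(p_hev)
```

The equal-scale case is a useful sanity check, since the integral then has the closed-form logit solution.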

MNP model
Daganzo [6] proposed that the MNP model can be derived by assuming that the random error terms in Eq. (2) follow a normal distribution. The MNP model allows the random error terms to be neither independent nor identically distributed. It is the most general model, as it can reflect realistic choice behaviours most fully. The MNP model can be expressed as
$$P_{ni} = \int I(V_{ni} + \varepsilon_{ni} > V_{nj} + \varepsilon_{nj},\ \forall j \ne i)\, \phi(\varepsilon_n)\, d\varepsilon_n \tag{10}$$
where $I(\cdot)$ is the indicator function, which equals 1 when the alternative with the maximum utility is chosen and zero otherwise; $\phi(\varepsilon_n)$ is the multivariate normal density with expectation $E(\varepsilon_n) = 0$; and $\Omega$ denotes the covariance matrix. Equation (10) involves a very complex integral. When there are more than four alternatives in the choice set, it is difficult to estimate the parameters.
The MNP model is free of the three restrictions of the MNL model. It can handle heteroscedasticity, accommodate error structures of any type and deal with error terms that are correlated over time by using panel data. The only limitation of the MNP model is that all the error terms of the utility functions must be normally distributed. In most cases, assuming that the random terms are normally distributed is appropriate, but in some cases this assumption may lead to implausible prediction results. The best-known example is the coefficient of a price variable, whose density ought to lie on only one side of zero. In addition, estimating the parameters of the MNP model is considerably more complicated.
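Since the probit integral has no closed form, choice probabilities are usually approximated by simulation. The following is a minimal sketch of a crude frequency simulator, not the estimator used in this study: correlated normal errors are drawn through a Cholesky factor and the share of draws in which each alternative has the highest utility is counted. The utilities, the covariance matrix, the number of draws and the seed are all hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sum(low[r][k] * low[c][k] for k in range(c))
            if r == c:
                low[r][c] = math.sqrt(a[r][r] - s)
            else:
                low[r][c] = (a[r][c] - s) / low[c][c]
    return low

def mnp_probabilities(utilities, covariance, draws=20000, seed=0):
    """Frequency simulator: share of draws in which each alternative is the best."""
    alts = list(utilities)
    chol = cholesky(covariance)
    rng = random.Random(seed)
    counts = {alt: 0 for alt in alts}
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in alts]
        # Correlated errors: eps = L z, where L is the Cholesky factor.
        eps = [sum(chol[r][k] * z[k] for k in range(r + 1)) for r in range(len(alts))]
        total_utility = [utilities[a] + e for a, e in zip(alts, eps)]
        best = max(range(len(alts)), key=total_utility.__getitem__)
        counts[alts[best]] += 1
    return {alt: c / draws for alt, c in counts.items()}

# Hypothetical utilities; train and bus errors are assumed to be correlated (0.5).
utilities = {"car": 0.0, "train": -0.4, "bus": -1.1, "air": 0.3}
covariance = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.5, 0.0],
    [0.0, 0.5, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
p_mnp = mnp_probabilities(utilities, covariance)
print(p_mnp)
```

In estimation, smoother simulators are generally preferred because a frequency count is not differentiable in the parameters; the sketch is only meant to show what the integral represents.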

MMNL model
The MMNL model is based upon the assumption that decision-makers show different preferences. It assumes that the random utility term follows a Gumbel distribution, and the MMNL probability has to be obtained by integrating the MNL probability over the distribution of the parameters. The probability that decision-maker $n$ chooses alternative $i$ is
$$P_{ni} = \int L_{ni}(\gamma)\, g(\gamma \mid \theta)\, d\gamma, \qquad L_{ni}(\gamma) = \frac{\exp(V_{ni}(\gamma))}{\sum_{j=1}^{J} \exp(V_{nj}(\gamma))} \tag{11}$$
where $L_{ni}(\gamma)$ is the multinomial logit choice probability for a specific parameter vector $\gamma$; $g(\gamma) = g(\gamma \mid \theta)$ represents the probability density function; $\theta$ denotes the vector of deep parameters, which includes the mean, variance or covariance, etc.; and $V_{ni}(\gamma)$ is the measurable utility. If the utility is a linear combination, i.e. $V_{ni}(\gamma) = \gamma' Z_{ni}$, then the choice probability of the MMNL model can be expressed as
$$P_{ni} = \int \frac{\exp(\gamma' Z_{ni})}{\sum_{j=1}^{J} \exp(\gamma' Z_{nj})}\, g(\gamma \mid \theta)\, d\gamma \tag{12}$$
The choice probability of the MMNL model depends on the distributional form of $\gamma$; $g(\gamma)$ is usually taken to be normal or log-normal [19].
In the utility function of the MMNL model, unobserved random terms are involved in addition to the observed non-random terms and the error terms. The correlation and heterogeneity of alternatives and the preference heterogeneity of individuals are captured through these random terms. The MMNL model is able to deal with heterogeneity and similarity problems simultaneously. Thus, the assumptions of the MMNL model are the most practical, and the model performs best in interpreting preference behaviours.
If the utility of the MMNL model is specified as a linear combination with fixed parameters, the MNL model turns out to be a special case of the MMNL model. The merits of the MMNL model are that preference differences among individuals are allowed, the correlation among different trips of the same consumer can be described, and it can approximate the results of any other random utility model. The demerit of the MMNL model is its complex computing process.
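The integral over the parameter distribution is typically approximated by averaging logit probabilities over random draws of the coefficients. The sketch below assumes independent normal coefficients and a linear utility; the attribute values, coefficient means and standard deviations, the number of draws and the seed are hypothetical and chosen only for illustration.

```python
import math
import random

def logit_probabilities(gamma, attributes):
    """Logit probabilities for coefficients gamma and attributes {alt: [z1, z2, ...]}."""
    v = {alt: sum(g * z for g, z in zip(gamma, zs)) for alt, zs in attributes.items()}
    v_max = max(v.values())
    exp_v = {alt: math.exp(val - v_max) for alt, val in v.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}

def mmnl_probabilities(mean, std, attributes, draws=2000, seed=0):
    """Simulated MMNL probabilities with independent normal coefficients."""
    rng = random.Random(seed)
    avg = {alt: 0.0 for alt in attributes}
    for _ in range(draws):
        gamma = [rng.gauss(m, s) for m, s in zip(mean, std)]
        p = logit_probabilities(gamma, attributes)
        for alt in avg:
            avg[alt] += p[alt] / draws
    return avg

# Hypothetical attributes per mode: [generalised cost, terminal time], rescaled.
attributes = {
    "car": [1.0, 0.0],
    "train": [0.6, 0.4],
    "bus": [0.5, 0.5],
    "air": [1.2, 0.7],
}
mean = [-1.0, -2.0]

p_mixed = mmnl_probabilities(mean, [0.5, 1.0], attributes)
# With zero standard deviations every draw equals the mean, which gives the MNL probabilities.
p_fixed = mmnl_probabilities(mean, [0.0, 0.0], attributes)
p_mnl = logit_probabilities(mean, attributes)
print(p_mixed)
print(p_fixed)
```

The zero-variance case illustrates how the MNL model arises as a degenerate member of the mixed logit family.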
Note that the parameter estimation methods of the MNL, NL, HEV, MMNL and MNP models are not entirely the same. Generally, the parameters of the MNL, NL and HEV models can be identified and obtained by the maximum likelihood estimation method, while the unknown parameters of the MMNL and MNP models can only be estimated by the maximum simulated likelihood method.
By classifying the assumptions on the error terms along two dimensions, independence and identical distribution, the five models introduced above can be classified as shown in Table 1. Each model has its own merits and demerits, and in application, desirable results can be obtained if they are well combined.
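Maximum likelihood estimation can be illustrated on a very small problem. The sketch below assumes a conditional logit with a single cost coefficient and uses Newton's method on the log-likelihood, which is concave for the logit model; the six observations are invented for illustration and the function names are hypothetical.

```python
import math

def choice_probabilities(beta, costs):
    v = [beta * c for c in costs]
    v_max = max(v)
    exp_v = [math.exp(x - v_max) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]

def log_likelihood(beta, data):
    return sum(math.log(choice_probabilities(beta, costs)[chosen]) for costs, chosen in data)

def score_and_hessian(beta, data):
    """First and second derivatives of the log-likelihood with respect to beta."""
    score, hessian = 0.0, 0.0
    for costs, chosen in data:
        p = choice_probabilities(beta, costs)
        mean = sum(pj * c for pj, c in zip(p, costs))
        var = sum(pj * (c - mean) ** 2 for pj, c in zip(p, costs))
        score += costs[chosen] - mean   # observed minus expected cost
        hessian -= var                  # minus the variance of cost under the model
    return score, hessian

def estimate_beta(data, beta=0.0, tol=1e-10, max_iter=50):
    """Newton iterations until the score is numerically zero."""
    for _ in range(max_iter):
        score, hessian = score_and_hessian(beta, data)
        if abs(score) < tol:
            break
        beta -= score / hessian
    return beta

# Invented data: (costs of three alternatives, index of the chosen alternative).
data = [
    ([1.0, 2.0, 3.0], 0),
    ([2.0, 1.0, 3.0], 1),
    ([3.0, 2.0, 1.0], 2),
    ([1.0, 3.0, 2.0], 1),
    ([2.0, 3.0, 1.0], 2),
    ([1.5, 1.0, 2.5], 0),
]
beta_hat = estimate_beta(data)
print(beta_hat, log_likelihood(beta_hat, data))
```

For the MMNL and MNP models the probabilities inside the log-likelihood would themselves be simulated, which is what distinguishes maximum simulated likelihood from the procedure sketched here.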

Illustrations
The source data used for the comparison of these five discrete choice models were drawn from a questionnaire survey on the transport mode (car, train, bus and air) choice behaviours of 210 commuters between Sydney and Melbourne [12]. The main variables include:
TTME: terminal time (min); the TTME for car is zero
INVT: in-vehicle time (min)
GC: generalised cost
HINC: household income
In the utility function to be estimated, for each $j$ the error terms $\varepsilon_{ij}$ are independently and identically type 1 extreme value distributed with variance $\pi^2/6$; $d_{i,m}$ is the binary variable that indicates whether individual $i$ made choice $m$, with $m$ = air, train, bus, car; and $\alpha_m$ is a parameter to be estimated for mode $m$.
We take the car mode as the base alternative to construct the MNL model and estimate the parameters by the maximum likelihood estimation method. The parameter values for the universal set and the restricted set are shown in Table 2, where the restricted set excludes the air mode.
The calculation results show that the Hausman test value HM = 33.3363 is greater than $\chi^2_{0.05} = 9.488$, which indicates that the IIA assumption of the MNL model does not hold and that heterogeneity and similarity problems exist.
The train, bus and car modes can all be used as the base group, whereas using the air mode results in non-identified parameters. The tree-like NL model is shown in Fig. 1.
It is possible that heterogeneity exists across alternatives. Here, we introduce scale parameters into the error terms of the alternatives so that the error terms are unequal and the alternatives heterogeneous. At least one of the alternative scale parameters has to be fixed in the HEV model. For the convenience of model comparison, the car mode is set as the base alternative, and its scale parameter is assumed to be 1. The estimated scale parameter values for the other three modes are $\alpha_{air} = 0.2485$, $\alpha_{train} = 0.2595$ and $\alpha_{bus} = 0.6065$. In Bhat's empirical study [5], the HEV model has better interpretation ability than the NL and MNL models. The example here (Table 3) indicates that the HEV model does outperform the MNL model in interpreting choice behaviours, but this does not mean that it is better than the NL model.
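The IIA check reported above compares a Hausman-type statistic with a chi-square critical value. The sketch below assumes the usual Hausman and McFadden form, in which the statistic is the quadratic form of the difference between the restricted-set and universal-set estimates, weighted by the inverse of the difference of their covariance matrices. The coefficient vectors and covariance matrices in the example are hypothetical; only the values 33.3363 and 9.488 come from the text.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x

def hausman_statistic(b_restricted, b_full, v_restricted, v_full):
    """(b_r - b_f)' [V_r - V_f]^(-1) (b_r - b_f)."""
    diff = [r - f for r, f in zip(b_restricted, b_full)]
    v_diff = [
        [vr - vf for vr, vf in zip(row_r, row_f)]
        for row_r, row_f in zip(v_restricted, v_full)
    ]
    solved = solve_linear_system(v_diff, diff)
    return sum(d * s for d, s in zip(diff, solved))

def rejects_iia(statistic, critical_value):
    """IIA is rejected when the statistic exceeds the chi-square critical value."""
    return statistic > critical_value

# Hypothetical two-parameter example.
hm_example = hausman_statistic(
    [1.0, 2.0], [0.5, 1.0],
    [[0.5, 0.0], [0.0, 1.0]],
    [[0.25, 0.0], [0.0, 0.5]],
)
print(hm_example)

# Values reported in the text: HM = 33.3363 against the 5% critical value 9.488.
print(rejects_iia(33.3363, 9.488))
```

The degrees of freedom of the test equal the number of coefficients compared, which is why the critical value has to be chosen to match the specification being tested.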
The parameter value of the car mode is set to zero in the parameter estimation of the MNP model, and the results are shown in Table 3. The MNP model does not enhance the interpretation of choice behaviours. This is because some error terms of the utility function are not normally distributed.
The MMNL model was built on the basis of the MNL model under the universal set mentioned above. The parameters estimated by the maximum simulated likelihood estimation method are listed in Table 4 (classified as independent random parameters and correlated random parameters). The results show that the MMNL model has the best interpretation performance among all the models.

Conclusions
(1) The MMNL model can be the first option when the parameter distribution is available, because it performs best in interpretation.
(2) The MNP model has the inherent defect that all the error terms of its utility functions should be normally distributed, which leads to poor interpretation performance in practical application. Of all the models tested, the MNP model shows the poorest interpretation performance.
(3) The prediction accuracy of the NL model depends on the given behaviour structure of decision-makers. If the decision-making procedure is unknown, it is very difficult to construct the choice structure, and the final model results are strongly affected if the decision structure is built with considerable mistakes.
(4) The illustration indicates that the HEV model has better interpretation ability for choice behaviour than the MNL model, but worse than the NL model.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.