Customer segmentation in retail facility location planning

In this contribution, we discuss a facility location model to maximize firms’ patronage, while demand is determined by a multinomial logit model (MNL). We account for customer segmentation based on customer characteristics. Hence, we are able to reduce the bias in the objective that is due to the constant substitution patterns of the MNL. Numerical studies show that averaging customer characteristics yields a bias of more than 15 % of the objective function value compared to segmentation. Using GAMS/CPLEX, we are able to solve problem sets with 2 segments, 500 demand points and 10 potential locations to optimality within 1 h of computation time. If we consider 50 potential locations, the gap reported by CPLEX is <8 % after 1 h. We present an illustrative case example of a furniture store company in Germany (data are available as electronic supplementary material to this article). The corresponding problem is solved to optimality in a few minutes.


Introduction
In this paper, we consider a situation where companies (retail store chains, for example) compete for their market share. Suppose for example that a firm wants to locate new shops in a geographical market. The decision variable under control is only where to locate the new facilities. The way customers make their choices is to be taken into account, too (Serra and Colome 2001). The reaction of possible competitors (price, locations) is not considered here.
We discuss a model, based on the maximum capture problem, for the optimal location of K facilities. Customers' choices are modeled according to a specific discrete choice model, namely the multinomial logit model (MNL). Other demand models (the Huff model, for example) might be used instead of the MNL. Our approach is valid for such models as well; however, we do not consider them here. In general, discrete choice models are the workhorse for the analysis of individual choice behavior (McFadden 1973, 2001). In the literature, we find several applications of discrete choice models for spatial choice situations (Timmermans et al. 1992; Dellaert et al. 1998). In spite of their long-term and widespread use, we find only a few references in the operations research literature on facility location that account for discrete choice models. One reason may be the mathematical sophistication of the choice models. For example, de Palma et al. (1989), Benati (1999) and Marianov et al. (2008) discuss non-linear model formulations for discrete locational decisions. To the best of our knowledge, Benati and Hansen (2002) were the first to propose a linear reformulation of the non-linear MNL. Their approach results in a hyperbolic sum integer problem. Haase (2009) uses the constant substitution patterns of the MNL to find a linear integer reformulation. Aros-Vera et al. (2013) apply this approach to the planning of park-and-ride facilities. Finally, Zhang et al. (2012) propose an alternative approach similar to Benati and Hansen (2002). Haase and Müller (2014a) show that a variant of the model of Haase (2009) seems to be superior to the formulations of Benati and Hansen (2002) and Zhang et al. (2012).
The MNL exhibits the well-known independence from irrelevant alternatives property (IIA). Roughly speaking, this property implies that each choice alternative (facility location) is an equal substitute for every other alternative. Unfortunately, there is empirical evidence that this core property is unlikely to hold in a spatial choice context (Bhat and Guo 2004; Hunt et al. 2004). The linear reformulations of the MNL already introduced in the literature are all based on the assumption that customers of a given demand point are homogeneous in their observable characteristics (age and income, for example). In this contribution, we show that, if customers of a given demand point are partitioned into homogeneous subgroups according to their characteristics, the predictive bias due to the IIA might be reduced (Sect. 2). Simply considering average characteristics is not sufficient, as the following illustrative example shows (see Fig. 1).
Consider a country with only two regions (1 and 2) and a firm selling rice seeds to farmers. Farmers are assumed to stock up on seeds at a facility of the firm. There are two potential facility locations A and B (there are no competitors). Region 1 contains location A and region 2 contains location B. Farmers located in region 1 buy rice seeds only in A, while those of region 2 buy only in B. Region 1 contains 49 farmers and region 2 contains 50 farmers. Now assume that the climate in region 1 is hot and humid, while the climate of region 2 is arid (both regions might be separated by mountains). Since we expect that all of the farmers of region 1 buy rice seeds, but none of the farmers of region 2 would do so, we end up with a choice probability of buying rice seeds of 49/99 ≈ 0.495 if we consider the population average. Now, assume the task of the firm is to select the facility location that maximizes the expected number of rice seed customers. Using the average, we would select location B (in region 2), because 0.495 × 50 > 0.495 × 49. However, the true sales are 0, because none of the farmers located in region 2 buys rice seeds, while the farmers of region 1 would only patronize a facility located in A. If the firm considers segment-specific choice probabilities instead (1 for farmers of region 1 and 0 for farmers of region 2), the optimal solution would be facility location A with an expected number of 49 rice seed customers. As a result, the expected bias, i.e., the relative deviation between the two solutions, is 100 %. We learn from this example that simply considering average customer characteristics (instead of proper segmentation) may yield remarkably biased predictive outcomes. In other words, if customer characteristics are considered, it is advisable to employ segmentation instead of the averages of customer characteristics.
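The arithmetic of the rice seed example can be retraced with a short Python sketch. The farmer counts and purchase probabilities are taken from the example; the variable names are chosen here for illustration only.

```python
# Rice seed example: pooled vs. segment-specific choice probabilities.
farmers = {"A": 49, "B": 50}       # farmers who can only reach this location
buys_seed = {"A": 1.0, "B": 0.0}   # true purchase probability per region

# Population average: all buyers divided by all farmers (49 / 99).
pooled = sum(farmers[j] * buys_seed[j] for j in farmers) / sum(farmers.values())

# Expected customers per location under both assumptions.
expected_pooled = {j: pooled * farmers[j] for j in farmers}
expected_segmented = {j: buys_seed[j] * farmers[j] for j in farmers}

choice_pooled = max(expected_pooled, key=expected_pooled.get)
choice_segmented = max(expected_segmented, key=expected_segmented.get)

# True sales realized by each decision.
true_sales_pooled = expected_segmented[choice_pooled]
true_sales_segmented = expected_segmented[choice_segmented]
relative_bias = 100 * (true_sales_segmented - true_sales_pooled) / true_sales_segmented

print(choice_pooled, true_sales_pooled)        # decision based on the average
print(choice_segmented, true_sales_segmented)  # decision based on segments
print(relative_bias)
```

The pooled probability makes the larger region look more attractive although nobody there buys seeds, which is exactly the failure mode the example describes.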
In this paper, we present a model formulation that accounts for customer segmentation within a mixed-integer program and makes it possible to represent customer choice behavior by an MNL that includes customer characteristics (Sect. 3.1). Moreover, we present a simple lower bound and objective cuts for our problem (Sect. 3.2). We demonstrate the usefulness of our approach in extensive numerical studies (Appendix). Finally, we present an illustrative case example to show how our approach might be applied to support decision making for the management of a globally operating furniture store retail chain (Sect. 4).
Let us consider the following problem statement: Find K facility locations from all potential locations J such that the total patronage for the K facilities is maximized.
First, we define the sets
$I$: demand nodes representing zones (census blocks, for example) that contain the customers,
$M_i$: locations (existing and potential ones) from which the customers located in $i \in I$ choose exactly one location. $M_i$ may include a no-choice alternative, indicating that customers might not patronize any facility. Hence, the no-choice alternative (a dummy facility, for example) reflects the proportion of customers who do not consume (services or products) at any facility. We might consider a special case such that $M_i = M \; \forall i \in I$.
$J$: potential locations for the facilities a decision maker (a firm, for example) has to decide on: $J \subseteq \bigcup_{i \in I} M_i$.
Note that $M_i \setminus J$ may include facility locations of competitors and/or the no-choice alternative. That is, $M_i \setminus J$ comprises locations that cannot be influenced by the decision maker. Further, $R_i$ is a set of choice alternatives faced by the customers of $i \in I$ that denotes the number, type, and/or the amount of purchases conducted by the customers. Hence, the choice set faced by customers located in $i \in I$ is $M_i \times R_i$. Consider, as an example, a customer located in a given demand node $i = 1$ who chooses to make a purchase of €10, €20, or €30 at any opened facility within a given time period. So $R_1 = \{10, 20, 30\}$. Let us further assume there are only two facilities, i.e., $M_1 = \{A, B\}$; then the choice set is $\{(A, 10), \ldots, (B, 10), \ldots, (B, 30)\}$. A choice of $(A, 20)$ means that the customer chooses to make a purchase of €20 at facility A. Note that the choice set must be exhaustive and the choice alternatives have to be mutually exclusive. Roughly speaking, all alternatives the customers actually face have to be included in the choice set. The generation of $M_i \times R_i$ is a sophisticated issue. We refer to Swait (2001) for further details.
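The choice set of the small example is a Cartesian product of locations and purchase amounts. A minimal Python sketch, with variable names chosen here for illustration, enumerates it:

```python
from itertools import product

M_1 = ["A", "B"]      # facility locations available to demand node i = 1
R_1 = [10, 20, 30]    # purchase amounts in euros

# Each alternative is a (location, purchase) pair; the pairs are mutually
# exclusive, and together they exhaust what the customer can choose.
choice_set = list(product(M_1, R_1))

for alternative in choice_set:
    print(alternative)
```

With two locations and three purchase levels, the customer faces six alternatives; a no-choice alternative would have to be added as a further element if it is part of the application.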
We consider the parameters
$h_i$: number of customers located in node $i \in I$,
$v_{ijr}$: deterministic utility of customers located in $i \in I$ patronizing $j \in M_i$ and making a purchase denoted by $r \in R_i$ (this could be a measure of generalized cost, for example),
$K$: number of facilities to be located, with $0 < K < |J|$.
Further, we define the binary decision variable $y_j = 1$ if location $j \in J$ provides a facility (0 otherwise), and the non-negative variable $x_{ijr}$ as the choice probability of customers of node $i \in I$ who make a purchase denoted by $r \in R_i$ at a facility located at $j \in J_i$. If we assume that the choice probability is given by the MNL, $x_{ijr}$ is defined by (1). Note that if $M_i \setminus J \neq \emptyset$, then $\sum_{j \in J_i} \sum_{r \in R_i} x_{ijr} < 1$ for all $i \in I$. Now the problem can be modeled as a mixed-integer non-linear program that maximizes the objective (2) subject to (1) and (3)-(4), where (3) fixes the number of facilities:
$\sum_{j \in J} y_j = K$ (3)
Demand is determined by $f(i,j,r) \, x_{ijr}$ with $f(i,j,r)$ as a function denoting the consumption. We denote $F$ as the objective function value of (2). In the literature, we find exact linear reformulations of (1) such that (2)-(4) can be modeled as a mixed-integer program: Haase (2009) and Aros-Vera et al. (2013) employ specific properties of the MNL, while Zhang et al. (2012) propose an approach based on variable substitution similar to Benati and Hansen (2002). In Sect. 3, we present a modified reformulation of Haase (2009). First, we focus on important properties of (1) in the following subsections. In the following, we assume that $|R_i| = 1 \; \forall i \in I$, which simplifies $v_{ijr}$, (1), and (2). Of course, all formulations of the subsequent sections are valid for $|R_i| > 1$ as well.
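To make the role of $y_j$ in the choice probabilities concrete, the following Python sketch evaluates MNL probabilities for one demand node conditional on the set of opened facilities. It assumes the standard MNL form in which only opened locations of the firm and the locations outside $J$ (competitors, no-choice) enter the denominator, and it uses $|R_i| = 1$. The function name, the utilities, and the customer count are hypothetical.

```python
import math

def mnl_probabilities(v, open_firm, fixed):
    """MNL choice probabilities for one demand node.

    v         -- dict: alternative -> deterministic utility (hypothetical values)
    open_firm -- locations of the firm with y_j = 1
    fixed     -- alternatives that are always available (competitors, no-choice)
    """
    available = list(open_firm) + list(fixed)
    denom = sum(math.exp(v[m]) for m in available)
    return {j: math.exp(v[j]) / denom for j in available}

v = {"A": -1.0, "B": -2.0, "competitor": -1.5}
h_i = 1000  # customers at the demand node (hypothetical)

def patronage(open_firm):
    """Expected customers captured by the firm's opened locations."""
    probs = mnl_probabilities(v, open_firm, ["competitor"])
    return h_i * sum(probs[j] for j in open_firm)

print(patronage(["A"]))
print(patronage(["A", "B"]))
```

Because the competitor always stays in the denominator, the firm's captured share remains below one, in line with the remark on $M_i \setminus J \neq \emptyset$.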

The independence from irrelevant alternatives property
The IIA property is well known in the discrete (locational) choice literature (Ray 1973; Sheppard 1978; McFadden 2001; Sener et al. 2011). One outcome of the IIA is that the ratio of choice probabilities of two alternatives (i.e., facility locations) remains constant no matter whether other alternatives are available or not (constant substitution pattern). That is, the probability of patronizing a facility located in $j$ relative to a facility located in $m$ is independent of the existence and attributes of any other facility. Consider two arbitrary but existing facility locations $j, m \in M_i$ to be given. Then, according to (1), the ratio of the choice probabilities $x_{ij}$ and $x_{im}$ is
$x_{ij} / x_{im} = e^{v_{ij}} / e^{v_{im}} = e^{v_{ij} - v_{im}}$ (5)
The IIA property of (5) implies that a new facility or a change in the attractiveness of an existing facility other than $m$ or $j$ will draw patronage from competing facilities in direct proportion to their choice probabilities. In applications, however, it is extremely unlikely that this property holds (Haynes and Fotheringham 1990; Müller et al. 2012; Hunt et al. 2004). In situations in which the IIA property is not valid, we should consider discrete choice models other than the MNL (mixed logit or nested logit, for example). See Train (2009) for further reading. Müller et al. (2009), Haase (2009) and Haase and Müller (2013) propose approximate approaches that are able to incorporate a large class of discrete choice models into mathematical programs.
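The constant substitution pattern can be checked numerically. The sketch below uses hypothetical utilities for three locations and compares the ratio of the MNL probabilities of A and C with and without location B.

```python
import math

def mnl(v):
    """MNL choice probabilities for a dict {alternative: utility}."""
    denom = sum(math.exp(u) for u in v.values())
    return {j: math.exp(u) / denom for j, u in v.items()}

v = {"A": -0.5, "B": -1.0, "C": -2.0}   # hypothetical utilities

with_b = mnl(v)
without_b = mnl({j: u for j, u in v.items() if j != "B"})

ratio_with_b = with_b["A"] / with_b["C"]
ratio_without_b = without_b["A"] / without_b["C"]

# Both ratios equal exp(v_A - v_C), regardless of whether B exists.
print(ratio_with_b, ratio_without_b, math.exp(v["A"] - v["C"]))
```

Removing B raises the probabilities of A and C by the same factor, so B's patronage is redistributed in direct proportion to the remaining choice probabilities.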

Aggregation issues
The MNL, and hence (1), is based on the theory of utility-maximizing behavior of individuals. That is, each individual chooses the location that maximizes his or her utility. Given our problem statement of Sect. 2 and the corresponding model (2)-(4), we are interested in aggregate measures (market shares, total patronage, etc.) instead of individual choice probabilities. Data on customer demand are usually given as an aggregate measure (number of customers, for example). Now, the question arises how we should compute the choice probability of all customers (individuals) located in a given demand point $i \in I$. The answer depends on the specification of the utility $v_{ij}$. If $v_{ij}$ does not contain characteristics of the customers (age, income, and so forth), then the choice probability $x_{ij}$ applies to all customers in $i \in I$ in the same way and thus (2) is a proper formulation. In contrast, the incorporation of customer characteristics in $v_{ij}$ will improve the accuracy of $x_{ij}$ (Koppelman and Bhat 2006, pp 21-23 and pp 41-46). However, aggregation is more tedious in such a case.
Example 1 For simplicity, we consider only one demand node $i = i'$. Further, we assume $i'$ contains two customers $n \in \{1, 2\}$. Let the deterministic utility of customer $n$ depend on $g_{i'j}$, the cost for a trip from $i'$ to $j$, and on $q_n$, the income of customer $n$. The higher the income, the smaller the impact of travel cost (Casado and Ferrer 2013). Now, there are basically two ways of computing $x_{i'j}$:
1. we use the average income of $n = 1$ and $n = 2$ (i.e., the average income of demand node $i'$), denoted by $q_{i'} = (q_1 + q_2)/2$, to compute $v_{i'j}$ and thus $x_{i'j}$, or
2. we first compute the choice probabilities $x_{nj}$ for each customer and then determine the average choice probability of customers located in $i'$ as
$\bar{x}_{i'j} = (x_{1j} + x_{2j})/2$ (6)
In general, approach 1 is expected to be inaccurate compared to approach 2 because of the non-linear relationship between $x_{i'j}$ and $v_{i'j}$ in (1). Consider the values given in Table 1. As expected, $x_{i'j}$ determined by approach 1 and $\bar{x}_{i'j}$ determined by approach 2 are different. As shown by Train (2009), pp 29-32, approach 2 should be preferred. In addition, we observe an interesting pattern if we apply customer characteristics in an appropriate way: the ratio of the average choice probabilities $\bar{x}_{i'A} / \bar{x}_{i'C}$ depends on the existence of facility location B (non-constant substitution pattern). Although the IIA property does apply to each customer $n$, it does not apply to the population of $i'$ as a whole. The key point is that there are two distinct segments of the population (high and low income) with different choice probabilities. We compare two different solutions to (2)-(4), namely solution I (all locations are selected) and solution II (location B is not selected). The customer with low income ($n = 2$) considers location A to be a better substitute for B than C. In contrast, for customer $n = 1$ (high income), locations A and C are more or less equal substitutes for location B. This pattern is due to the different evaluation of travel cost by the two segments (i.e., customers).
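The two ways of aggregating can be compared in a few lines of Python. This is a sketch under stated assumptions: the utility form $v = -g/q$ and the travel costs are hypothetical stand-ins chosen only so that higher income dampens the impact of travel cost; the incomes 9 and 1 are the values listed in Table 1.

```python
import math

def mnl(v):
    """MNL choice probabilities for a dict {alternative: utility}."""
    denom = sum(math.exp(u) for u in v.values())
    return {j: math.exp(u) / denom for j, u in v.items()}

cost = {"A": 1.0, "B": 2.0, "C": 4.0}   # hypothetical travel costs g
income = {1: 9.0, 2: 1.0}               # incomes q_1 and q_2

def utility(q, locations):
    # Assumed utility form: v = -g / q.
    return {j: -cost[j] / q for j in locations}

def average_income_first(locations):
    """Average the income, then compute one set of probabilities."""
    q_bar = sum(income.values()) / len(income)
    return mnl(utility(q_bar, locations))

def average_probabilities(locations):
    """Compute probabilities per customer, then average them."""
    per_customer = [mnl(utility(q, locations)) for q in income.values()]
    return {j: sum(p[j] for p in per_customer) / len(per_customer)
            for j in locations}

solution_I = ["A", "B", "C"]    # all locations selected
solution_II = ["A", "C"]        # location B not selected

def ratio_a_c(probs):
    return probs["A"] / probs["C"]

print(ratio_a_c(average_income_first(solution_I)),
      ratio_a_c(average_income_first(solution_II)))
print(ratio_a_c(average_probabilities(solution_I)),
      ratio_a_c(average_probabilities(solution_II)))
```

With averaged income the A/C ratio is identical in both solutions, whereas with averaged probabilities it changes once B is removed, which is the non-constant substitution pattern at the population level.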
There are two lessons learned so far: First, the more customer characteristics are included in $v_{ij}$ in an appropriate way, the better are the forecast properties of the MNL and of $x_{ij}$, respectively. Second, by applying segmentation to our model (2)-(4) as outlined in approach 2, we are able to reduce the bias of $x_{ij}$ and $F$ due to the IIA of (5) to some extent. In applications, one would be interested in how to classify customers, and how many customer segments are appropriate for a given application. Of course, segmentation makes sense only if the deterministic part of utility contains factors that vary over choice makers. Usually, such factors are socio-economic factors like age, gender, income, occupation, car ownership, and so forth. In empirical studies, socio-economic factors that are continuous measures (age and income, for example) are usually considered as categorical measures. For example, a respondent is asked whether his/her age is (a) below 20 years, (b) between 20 and 40 years, (c) between 40 and 60 years, or (d) older than 60 years. Now consider a deterministic utility function with only two socio-economic factors: gender and age. Gender, in this example, consists of only two categories: female and male. So, we end up with eight customer segments: the four age levels for each of the two genders. Considering many socio-economic factors with many levels yields a large number of segments. How many segments are appropriate and tractable cannot be stated in general. It rather depends on the application, in particular, the empirically specified choice model. See Ben-Akiva and Lerman (1985), pp 131-153 for a detailed discussion of aggregation and segmentation.
Table 1 Aggregation, choice probabilities and the IIA property
Facility location j    A    B    C
q_1                    9    9    9
q_2                    1    1    1
Solution I: y_j        1    1    1
Solution II: y_j       1    0    1
Of course, income $q_n$ as a characteristic of the customer is constant over alternatives. The choice probabilities are computed using (1) and (6). The last column contains the ratio of choice probabilities of facility locations A and C according to (5). We consider two solutions (i.e., I and II) to problem (2)-(4)
Business Research (2014) 7:235-261 241

A probabilistic choice model with customer segmentation
In Sect. 2, we have demonstrated that the IIA may yield biased values of $x_{ij}$ of (1) and hence a biased objective function value $F$ of (2). Moreover, a partition of the population of a demand point $i \in I$ into homogeneous sub-populations (i.e., segmentation) enables us to reduce the bias due to the IIA. In this section, we propose how to explicitly account for segments of customers (heterogeneous customer demand) in a linear mixed-integer model formulation of (2)-(4).

Mathematical formulation
In addition to the definitions of Sect. 2, we consider the set
$S_i$: segments of the customers located in demand node $i \in I$, for example, high and low income, male and female, or a combination of income and gender.
Next, we denote the parameters
$\tilde{h}_{is}$: number of customers of segment $s \in S_i$ located in node $i \in I$,
$\tilde{v}_{isj}$: deterministic utility of customers of segment $s \in S_i$ located in $i \in I$ patronizing $j \in M_i$,
$p_{isj}$: choice probability of customers of segment $s \in S_i$ at node $i \in I$ who access service at a facility located at $j \in J_i$ given that all $m \in J$ are established, i.e., $p_{isj} = e^{\tilde{v}_{isj}} / \sum_{m \in M_i} e^{\tilde{v}_{ism}}$,
$u_{isj}$: choice probability of customers of segment $s \in S_i$ at node $i \in I$ who access service at a facility located at $j \in J_i$ given that $j \in J_i$ is the only facility location established, i.e., $u_{isj} = e^{\tilde{v}_{isj}} / (e^{\tilde{v}_{isj}} + \sum_{m \in M_i \setminus J} e^{\tilde{v}_{ism}})$, and
$f_{is}$: cumulative choice probability of customers of segment $s \in S_i$ at node $i \in I$ who access service at competing facilities given that all potential facilities are established.
Finally, we define the non-negative variables $\tilde{x}_{isj}$ as the MNL choice probability of customers of segment $s \in S_i$ at node $i \in I$ who access service at a facility located at $j \in J_i$, and $z_{is}$ as the cumulative choice probability of customers of segment $s \in S_i$ at node $i \in I$ who do not access any facility of the considered firm.
Then, our model according to the problem statement of Sect. 2 is given by the objective (7) subject to the constraints (8)-(15). We denote $\tilde{F}$ as the objective function value of (7). Let a combination of $i \in I$, $s \in S_i$, and $j \in J_i$ be given. For convenience, we assume for a moment that $|M| = 2$ and $|J| = 1$. If $y_j = 0$, then $\tilde{x}_{isj} = 0$ because of (9) and further $z_{is} = 1$ because of (8). If $y_j = 1$, then according to (11), $\tilde{x}_{isj} = z_{is} \cdot p_{isj} / f_{is}$, because of (7) and $z_{is} \cdot p_{isj} / f_{is} \le u_{isj}$. Due to (8) and substitution, we get the correct choice probability $\tilde{x}_{isj} = e^{\tilde{v}_{isj}} / (e^{\tilde{v}_{isj}} + e^{\tilde{v}_{isk}})$, with $k$ indicating the facility location of the competitor. Of course, these relationships are valid for $|M| > 2$ and $|J| > 1$ as well. Therefore, constraints (8)-(11) together with (7) yield the MNL choice probabilities. For more details, we refer to Haase (2009) and Aros-Vera et al. (2013). Using $u_{isj}$ in (9) and $p_{isj}$ in (10) yields bounds on $\tilde{x}_{isj}$ that are tighter than simply using $0 \le \tilde{x}_{isj} \le y_j$. In contrast to Aros-Vera et al. (2013), we do not consider redundant constraints in our model: Using (11)
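The relation between $\tilde{x}_{isj}$, $z_{is}$, $p_{isj}$, and $f_{is}$ can be checked numerically for a single segment at a single demand node. The sketch assumes that $f_{is}$ equals the sum of the all-established probabilities over the locations outside $J$; the utilities and names are hypothetical.

```python
import math

v = {"A": 0.2, "B": -0.4, "C": -1.0, "k": 0.0}   # hypothetical utilities
firm = ["A", "B", "C"]                            # potential locations J
competitors = ["k"]                               # locations in M \ J

e = {j: math.exp(val) for j, val in v.items()}
comp = sum(e[m] for m in competitors)
total = comp + sum(e[j] for j in firm)

p = {j: e[j] / total for j in firm}            # all potential locations open
u = {j: e[j] / (e[j] + comp) for j in firm}    # only location j open
f = comp / total                               # competitors' share, all open (assumed form)

def mnl_given_open(open_firm):
    """Direct MNL evaluation for a given set of opened locations."""
    denom = comp + sum(e[j] for j in open_firm)
    x = {j: e[j] / denom for j in open_firm}
    z = comp / denom                           # share not captured by the firm
    return x, z

open_firm = ["A", "C"]
x, z = mnl_given_open(open_firm)

for j in open_firm:
    # x equals z * p / f and lies between the bounds p and u.
    print(j, x[j], z * p[j] / f, p[j], u[j])
```

The check illustrates why a linear constraint of the form $\tilde{x}_{isj} \le z_{is} \, p_{isj} / f_{is}$ can reproduce the MNL probabilities for any set of opened locations: the ratio $p_{isj}/f_{is}$ does not depend on which other locations are open.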

Lower bound and objective cuts
To derive a simple lower bound for $\tilde{F}$ of (7), we consider the binary variable $w_{mj}$. Further, we define the non-negative variable $Q$. If we minimize $Q$ subject to (16) and the accompanying constraints on $w_{mj}$ over $j \in J \setminus \{m\}$, the resulting quantity denotes the maximum attractiveness of facility location $m \in J$, with $w^*_{mj}$ indicating that $j$ belongs to the $K - 1$ most attractive facility locations compared to $m$. If we maximize $Q$ subject to (16)-(18), then the resulting quantity denotes the minimum attractiveness of facility location $m \in J$, with $w^*_{mj}$ indicating that $j$ belongs to the $K - 1$ least attractive facility locations compared to $m$. To derive a lower bound, we choose the $K$ largest $j \in J$ according to $b_j$. Denote this set as $\bar{J}$. Accordingly, $\bar{J}_i = \bar{J} \cap M_i$. The lower bound is then computed from $\bar{J}$. Finally, we add (22) and (23) to our model (7)-(15) to account for a lower bound (LB) (22) and an objective cut OC1 (23). A lower bound for problems with capacities is presented in Haase and Müller (2014b). Based on Benati and Hansen (2002), we can alternatively define a second objective cut OC2 in place of (23), using the quantities defined in (24) and (25). Note that $c_j$ in (25) is negative for all $j \in \bar{J}$ by construction. We are interested in the impact of the number of segments, the lower bound, the objective cuts, and the number of competitors on the solution and the solvability of our approach. The corresponding numerical examples can be found in the Appendix. The major findings of these numerical examples are that (1) segmentation has a significant impact on the computational effort, (2) the lower bound (22) provides a quite good solution (it deviates <1 % from the optimal solution), and (3) the use of the objective cut OC1 (23) is particularly appealing if we do not expect to find an optimal solution within a given time. Further, we solve problem sets with 2 segments, 500 demand points and 10 potential locations to optimality within 1 h of computation time. If we consider 50 potential locations, the gap reported by CPLEX is <8 % after 1 h.
In this section, we apply our model of Sect. 3.1 to a hypothetical, but still realistic, branch extension of a large furniture store company in Germany. Figure 2a shows the already existing facility locations and the potential facility locations of the considered firm, as well as the locations of the main competitors in the market. The firm already runs 46 stores in the year 2012 with a market share of 12.5 % and 46 million customers yielding 3.7 billion Euro revenue. The firm aims to expand massively in the market in the near future. It is intended to establish 5-15 new facilities until 2020. The task is to find the optimal locations for a given number of new facilities ($K^+$) from 50 potential facility locations and the corresponding expected market share of the firm. We consider the centroids of the 415 German ''Kreise'' (municipalities) as demand points. The locations of the facilities (existing, potential, and competitors) are given by longitude and latitude coordinates. The Euclidean distance in kilometers between a demand point $i \in I$ and a facility location $j \in M$ is denoted by $d_{ij}$. The choice set for each demand node $i \in I$ is defined by (27), with $M$ as the set of all facility locations and $d$ as a threshold distance. If $p_{isj} < 0.00001$, then we remove $j$ from $M_i$. There exist 101 facility locations of the competitors. Thus, $|M| = 197$. Customers do not consider facilities located more distant than $d$ as a conceivable alternative. Since the main customers of the firm are aged between 15 and 25, we consider two distinct segments of customers: $\tilde{h}_{i,s=1}$ as the number of customers aged between 15 and 25 and $\tilde{h}_{i,s=2}$ as the number of customers of all other ages. The deterministic part of utility (see Sects. 2, 3.1) is given by (28) and (29), where $b^{dist}_s$ and $b^{inc}$ are the utility contributions per unit of the corresponding attribute (distance and income). Equation 28 denotes the utility for not choosing any of the facility locations of the firm (potential and existing) or the competitors.
Roughly speaking, $j = 0$ denotes a dummy facility absorbing all demand not satisfied by the facilities of the firm or the competitors. The dummy facility $j = 0$ comprises the utility of customers either to patronize a small, local furniture store or not to consume furniture at all. Note that (28) and (29) are rather simplistic specifications of utility to make the application more comprehensible.
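The distance-based construction of the choice sets can be sketched as follows. The sketch assumes planar coordinates already expressed in kilometers (the case study works with longitude and latitude, which would first have to be projected) and keeps a facility whenever its distance does not exceed the threshold; all coordinates and identifiers are hypothetical.

```python
import math

# Hypothetical planar coordinates in kilometers.
demand_points = {"i1": (0.0, 0.0), "i2": (120.0, 40.0)}
facilities = {"f1": (30.0, 40.0), "f2": (200.0, 0.0), "f3": (100.0, 100.0)}
threshold_km = 150.0   # threshold distance d

def euclidean(a, b):
    """Euclidean distance between two points given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Distance matrix d_ij and choice sets M_i.
d = {(i, j): euclidean(pi, pj)
     for i, pi in demand_points.items()
     for j, pj in facilities.items()}

choice_sets = {i: [j for j in facilities if d[(i, j)] <= threshold_km]
               for i in demand_points}

for i, m_i in choice_sets.items():
    print(i, m_i)
```

A dummy facility for the no-choice alternative and the additional pruning of alternatives with very small choice probability would be applied on top of these sets.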
In a real-world application, the coefficients $b^{inc}$ and $b^{dist}_s$ of (28) and (29) have to be estimated using empirical choice data (i.e., discrete choice analysis). Large companies can easily afford a comprehensive empirical study to appropriately estimate the coefficients of the utility functions. Here, we cannot obtain such estimates, hence we rely on parameter estimates from other empirical studies. Suarez et al. (2004) provide coefficient estimates for a shopping center choice model. They distinguish between two different segments of customers (target group and others) and estimate coefficients of the distance between the customer's location and the shopping center for both customer segments. Here, we employ these coefficient estimates. They indicate that the main customers ($s = 1$, population aged between 15 and 25) are less sensitive to distance than other customers. Goldman (1976) provides empirical evidence on the relationship between income and the propensity of shopping at a specific facility. Based on Fotheringham and Trew (1993), we might consider $b^{inc} = -0.015$. Now, we are able to compute the expected patronage for each existing facility using (21) and hence the total expected market share of the firm. We consider this as the base scenario. Figure 2b displays the result. On average, a customer is assumed to make five shopping visits a year. This yields 41 million customers over all existing facilities and a total expected market share of 11.12 %. The expected market share is below the reference value of 12.5 %. This is reasonable, because we do not consider online purchases and there might be some inconsistencies close to the border of Germany due to transnational purchases of customers. On average, a customer spends 80 Euro per visit, yielding an annual revenue of 3.28 billion Euro. This is close to the reference value of 3.7 billion Euro. We conclude that our demand model predicts fairly well.
Fig. 3 Results of sensitivity analysis for $d$, $b^{dist}_s$, $b^{inc}$, and market share (MS), panels (a)-(d). $K$ of (12) is given by $46 + K^+$ (46 facilities are already in the market)
Since our parameters do not stem from a unique study on furniture store customer behavior in Germany, we first investigate the sensitivity of the solution to parameter variations. The locational decision variables $y_j$ are fixed to one for the already existing facility locations (i.e., $j < 47$). We solve our model of Sect. 3 for various parameter settings and for different distance thresholds $d$ of (27). We are interested in the dependence of MS on $K^+$. We have implemented our model in GAMS 23.7 and we use CPLEX 12.2 on a 64-bit Windows Server 2008 with 4 Intel Xeon 2.4 GHz processors and 24 GB RAM for all studies. All problems considered in this section are solved to optimality within minutes. The results of Fig. 3 show a piecewise linear increase of the market share in $K^+$. The slope is nearly 0.35, indicating that with each additional facility, the total market share of the firm increases by 0.35 percentage points. Note that the underlying function is not necessarily concave. The sensitivity analysis indicates that the market share is independent of the distance threshold for $d > 50$ and of the weight of the income $b^{inc}$. In contrast, the scale of the market share heavily depends on the distance parameters ($b^{dist}_s$). This finding stresses the need for firms to employ estimates based on unique choice studies (see Street and Burgess 2007; Müller et al. 2008; Louviere et al. 2000 for how to design studies and experiments for discrete choice analysis). Based on the (linear) relationship between MS and $K^+$, the firm's management is able to identify a specific number of new facilities to be located. The optimal locations and the expected (annual) patronage of the new facilities can be displayed in maps and support the decision making of the firm's management. Figure 4 exemplifies a market expansion with 5 and 10 new facilities. In a real-world management application, one usually has to account for locally varying locational (and maybe operational) cost. 
In such a situation, one would be interested in the relationship between cost (or budget) and market share. The firm is further interested in the impact of segmentation of their customers (see Sect. 2.2). Therefore, we consider the following example that extends Example 1.
Example 2 We expect that the more the two segments differ, the larger is the predictive bias of the MNL and thus the larger is the bias of the objective function value if segmentation is neglected. Due to the specification of the deterministic part of utility in (29), the difference in choice probabilities between the two segments corresponds to the difference between $b^{dist}_{s=1}$ and $b^{dist}_{s=2}$. To evaluate the impact of neglected segmentation, we first consider $b^{dist} = (b^{dist}_{s=1} + b^{dist}_{s=2})/2$ in (29). This corresponds to a simple average of utilities as described in (1) of Sect.

The corresponding solution in terms of selected locations is denoted by $J$. Based on $J$, we compute the MNL choice probabilities using segmentation, i.e., we use $b^{dist}_{s=1}$ and $b^{dist}_{s=2}$ instead of $b^{dist}$ in (29). The corresponding objective function value is denoted as $F$ and the corresponding market share is given by MS($F$).
We consider $b^{dist}_s \in \{-1, -0.1, -0.01, -0.001, -0.0001\}$, $b^{inc} = -0.015$, and $d = 150$. Further, we consider two scenarios: $K^+ = 5$ and $K^+ = 10$. The results are given in Fig. 5. The patterns for the total deviation $F - \tilde{F}$, the relative deviation $100 \times (F - \tilde{F}) / \tilde{F}$, and the deviation of the market shares MS($F$) - MS($\tilde{F}$) are similar. The most eye-catching bias occurs if $b^{dist}_{s=1} = -1$. Consider, as an example, $b^{dist}_{s=1} = -1$ and $b^{dist}_{s=2} = -0.1$, i.e., segment $s = 1$ evaluates each additional kilometer ten times as negatively as segment $s = 2$ (i.e., $b^{dist}_{s=1} / b^{dist}_{s=2} = 10$). If segmentation is neglected, the corresponding distance coefficient is $b^{dist} = -0.55$. As a consequence, a large part of the customers (recall that $\sum_i \tilde{h}_{i,s=1} / \sum_i \tilde{h}_{i,s=2} = 0.163$) evaluates distance more than five times as negatively as would be the case with segmentation. Accordingly, the corresponding deviation is remarkable (-8.9 % for $K^+ = 5$ and -12.5 % for $K^+ = 10$). The asymmetric pattern in Fig. 5 is due to the uneven distribution of population over the two segments (the population of segment 2 is larger than the population of segment 1): the more the true coefficient of the large part of the population (segment 2) deviates from the average coefficient, the larger is the expected predictive error. In contrast, a large deviation of the true coefficient of segment 1 has an impact only on a small part of the population and the corresponding expected predictive error is comparably small. Obviously, the extent of the error heavily depends on the scale of the coefficients. Consider, for example, $b^{dist}_{s=1} = -1$ and $b^{dist}_{s=2} = -0.1$. The corresponding ratio is 10 and the expected error for $K^+ = 5$ is -8.88 %. Now, for $b^{dist}_{s=1} = -0.1$ and $b^{dist}_{s=2} = -0.01$ the corresponding ratio is 10 again. However, the corresponding error is only -0.18 %. This pattern is due to the non-linear relationship between distance (deterministic utility) and the choice probabilities (i.e., an S-shaped probability function). As the coefficients (weighting of travel distance) get larger (i.e., approach 0), the probabilities of choosing to patronize a facility approach the largest possible value. For these values of the deterministic utility, the difference in the corresponding choice probabilities between the two segments becomes small.
Table note: For each problem set, we have computed ten instances. The numbers given are the averages over ten instances. CPU denotes the time used by CPLEX. All instances are solved to optimality. $|I| = 100$ and $|J| = 20$
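The scale effect can be illustrated with a deliberately simplified Python sketch. It is not the case-study model: it assumes a binary choice between one facility and a no-choice alternative with utility 0, a single hypothetical distance of 10 km, and population shares derived from the ratio 0.163 reported above. It compares the probability obtained from the averaged coefficient with the population-weighted segment probabilities.

```python
import math

def p_visit(beta, dist_km):
    """Binary logit: patronize the facility vs. a no-choice option with utility 0."""
    return 1.0 / (1.0 + math.exp(-beta * dist_km))

# Segment 1 has 0.163 customers per customer of segment 2.
share_1 = 0.163 / 1.163
share_2 = 1.0 - share_1

def aggregation_error(beta_1, beta_2, dist_km):
    """Averaged-coefficient probability minus segmented probability."""
    beta_avg = (beta_1 + beta_2) / 2.0
    segmented = share_1 * p_visit(beta_1, dist_km) + share_2 * p_visit(beta_2, dist_km)
    return p_visit(beta_avg, dist_km) - segmented

dist_km = 10.0   # hypothetical travel distance
err_large_scale = aggregation_error(-1.0, -0.1, dist_km)
err_small_scale = aggregation_error(-0.1, -0.01, dist_km)

print(err_large_scale, err_small_scale)
```

Both coefficient pairs have a ratio of 10, yet the averaging error shrinks in this sketch as the coefficients approach 0, which mirrors the qualitative pattern described for Fig. 5.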
The bias found in our study is comparable to that reported in studies on spatial aggregation (Andersson et al. 1998; Daskin et al. 1989; Current and Schilling 1987; Murray and Gottsegen 1997). In the literature, ratios of segment-specific coefficients larger than 50 are reported (Müller et al. 2012; Koppelman and Bhat 2006, pp. 133-134). However, the difference between the segment-specific distance coefficients used in our application is small. We have considered parameter settings that yield a ratio $\beta^{dist}_{s=1}/\beta^{dist}_{s=2} = 0.91$ (see Fig. 3). As a consequence, the expected bias is below 1 % if we neglect segmentation in our application. Nevertheless, the consideration of segments yields valuable insights, because the utility function (29) and the corresponding coefficients are arbitrarily chosen. As stated before, for a real application, the company is expected to specify utility functions and estimate the corresponding coefficients on unique choice data. The firm may use such a numerical study to make assumptions about worst-case scenarios.

Summary
By an intelligible example, we demonstrate that the independence from irrelevant alternatives (IIA) property of the MNL may yield false predictions. This finding is well founded on empirical studies. When the MNL is used in a mathematical program to incorporate customer choice behavior, the model outcomes are very likely to be biased as well. Although the MNL is founded on individual choice behavior, in facility location planning we are interested in the share of customers of a demand point patronizing a certain facility. If we assume that the customers of a demand point are homogeneous, i.e., they exhibit the same observable characteristics, then there is no need for segmentation. If we assume the customers to be heterogeneous, then segmentation of the customers according to their characteristics (income and age, for example) should be employed. By proper segmentation, we are able to reduce the predictive bias of the MNL in terms of market shares.
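The constant substitution pattern behind this bias can be made concrete with a short sketch. The utilities are hypothetical numbers and `mnl_probs` is an illustrative helper, not part of the model in Sect. 3. Under the MNL, the ratio of the choice probabilities of two facilities does not change when a third facility enters the choice set.

```python
import math

def mnl_probs(utilities):
    """MNL choice probabilities for a list of deterministic utilities."""
    weights = [math.exp(v) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical deterministic utilities of facilities A and B.
p_two = mnl_probs([-1.0, -2.0])
# A third facility C enters; the utilities of A and B are unchanged.
p_three = mnl_probs([-1.0, -2.0, -1.5])

ratio_before = p_two[0] / p_two[1]
ratio_after = p_three[0] / p_three[1]

# Both ratios equal exp(-1) / exp(-2), up to floating-point rounding.
print(ratio_before, ratio_after)
```

Facility C draws customers from A and B in proportion to their previous shares, regardless of how similar C is to either of them. Segmentation relaxes this pattern at the aggregate level, because each segment has its own utilities.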
In this contribution, we present a model formulation for the maximum capture problem that explicitly allows for customer segmentation using the MNL to find optimal shopping facility locations. Moreover, we propose an intelligible approach to derive a lower bound for our model. Extensive computational studies show the impact of proper segmentation as well as the efficiency of our approach: using aggregate customer characteristics instead of proper segmentation may bias the predicted objective function value by more than 15 % relative to the optimal objective function value. Our lower bound is found in <1 s and deviates <1 % from the optimal solution. Problems with 2 segments, 50 potential locations and 500 demand points can be solved to a gap <8 % within 1 h using GAMS/CPLEX. Based on our numerical studies concerning the quality of the lower bound, it is reasonable to assume that the true gap is remarkably smaller than 8 %. We apply our approach in an illustrative case example of a globally operating furniture store company that intends to increase its market share in Germany by branch expansion. This problem can be solved to optimality within a few minutes. Our example shows how the novel approach can be used for management decision support.
Based on our findings, several directions for future research appear. It is of interest to derive analytical bounds on the bias of the objective function value due to missing segmentation under various segmentation patterns and specifications of utility. Further, the explicit consideration of substitution patterns, i.e., correlation between facility locations, is a very important issue to be analyzed. Efficient solution methods are necessary to account for larger problem sets. Finally, our approach is useful in other areas of operations research, for example, assortment optimization (Kök and Fisher 2007).

Appendix
In this section, we provide numerical examples to validate and test the mathematical formulation of Sect. 3. We assume $M_i = M$, $S_i = S$ and $J_i = J$ $\forall i \in I$. For given $I$, $M$, and $J$, we generate longitude and latitude coordinates using a random uniform distribution in the interval $[0, 100]$. We set the maximum computational time to 1 h if not stated otherwise. Further, we assume that demand is completely satisfied, i.e., a no-choice alternative does not exist. To generate the demand $\tilde{h}_{is}$, we first generate a population $Pop_i$ for each demand node $i \in I$ using a random uniform distribution in the interval $[0, 10]$ weighted by the ratio $|M|/|I|$. Further, we generate weights $x_{is}$ $\forall i \in I, s \in S$ using a random uniform distribution in the interval $[0, 1]$. The travel time $t_{ij}$ between $i \in I$ and $j \in J$ is computed as the rectangular distance between $i$ and $j$ divided by 60. All other parameters of Sect. 3 can be easily derived. In the following, we consider several numerical examples to test our mathematical formulation.
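The instance generation described above might be sketched as follows. The text does not spell out how the population and the weights are combined into segment demand, so the proportional split used here is an assumption, as are the function name `generate_instance`, the fixed seed, and the use of a single list of facility sites.

```python
import random

def generate_instance(n_demand, n_facilities, n_segments, seed=0):
    """Generate demand per segment and travel times for a random instance."""
    rng = random.Random(seed)

    # Coordinates drawn uniformly from [0, 100] for demand points and sites.
    demand_xy = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n_demand)]
    site_xy = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n_facilities)]

    scale = n_facilities / n_demand  # the ratio |M| / |I|
    demand = []
    for _ in range(n_demand):
        pop = rng.uniform(0, 10) * scale
        weights = [rng.uniform(0, 1) for _ in range(n_segments)]
        total = sum(weights)
        # ASSUMPTION: the population is split proportionally to the weights.
        demand.append([pop * w / total for w in weights])

    # Travel time: rectangular (Manhattan) distance divided by 60.
    travel_time = [
        [(abs(xi - xj) + abs(yi - yj)) / 60 for (xj, yj) in site_xy]
        for (xi, yi) in demand_xy
    ]
    return demand, travel_time

demand, travel_time = generate_instance(n_demand=50, n_facilities=20, n_segments=2)
print(len(demand), len(travel_time), len(travel_time[0]))
```

With coordinates in a 100 by 100 square, the rectangular distance is at most 200, so travel times lie between 0 and 200/60.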
Example 3 In this study, we are interested in the additional burden due to the number of segments. We set $|I| = 50$, $|J| = 20$, $K \in \{5, 10, 15\}$, $|S| \in \{1, 2, \ldots, 5\}$, $|M \setminus J| = K$, and draw $\beta_s$ from a uniform distribution in the interval $[-2, -0.5]$. Note that in applications the number of segments will be small due to data availability. See Ben-Akiva and Lerman (1985), pp. 148-150, for an illustrative case study. For each problem set $K$ and $S$, we compute ten randomly generated instances. Figure 6 displays the results. We observe that the computational effort increases with the number of segments. Seemingly, how fast the computational effort increases in $|S|$ depends on the ratio $K/|J|$. If only a few locations have to be selected ($K = 5$) or many locations have to be selected ($K = 15$), the computational effort is small compared to the situation where 50 % of the potential locations have to be selected ($K = 10$).

Fig. 6 Example 3: $|M \setminus J| = K$, $|I| = 50$ and $|J| = 20$. For each problem set, we consider ten randomly generated instances. The values are the averages over ten instances

Table 3 Results of examining the effect of the lower bound and the objective cuts (Example 5)
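One way to put the role of the ratio of K to the number of potential locations in Example 3 into perspective is to count the location subsets that have to be distinguished. The sketch below is an illustration only; the study does not state that this count explains the observed CPU times. It uses Python's standard `math.comb` for 20 potential locations.

```python
import math

n_locations = 20  # number of potential locations in Example 3

# Number of distinct ways to choose K facilities out of the candidates.
subset_counts = {k: math.comb(n_locations, k) for k in (5, 10, 15)}

for k, count in subset_counts.items():
    print(f"K = {k:2d}: {count} candidate subsets")
```

The count is symmetric in K and largest when half of the potential locations are selected, which is at least consistent with the qualitative pattern reported for Example 3.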
Example 4 In this example, we investigate the impact of the number of competing facility locations $|M \setminus J|$ and the number of facilities to be located $K$. We consider $|I| = 100$, $|J| = 20$, $|S| = 2$, $\beta_{s=1} = -1$, and $\beta_{s=2} = -0.5$ for nine different problem sets with ten instances each. The results are given in Table 3. The market share of the considered firm declines in the number of competing facilities. The smaller $K$, the stronger the decline of the market share in the number of competitors (a decline of nearly 50 % for $K = 5$ compared to somewhat more than 30 % for $K = 15$). If the number of established facilities and the number of competing facilities are equal, then market shares are nearly the same (especially if many facilities are established). This study confirms the findings of Example 3 concerning the ratio $K/|J|$ and the corresponding computational effort. Further, the study shows an interesting pattern: there seems to be a positive relationship between the market share and the computational effort (the larger the market share, the more CPU time is needed).
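The dependence of the market share on the number of competing facilities in Example 4 can be illustrated with a minimal sketch of an MNL market share evaluation for a fixed set of facilities. It assumes a utility that is linear in travel time with a segment-specific coefficient; the toy data and the function name `market_share` are hypothetical, and the sketch only evaluates a given location set rather than optimizing it.

```python
import math

def market_share(own, rivals, demand, travel_time, beta):
    """Share of total demand captured by the firm's facilities `own`.

    demand[i][s]      : customers of segment s at demand point i
    travel_time[i][j] : travel time from demand point i to facility j
    beta[s]           : travel-time coefficient of segment s
    ASSUMPTION: deterministic utility is beta[s] * travel_time[i][j].
    """
    captured = 0.0
    total = 0.0
    for i, segments in enumerate(demand):
        for s, h in enumerate(segments):
            own_w = sum(math.exp(beta[s] * travel_time[i][j]) for j in own)
            rival_w = sum(math.exp(beta[s] * travel_time[i][j]) for j in rivals)
            captured += h * own_w / (own_w + rival_w)
            total += h
    return captured / total

# Hypothetical toy data: 2 demand points, 2 segments, 4 facility sites.
demand = [[10.0, 30.0], [20.0, 40.0]]
travel_time = [[0.2, 0.5, 0.9, 0.4], [0.8, 0.3, 0.1, 0.6]]
beta = [-1.0, -0.5]

share_one_rival = market_share([0, 1], [2], demand, travel_time, beta)
share_two_rivals = market_share([0, 1], [2, 3], demand, travel_time, beta)
print(share_one_rival, share_two_rivals)
```

Adding a competing facility enlarges every denominator and leaves the numerators unchanged, so the captured share can only fall, as observed in the example.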
Example 5 Now we are interested in the efficiency of the lower bound described in Sect. 3.2. We consider four problem sets with $|J| \in \{20, 30\}$ and $K \in \{5, 10\}$. For each problem set, ten random instances are generated. Further, we set $|I| = 50$, $|M \setminus J| = K$, $|S| = 2$, $\beta_{s=1} = -1$, and $\beta_{s=2} = -0.5$. For each instance, we solve our model with and without the lower bound (22) and with and without the objective cuts OC1 (23) and OC2 (26). Table 4 displays the results. For all instances, CPLEX found the optimal solution within 1 h computational time. However, for larger problem sets ($K > 5$), we are able to prove optimality within 1 h only if we use the lower bound LB. We are able to decrease the computational effort remarkably (at least 20 times faster) using LB. The lower bound is found in <1 s and LB deviates <1 % from the optimal solution. In a small numerical example, Benati and Hansen (2002) show that they find the optimal solution to their problem by variable neighborhood search in <1 s for problem sets up to $|J| = 50$ and $K < 10$. Concerning the objective cuts OC1 (23) and OC2 (26), we observe a benefit only for small problem sets ($|J| = 20$, $K = 5$). Unfortunately, for larger problem sets, the computational effort increases (up to 2.5 times slower). Possibly, this is due to a degeneration of the LP relaxation using the objective cuts. This finding is confirmed by the results of Benati and Hansen (2002). They report that their upper bound based on submodular maximization, which is comparable to our objective cuts, does not perform as well as the bound provided by concave relaxation. In our study, we find no remarkable difference in performance between OC1 and OC2.
Example 6 The objective of this numerical example is to figure out up to what problem size we are able to solve our problem to (or close to) optimality. We consider $|I| \in \{100, 250, 500\}$ and $|J| \in \{10, 25, 50\}$. For each of the nine problem sets, we solve ten instances. The results are given in Table 5. Small-sized problem sets ($|J| = 10$) can be easily solved to optimality. Medium-sized problem sets ($|J| = 25$) can be solved up to a gap of <6 % in 1 h. For large problem sets ($|J| = 50$), the gap becomes disappointing if we only use the lower bound (22). In contrast, if we use the lower bound (22) and the objective cut OC1 (23), we are able to reduce the gap to somewhat more than 7 % within 1 h. Taking into account the good quality of the lower bound (see Example 5) and the observation that most of the time is needed to prove optimality, we may assume that the "true" gap is even smaller. Note that Benati and Hansen (2002) made the same observation.

Table note: We consider model (7)-(15), (22). For each problem set, we have computed ten random instances. The numbers are the averages over ten instances. CPU denotes the time in seconds used by CPLEX (maximum computation time 1 or 2 h). GAP denotes the solution gap in percent provided by CPLEX. We consider $|S| = 2$, $K = |J|/2$ and $|M \setminus J| = K$.

Table 5 Data of Example 2 in Fig. 5
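The argument in Example 6 that the "true" gap may be smaller than the reported one can be sketched numerically. The sketch assumes a common definition of the relative gap for a maximization problem, namely the distance between the solver's best bound and the incumbent relative to the incumbent; the exact formula used by GAMS/CPLEX may differ, and all numbers as well as the name `relative_gap` are hypothetical.

```python
def relative_gap(bound, incumbent):
    """Relative gap in percent between a bound and an incumbent value."""
    return 100.0 * abs(bound - incumbent) / abs(incumbent)

# Hypothetical values for a maximization problem.
incumbent = 93.0     # best feasible objective value found so far
best_bound = 100.0   # upper bound not yet closed by the solver
true_optimum = 93.5  # unknown in practice; assumed here for illustration

reported_gap = relative_gap(best_bound, incumbent)
true_gap = relative_gap(true_optimum, incumbent)
print(f"reported gap: {reported_gap:.2f} %, true gap: {true_gap:.2f} %")
```

Because the optimum can never exceed the best bound, the gap between the incumbent and the optimum is at most the reported gap, and it is small whenever the incumbent is of good quality.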