Abstract
In this contribution, we discuss a facility location model to maximize firms’ patronage, where demand is determined by a multinomial logit model (MNL). We account for customer segmentation based on customer characteristics. Hence, we are able to reduce the bias in the objective that is due to the constant substitution patterns of the MNL. Numerical studies show that averaging customer characteristics yields a bias of more than 15 % of the objective function value compared to segmentation. Using GAMS/CPLEX, we are able to solve problem sets with 2 segments, 500 demand points and 10 potential locations to optimality in 1 h computation time. If we consider 50 potential locations, the gap reported by CPLEX is \(<\)8 % in 1 h. We present an illustrative case example of a furniture store company in Germany (data are available as electronic supplementary material to this article). The corresponding problem is solved to optimality in a few minutes.
Introduction
In this paper, we consider a situation where companies (retail store chains, for example) compete for their market share. Suppose for example that a firm wants to locate new shops in a geographical market. The decision variable under control is only where to locate the new facilities. The way customers make their choices is to be taken into account, too (Serra and Colome 2001). The reaction of possible competitors (price, locations) is not considered here.
We discuss a model, based on the maximum capture problem, for the optimal location of \(K\) facilities. Customers’ choices are modeled according to a specific discrete choice model, namely the multinomial logit model (MNL). Other demand models (the Huff model, for example) might be used instead of the MNL. Our approach is valid for such models as well. However, we do not consider this here. In general, discrete choice models are the workhorse for the analysis of individual choice behavior (McFadden 1973, 2001). In the literature, we find several applications of discrete choice models for spatial choice situations (Timmermans et al. 1992; Dellaert et al. 1998). In spite of their long-term and widespread use, we find only a few references in the operations research literature on facility location that account for discrete choice models. One reason may be the mathematical sophistication of the choice models. For example, de Palma et al. (1989), Benati (1999) and Marianov et al. (2008) discuss nonlinear model formulations for discrete locational decisions. To the best of our knowledge, Benati and Hansen (2002) were the first to propose a linear reformulation of the nonlinear MNL. Their approach results in a hyperbolic sum integer problem. Haase (2009) uses the constant substitution patterns of the MNL to find a linear integer reformulation. Aros-Vera et al. (2013) apply this approach to the planning of park-and-ride facilities. Finally, Zhang et al. (2012) propose an alternative approach similar to Benati and Hansen (2002). Haase and Müller (2014a) show that a variant of the model of Haase (2009) seems to be superior to the formulations of Benati and Hansen (2002) and Zhang et al. (2012).
The MNL exhibits the well-known independence from irrelevant alternatives property (IIA). Roughly speaking, this property implies that each choice alternative (facility location) is an equal substitute to every other alternative. Unfortunately, there is empirical evidence that this core property is unlikely to hold in a spatial choice context (Bhat and Guo 2004; Hunt et al. 2004). The linear reformulations of the MNL already introduced in the literature are all based on the assumption that customers of a given demand point are homogeneous in their observable characteristics (age and income, for example). In this contribution, we show that, if customers of a given demand point are partitioned into homogeneous subgroups according to their characteristics, the predictive bias due to the IIA might be reduced (Sect. 2). Of course, simply considering average characteristics is not sufficient, as the following illustrative example shows (see Fig. 1).
Consider a country with only two regions (1 and 2) and a firm selling rice seeds to farmers. Farmers are assumed to stock up on seeds at a facility of the firm. There are two potential facility locations A and B (there are no competitors). Region 1 contains location A and region 2 contains location B. Farmers located in region 1 buy rice seeds only in A, while those of region 2 buy only in B. Region 1 contains 49 farmers and region 2 contains 50 farmers. Now assume that the climate in region 1 is hot and humid, while the climate of region 2 is arid (both regions might be separated by mountains). Since we expect that all of the farmers of region 1 buy rice seeds, but none of the farmers of region 2 would do so, we end up with a choice probability of buying rice seeds of 49/99 ≈ 0.495 if we consider the population average. Now, assume the task of the firm is to select the facility location that maximizes the expected number of rice seed customers. Based on the average, we would select location B (in region 2), because 0.495 × 50 > 0.495 × 49. However, the true sales are 0, because none of the farmers located in region 2 buys rice seeds, while the farmers of region 1 would only patronize a facility located in A. If the firm considers segment-specific choice probabilities instead (1 for farmers of region 1 and 0 for farmers of region 2), the optimal solution would be facility location A with an expected number of 49 rice seed customers. As a result, the expected bias, i.e., the relative deviation between the two solutions, is 100 %. We learn from this example that simply considering average customer characteristics (instead of proper segmentation) may yield remarkably biased predictive outcomes. In other words, if customer characteristics are considered, it is advisable to employ segmentation instead of the averages of customer characteristics.
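The arithmetic of the rice seed example can be traced in a few lines. The following is a minimal sketch that uses only the numbers stated above; the variable and function names are illustrative and not part of the paper.

```python
# Rice seed example: 49 farmers in region 1 (all buy), 50 in region 2 (none buy).
farmers = {"A": 49, "B": 50}          # location -> farmers in its region
true_prob = {"A": 1.0, "B": 0.0}      # segment-specific purchase probability

# Averaged view: one population-wide probability applied to every region.
avg_prob = sum(farmers[j] * true_prob[j] for j in farmers) / sum(farmers.values())

def best_location(prob):
    """Return the location with the largest expected number of customers."""
    return max(farmers, key=lambda j: farmers[j] * prob[j])

choice_avg = best_location({j: avg_prob for j in farmers})   # decision from averages
choice_seg = best_location(true_prob)                        # decision from segments

# True sales realised at each chosen location.
sales_avg = farmers[choice_avg] * true_prob[choice_avg]
sales_seg = farmers[choice_seg] * true_prob[choice_seg]
bias = 100 * (sales_seg - sales_avg) / sales_seg             # relative deviation in %
print(choice_avg, sales_avg, choice_seg, sales_seg, bias)
```

The averaged probability makes the larger region look more attractive, although none of its farmers buys; the segment-specific probabilities avoid this.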
In this paper, we present a model formulation that accounts for customer segmentation within a mixed-integer program and allows us to describe customer choice behavior by an MNL that includes customer characteristics (Sect. 3.1). Moreover, we present a simple lower bound and objective cuts for our problem (Sect. 3.2). We demonstrate the usefulness of our approach in extensive numerical studies (Appendix). Finally, we present an illustrative case example to show how our approach might be applied to support decision making for the management of a globally operating furniture store retail chain (Sect. 4).
A probabilistic choice model
Let us consider the following problem statement:
Find \(K\) facility locations from all potential locations \(J\) such that the total patronage for the \(K\) facilities is maximized.
First, we define the sets
 \(I\) :

demand nodes representing zones, like census blocks etc., that contain the customers,
 \(M_i\) :

locations (existing and potential ones) from which the customers located in \(i \in I\) choose exactly one location. \(M_i\) may include a no-choice alternative, indicating that customers might not patronize any facility. Hence, the no-choice alternative (a dummy facility, for example) reflects the proportion of customers who do not consume (services or products) at any facility. We might consider the special case \(M_i = M \ \forall \ i \in I\).
 \(J\) :

potential locations for the facilities a decision maker (a firm, for example) has to decide on: \(J \subseteq \bigcup _{i \in I} M_i\). Note that \(M_i \setminus J\) may include facility locations of competitors and/or the no-choice alternative. That is, \(\left\{ M_i \setminus J \right\} \) comprises locations that cannot be influenced by the decision maker. Further, \(J_i = M_i \cap J\).
 \(R_i\) :

is a set of choice alternatives faced by the customers of \(i \in I\) that denotes the number, type, and/or the amount of purchases conducted by the customers. Hence, the choice set faced by customers located in \(i \in I\) is \(\left\{ M_i \times R_i\right\} \). Consider exemplarily a customer located in a given demand node \(i=1\) who chooses to make a purchase of €10, €20, or €30 at any opened facility within a given time period. So \(R_1 = \left\{ 10, 20, 30\right\} \). Let us further assume there are only two facilities, i.e., \(M_1 = \left\{ A, B\right\} \), then the choice set is \(\left\{ (A,10),\ldots , (B,10),\ldots ,(B,30)\right\} \). A choice of \((A, 20)\) means that the customer chooses to make a purchase of €20 at facility \(A\). Note, the choice set must be exhaustive and the choice alternatives have to be mutually exclusive. Roughly speaking, all alternatives the customers actually face have to be included in the choice set. The generation of \(\left\{ M_i\times R_i\right\} \) is a sophisticated issue. We refer to Swait (2001) for further details.
We consider the parameters
 \(h_i\) :

number of customers located in node \(i \in I\), and
 \(v_{ijr}\) :

as the deterministic utility of customers located in \(i \in I\) patronizing \(j \in M_i\) making a purchase denoted by \(r \in R_i\). This could be a measure of generalized cost etc.
 \(K\) :

number of facilities to be located, with \(0 < K < \left| J\right| \).
Further, we define the binary decision variable
 \(y_j\) :

= 1, if location \(j \in J\) provides a facility (0, otherwise), and
the nonnegative variable
 \(x_{ijr}\) :

as the choice probability of customers of node \(i \in I\) who make a purchase denoted by \(r \in R_i\) at a facility located at \(j \in J_i\). If we assume that the choice probability is given by the MNL, \(x_{ijr}\) is defined as
Note, if \(M_i \setminus J \ne \emptyset \), then \(\sum _{j \in J_i} \sum _{r \in R_i} x_{ijr} < 1\) for all \(i \in I\). Now the problem can be modeled as a mixed-integer nonlinear program:
subject to (1) and
Demand is determined by \(f(i, j, r)x_{ijr}\) with \(f(i, j, r)\) as a function denoting the consumption. We denote \(F\) as the objective function value of (2). In the literature, we find exact linear reformulations of (1) such that (2)–(4) can be modeled as a mixed-integer program: Haase (2009) and Aros-Vera et al. (2013) employ specific properties of the MNL, while Zhang et al. (2012) propose an approach based on variable substitution similar to Benati and Hansen (2002). In Sect. 3, we present a modified reformulation of Haase (2009). First, we focus on important properties of (1) in the following sections.
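The display equations (1)–(4) are not reproduced in this text. A form consistent with the definitions of this section, given here as a reconstruction under assumptions and not as the authors' exact statement, would be the following. The inner summation index \(q\) is introduced for illustration only, and \(f(i,j,r)\) is assumed to absorb the number of customers \(h_i\).

\[
x_{ijr} = \frac{\mathrm{e}^{v_{ijr}}\, y_j}{\sum _{m \in M_i \setminus J} \sum _{q \in R_i} \mathrm{e}^{v_{imq}} + \sum _{m \in J_i} \sum _{q \in R_i} \mathrm{e}^{v_{imq}}\, y_m} \quad \forall \ i \in I,\ j \in J_i,\ r \in R_i \qquad (1)
\]

\[
\text{maximize} \quad F = \sum _{i \in I} \sum _{j \in J_i} \sum _{r \in R_i} f(i, j, r)\, x_{ijr} \qquad (2)
\]

subject to (1) and

\[
\sum _{j \in J} y_j = K \qquad (3)
\]

\[
y_j \in \left\{ 0, 1\right\} \quad \forall \ j \in J \qquad (4)
\]

In this form, a facility that is not established (\(y_j = 0\)) attracts no demand and drops out of the denominator, which is what makes (1) nonlinear in \(y\).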
We assume in the following that \(\left| R_i\right| = 1 \ \forall \ i \in I\), which simplifies \(v_{ijr}\), (1), and (2). Of course, all formulations of the subsequent sections are valid for \(\left| R_i\right| > 1 \ \forall \ i \in I\) as well.
The independence from irrelevant alternatives property
The IIA property is well known in discrete (locational) choice literature (Ray 1973; Sheppard 1978; McFadden 2001; Sener et al. 2011). One outcome of the IIA is that the ratio of choice probabilities of two alternatives (i.e., facility locations) remains constant no matter whether other alternatives are available or not (constant substitution pattern). That is, the probability of patronizing a facility located in \(j\) relative to a facility located in \(m\) is independent of the existence and attributes of any other facility. Consider two arbitrary but existing facility locations \(j, m \in M_i\) to be given. Then, according to (1), the ratio of the choice probabilities \(x_{ij}\) and \(x_{im}\) is
The IIA property of (5) implies that a new facility or change in the attractiveness of an existing facility other than \(m\) or \(j\) will draw patronage from competing facilities in direct proportion to their choice probabilities. In contrast, in applications, it is extremely unlikely that this property holds (Haynes and Fotheringham 1990; Müller et al. 2012; Hunt et al. 2004). In situations when the IIA property is not valid we should consider discrete choice models other than MNL (mixed logit or nested logit, for example). See Train (2009) for further reading. Müller et al. (2009), Haase (2009) and Haase and Müller (2013) propose approximate approaches that are able to incorporate a large class of discrete choice models into mathematical programs.
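The constant substitution pattern can be checked numerically. The sketch below uses hypothetical utilities (not taken from the paper) and a plain MNL over whatever alternatives are present; it shows that the ratio of two choice probabilities equals \(\mathrm{e}^{v_{ij} - v_{im}}\) regardless of a third facility, and that a new facility draws the same fraction of probability from each incumbent.

```python
import math

def mnl(v):
    """MNL choice probabilities for a dict {alternative: deterministic utility}."""
    denom = sum(math.exp(u) for u in v.values())
    return {a: math.exp(u) / denom for a, u in v.items()}

v = {"j": -0.5, "m": -1.0}            # hypothetical utilities of two existing facilities
before = mnl(v)
after = mnl({**v, "new": -0.2})       # a third facility enters the choice set

ratio_before = before["j"] / before["m"]
ratio_after = after["j"] / after["m"]
# Both ratios equal exp(v_j - v_m), whatever else is in the choice set.
print(ratio_before, ratio_after, math.exp(v["j"] - v["m"]))

# Proportional draw: each incumbent loses the same fraction of its probability.
loss_j = 1 - after["j"] / before["j"]
loss_m = 1 - after["m"] / before["m"]
print(loss_j, loss_m)
```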
Aggregation issues
The MNL, and hence (1), is based on the theory of utility-maximizing behavior of individuals. That is, each individual chooses the location that maximizes his or her utility. Given our problem statement of Sect. 2 and the corresponding model (2)–(4), we are interested in aggregate measures (market shares, total patronage, etc.) instead of individual choice probabilities. Data on customer demand are usually given as an aggregate measure (number of customers, for example). Now, the question arises how we should compute the choice probability of all customers (individuals) located in a given demand point \(i \in I\). The answer depends on the specification of the utility \(v_{ij}\). If \(v_{ij}\) does not contain characteristics of the customers (age, income, and so forth), then the choice probability \(x_{ij}\) applies to all customers in \(i \in I\) in the same way and thus (2) is a proper formulation. In contrast, the incorporation of customer characteristics in \(v_{ij}\) will improve the accuracy of \(x_{ij}\) (Koppelman and Bhat 2006, pp 21–23 and pp 41–46). However, aggregation is more tedious in such a case.
Example 1
For simplicity, we consider only one demand node \(i = i'\). Consider \(J=M_{i'}=\left\{ A,B,C\right\} \). Further, we assume \(i'\) contains two customers \(n\in \left\{ 1,2\right\} \). Let the deterministic utility function for customer \(n\) be given as
with \(g_{i'j}\) as the cost for a trip from \(i'\) to \(j\) and \(q_n\) as the income of customer \(n\). The higher the income, the smaller the impact of travel cost (Casado and Ferrer 2013). Now, there are basically two ways of computing \(x_{i'j}\):

1.
we use the average income of \(n=1\) and \(n=2\) (i.e., the average income of demand node \(i'\)) denoted by \(\overline{q}_{i'} = (q_1 + q_2)/2\) to compute \(\overline{v}_{i'j}\) and thus \(\overline{x}_{i'j}\), or

2.
we first compute the choice probabilities for each customer \(x_{nj}\) and then we determine the average choice probability of customers located in \(i'\) as \(\tilde{x}_{i'j} = (x_{n=1,j} + x_{n=2,j})/2\).
In general, approach 1 is expected to be inaccurate compared to approach 2 because of the nonlinear relationship between \(x_{i'j}\) and \(v_{i'j}\) in the MNL formula (1). Consider the values given in Table 1. As expected, \(\overline{x}_{i'j}\) determined by approach 1 and \(\tilde{x}_{i'j}\) determined by approach 2 are different. As shown by Train (2009), pp 29–32, approach 2 should be preferred. In addition, we observe an interesting pattern if we apply customer characteristics in an appropriate way: the ratio of the average choice probabilities \(\tilde{x}_{i'A}/\tilde{x}_{i'C}\) depends on the existence of facility location \(B\) (nonconstant substitution pattern). Although the IIA property does apply to each customer \(n\), it does not apply to the population of \(i'\) as a whole. The key point is that there are two distinct segments of the population (high and low income) with different choice probabilities. We compare two different solutions to (2)–(4), namely solution I (all locations are selected) and solution II (location B is not selected). The customer with low income (\(n=2\)) considers location A to be a better substitute for B than C. In contrast, for customer \(n=1\) (high income), locations A and C are more or less equal substitutes for location B. This pattern is due to the different evaluation of travel cost by the two segments (i.e., customers).
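The two aggregation approaches of Example 1 can be contrasted in a small sketch. The utility form \(v_{nj} = -g_{i'j}/q_n\) and all numbers below are assumptions chosen for illustration; they are not the values of Table 1, and only the qualitative pattern is meant to carry over.

```python
import math

def mnl(v):
    """MNL choice probabilities for a dict {alternative: deterministic utility}."""
    denom = sum(math.exp(u) for u in v.values())
    return {a: math.exp(u) / denom for a, u in v.items()}

g = {"A": 1.0, "B": 2.0, "C": 4.0}    # hypothetical trip costs from i' to j
q = {1: 4.0, 2: 1.0}                  # hypothetical incomes: n=1 high, n=2 low

def utility(income, locations):
    # Assumed form: travel cost weighs less the higher the income.
    return {j: -g[j] / income for j in locations}

def approach_1(locations):
    """Average the characteristic first, then compute one choice probability."""
    q_bar = sum(q.values()) / len(q)
    return mnl(utility(q_bar, locations))

def approach_2(locations):
    """Compute choice probabilities per customer, then average them."""
    probs = [mnl(utility(q[n], locations)) for n in q]
    return {j: sum(p[j] for p in probs) / len(probs) for j in locations}

sol_I, sol_II = ["A", "B", "C"], ["A", "C"]     # with and without location B
x_bar = approach_1(sol_I)
x_tilde_I, x_tilde_II = approach_2(sol_I), approach_2(sol_II)
ratio_I = x_tilde_I["A"] / x_tilde_I["C"]
ratio_II = x_tilde_II["A"] / x_tilde_II["C"]
print(x_bar["A"], x_tilde_I["A"], ratio_I, ratio_II)
```

With these numbers, the two approaches give different probabilities for A, and the aggregate ratio A/C changes once B is removed, although each individual customer's ratio stays constant.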
There are two lessons learned so far: First, the more customer characteristics are included in \(v_{ij}\) in an appropriate way, the better the MNL forecasts the choice probabilities \(x_{ij}\). Second, by applying segmentation to our model (2)–(4) as outlined in approach 2 of Example 1, we are able to reduce the bias of \(x_{ij}\) and \(F\) due to the IIA of (5) to some extent. In applications, one would be interested in how to classify customers, and how many customer segments are appropriate for a given application. Of course, segmentation makes sense only if the deterministic part of utility contains factors that vary over decision makers. Usually, such factors are socioeconomic factors like age, gender, income, occupation, car ownership, and so forth. In empirical studies, socioeconomic factors that are continuous measures (age and income, for example) are usually treated as categorical measures. For example, a respondent is asked whether his/her age is (a) below 20 years, (b) between 20 and 40 years, (c) between 40 and 60 years, or (d) older than 60 years. Now consider a deterministic utility function with only two socioeconomic factors: gender and age. Gender is assumed here to consist of two categories: female and male. So, we end up with eight customer segments: the four age levels for each of the two genders. Considering many socioeconomic factors with many levels yields a large number of segments. How many segments are appropriate and tractable cannot be determined in the abstract. It rather depends on the application, in particular, on the empirically specified choice model. See Ben-Akiva and Lerman (1985), pp 131–153 for a detailed discussion of aggregation and segmentation.
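The way the number of segments grows with the number of categorical factors can be made explicit as a Cartesian product. This is a minimal sketch; the level labels and the additional income factor are illustrative assumptions.

```python
from itertools import product

age_levels = ["<20", "20-40", "40-60", ">60"]
genders = ["female", "male"]

# One segment per combination of factor levels.
segments = list(product(genders, age_levels))
print(len(segments))

# Adding a further (hypothetical) factor multiplies the segment count again.
income_levels = ["low", "medium", "high"]
segments_3 = list(product(genders, age_levels, income_levels))
print(len(segments_3))
```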
A probabilistic choice model with customer segmentation
In Sect. 2, we have demonstrated that the IIA may yield biased values of \(x_{ij}\) of (1) and hence a biased objective function value \(F\) of (2). Moreover, a partition of the population of a demand point \(i \in I\) into homogeneous subpopulations (i.e., segmentation) enables us to reduce the bias due to the IIA. In this section, we propose how to explicitly account for segments of customers (heterogeneous customer demand) in a linear mixed-integer model formulation of (2)–(4).
Mathematical formulation
In addition to the definitions of Sect. 2, we consider the set
 \(S_i\) :

segments of the customers located in demand node \(i \in I\); for example high and low income or male and female or a combination of income and gender.
Next, we denote the parameters
 \(\widetilde{h}_{is}\) :

number of customers according to segment \(s \in S_i\) located in node \(i \in I\),
 \(\widetilde{v}_{isj}\) :

as the deterministic utility of customers of segment \(s \in S_i\) located in \(i \in I\) patronizing \(j \in M_i\),
 \(\pi _{isj}\) :

choice probability of customers of segment \(s \in S_i\) at node \(i \in I\) who access service at a facility located at \(j \in J_i\) given that all \(m \in J\) are established, i.e., \(\pi _{isj} = \mathrm{e}^{\widetilde{v}_{isj}}/ \sum _{m \in M_i}\mathrm{e}^{\widetilde{v}_{ism}}\),
 \(\varphi _{isj}\) :

choice probability of customers of segment \(s \in S_i\) at node \(i \in I\) who access service at a facility located at \(j \in J_i\) given that \(j \in J_i\) is the only facility location established, i.e., \(\varphi _{isj} = \mathrm{e}^{\widetilde{v}_{isj}}/ (\mathrm{e}^{\widetilde{v}_{isj}} + \sum _{m \in M_i \setminus J}\mathrm{e}^{\widetilde{v}_{ism}})\), and
 \(\zeta _{is}\) :

cumulative choice probability of customers of segment \(s \in S_i\) at node \(i \in I\) who access service at competing facilities given that all potential facilities \(j \in J\) are located, i.e., \(\zeta _{is} = \sum _{l\in M_i \setminus J} (\mathrm{e}^{\widetilde{v}_{isl}} / \sum _{m \in M_i}\mathrm{e}^{\widetilde{v}_{ism}})\). Therefore,
Finally, we define the nonnegative variables
 \(\widetilde{x}_{isj}\) :

as the MNL choice probability of customers of segment \(s \in S_i\) at node \(i \in I\) who access service at a facility located at \(j \in J_i\), and
 \(z_{is}\) :

as the cumulative choice probability of customers of segment \(s \in S_i\) at node \(i \in I\) who do not access any facility of the considered firm.
Then, our model according to the problem statement of Sect. 2 is
subject to
We denote \(\widetilde{F}\) as the objective function value of (7). Consider a given combination of \(i \in I, \ s \in S_i,\) and \(j \in J_i\). For convenience, we assume for a moment that \(\left| M\right| =2\) and \(\left| J\right| =1\) with \(M =\left\{ j, k\right\} \) and \(J=\left\{ j\right\} \), and accordingly \(M_i=M\) and \(J_i=J\). Now, if \(y_j=0\), then \(\widetilde{x}_{isj}=0\) because of (9) and further \(z_{is}=1\) because of (8). If \(y_j = 1\), then according to (11), \(\widetilde{x}_{isj} = z_{is}\cdot \pi _{isj}/ \zeta _{is}\), because (7) is maximized and \(z_{is}\cdot \pi _{isj}/ \zeta _{is} \le \varphi _{isj}\). Due to (8) and substitution, we get the correct choice probabilities \(\widetilde{x}_{isj}= \mathrm{e}^{\widetilde{v}_{isj}}/\left( \mathrm{e}^{\widetilde{v}_{isj}}+\mathrm{e}^{\widetilde{v}_{isk}}\right) \) with \(k\) indicating the facility location of the competitor. Of course, these relationships hold for \(\left| M\right| >2\) and \(\left| J\right| >1\) as well. Therefore, constraints (8)–(11) together with (7) yield the MNL choice probabilities. For more details, we refer to Haase (2009) and Aros-Vera et al. (2013). Using \(\varphi _{isj}\) in (9) and \(\pi _{isj}\) in (10) yields bounds on \(\widetilde{x}_{isj}\) that are tighter than simply using \(0 \le \widetilde{x}_{isj} \le y_j\). In contrast to Aros-Vera et al. (2013), we do not consider redundant constraints in our model: Using (11) yields \(\left| I\right| \cdot \left| S_i\right| \cdot \left| J_i\right| \) constraints instead of \(\left| I\right| \cdot \left| S_i\right| \cdot \left| J_i\right| ^2\) constraints.
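The argument above can be checked numerically for one demand node and one segment. The sketch below assumes the constraint forms suggested by the verbal description, namely \(z + \sum _j \widetilde{x}_j = 1\) for (8), \(\widetilde{x}_j \le \varphi _j y_j\) for (9), \(\widetilde{x}_j \ge \pi _j y_j\) for (10), and \(\widetilde{x}_j \le (\pi _j/\zeta )\, z\) for (11). These forms and all utilities are assumptions for illustration, not the authors' exact model.

```python
import math
from itertools import combinations

# Hypothetical utilities for one (i, s): two potential sites and one competitor.
v = {"j1": -0.4, "j2": -1.1, "k": -0.7}
J = ["j1", "j2"]                       # potential locations J_i
comp = ["k"]                           # M_i \ J

e = {m: math.exp(u) for m, u in v.items()}
comp_mass = sum(e[m] for m in comp)
pi = {j: e[j] / sum(e.values()) for j in J}            # all of J established
phi = {j: e[j] / (e[j] + comp_mass) for j in J}        # only j established
zeta = comp_mass / sum(e.values())                     # competitor share, all of J established

def mnl_solution(open_sites):
    """MNL probabilities x_j and residual share z for a given set of open sites."""
    denom = comp_mass + sum(e[j] for j in open_sites)
    x = {j: (e[j] / denom if j in open_sites else 0.0) for j in J}
    return x, 1.0 - sum(x.values())

def satisfies_constraints(open_sites, tol=1e-12):
    """Check the assumed forms of (8)-(11) at the MNL solution."""
    x, z = mnl_solution(open_sites)
    y = {j: int(j in open_sites) for j in J}
    ok = abs(z + sum(x.values()) - 1.0) <= tol                       # assumed (8)
    for j in J:
        ok &= x[j] <= phi[j] * y[j] + tol                            # assumed (9)
        ok &= x[j] >= pi[j] * y[j] - tol                             # assumed (10)
        ok &= x[j] <= pi[j] / zeta * z + tol                         # assumed (11)
        ok &= (not y[j]) or abs(x[j] - pi[j] / zeta * z) <= tol      # (11) tight if open
    return bool(ok)

all_ok = all(satisfies_constraints(set(c))
             for k in range(len(J) + 1) for c in combinations(J, k))
print(all_ok)
```

For every subset of open sites, the MNL probabilities satisfy the assumed constraints, with (11) holding as an equality for each open site, which mirrors the substitution argument in the text.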
Lower bound and objective cuts
To derive an intelligible lower bound for \(\widetilde{F}\) of (7), we consider the binary variable \(w_{mj}\). Further, we define the nonnegative variable
If we minimize \(Q\) subject to (16) and
the quantity
denotes the maximum attractiveness of facility location \(m \in J\) with \(w_{mj}^*\) indicating that \(j\) belongs to the \(K-1\) most attractive facility locations compared to \(m\). If we maximize \(Q\) subject to (16)–(18), then the quantity
denotes the minimum attractiveness of facility location \(m \in J\) with \(w_{mj}^*\) indicating that \(j\) belongs to the \(K-1\) least attractive facility locations compared to \(m\). To derive a lower bound, we choose the \(K\) largest \(j \in J\) according to \(b_j\). Denote this set as \(\tilde{J}\). Accordingly, \(\tilde{J}_i = \tilde{J} \cap M_i\). Now compute the lower bound as:
Finally, we add
to our model (7)–(15) to account for a lower bound (LB) (22) and an objective cut OC1 (23). A lower bound for problems with capacities is presented in Haase and Müller (2014b). Now, we might define the quantities
and
Based on Benati and Hansen (2002), we can define a second objective cut OC2 alternatively to (23)
Note, \(\gamma _j\) in (25) is negative for all \(j \in \tilde{J}\) by construction.
We are interested in the impact of the number of segments, the lower bound, the objective cuts, and the number of competitors on the solution and on the solvability of our approach. The corresponding numerical examples can be found in the Appendix. The major findings of these numerical examples are that (1) segmentation has a significant impact on the computational effort, (2) the lower bound (22) provides a quite good solution (it deviates \(<\)1 % from the optimal solution), and (3) the use of the objective cut OC1 (23) is particularly appealing if we do not expect to find an optimal solution within a given time. Further, we solve problem sets with 2 segments, 500 demand points and 10 potential locations to optimality in 1 h computation time. If we consider 50 potential locations, the gap reported by CPLEX is \(<\)8 % in 1 h.
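The idea behind the lower bound, namely to pick the \(K\) best-scoring locations and evaluate the MNL objective for that set, can be sketched as follows. The score used here (patronage if a site were the firm's only facility) is a stand-in assumption for \(b_j\), whose exact definition is not reproduced in this text, and the instance is hypothetical. Since any feasible selection of \(K\) sites yields a valid lower bound, the comparison with a brute-force optimum holds regardless of the score.

```python
import math
from itertools import combinations

# Hypothetical instance: 3 demand nodes, 1 segment each, 4 potential sites, 1 competitor.
h = [100, 80, 60]                                  # customers per node
v = [[-0.2, -1.0, -2.0, -1.5],                     # utility of site j for node i
     [-1.2, -0.3, -0.9, -2.0],
     [-2.0, -1.1, -0.4, -0.6]]
v_comp = [-0.8, -0.8, -0.8]                        # utility of the competitor per node
J, K = range(4), 2

def patronage(open_sites):
    """Expected MNL patronage of the firm for a set of open sites."""
    total = 0.0
    for i, h_i in enumerate(h):
        own = sum(math.exp(v[i][j]) for j in open_sites)
        total += h_i * own / (own + math.exp(v_comp[i]))
    return total

# Stand-in score b_j: patronage if j were the firm's only facility (an assumption).
b = {j: patronage([j]) for j in J}
J_tilde = sorted(J, key=b.get, reverse=True)[:K]   # the K largest j according to b_j
lower_bound = patronage(J_tilde)

# Brute-force optimum, feasible only for tiny instances.
optimum = max(patronage(c) for c in combinations(J, K))
print(sorted(J_tilde), lower_bound, optimum)
```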
Illustrative case example: furniture store location in Germany
In this section, we apply our model of Sect. 3.1 to a hypothetical, but still realistic, branch extension of a large furniture store company in Germany. Figure 2a shows the already existing facility locations and the potential facility locations of the considered firm, as well as the locations of the main competitors in the market. The firm already ran 46 stores in the year 2012 with a market share of 12.5 % and 46 million customers, yielding 3.7 billion Euro revenue. The firm aims to massively expand in the market in the near future. It is intended to establish 5–15 new facilities until 2020. The task is to find the optimal locations for a given number of new facilities (\(K^+\)) from 50 potential facility locations and the corresponding expected market share of the firm.
We consider the centroids of the 415 German “Kreise” (municipalities) as demand points. The locations of the facilities (existing, potential, and competitors) are given by longitude and latitude coordinates. The Euclidean distance in kilometers between a demand point \(i \in I\) and a facility location \(j \in M\) is denoted by \(d_{ij}\). The choice set for each demand node \(i \in I\) is defined by
with \(M\) as the set of all facility locations and \(\delta \) as a threshold distance. If \(\pi _{isj}<0.00001\) then we remove \(j\) from \(M_i\). There exist 101 facility locations of the competitors. Thus, \(\left| M\right| =197\). Customers do not consider facilities located farther away than \(\delta \) as a conceivable alternative. Since the main customers of the firm are aged between 15 and 25, we consider two distinct segments of customers: \(\widetilde{h}_{i,s=1}\) as the number of customers aged between 15 and 25 and \(\widetilde{h}_{i,s=2}\) as the number of customers of all other ages. The deterministic part of utility (see Sects. 2, 3.1) is given as
with \(\text {INC}_i\) as the average annual disposable income of the population located in \(i \in I\) in 1,000 Euro. Total population and \(\text {INC}_i\) are given in Fig. 2a. The ratio \(\sum _i \widetilde{h}_{i, s=1} / \sum _i \widetilde{h}_{i, s=2} =0.163\). Coefficients \(\beta ^{\text {inc}}\) and \(\beta ^{\text {dist}}_s\) are the utility contribution per unit of the corresponding attribute (distance and income). Equation 28 denotes the utility for not choosing any of the facility locations of the firm (potential and existing) or the competitors. Roughly speaking, \(j=0\) denotes a dummy facility absorbing all demand not satisfied by the facilities of the firm or the competitor. The dummy facility \(j=0\) comprises the utility of customers either to patronize a small, local furniture store or not to consume furniture anyway. Note that (28) and (29) are rather simplistic specifications of utility to make the application more comprehensible.
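The construction of the choice sets by a threshold distance can be sketched as follows. The coordinates are hypothetical planar coordinates in kilometers (the case study itself starts from longitude and latitude), and the no-choice alternative \(j=0\), which would belong to every \(M_i\), is left out for brevity.

```python
import math

# Hypothetical planar coordinates in kilometers.
demand_points = {"i1": (0.0, 0.0), "i2": (120.0, 40.0)}
facilities = {"j1": (30.0, 40.0), "j2": (200.0, 0.0), "j3": (100.0, 50.0)}
delta = 150.0                                      # threshold distance in km

def euclidean(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Distance d_ij between every demand point i and facility location j.
d = {(i, j): euclidean(p, q)
     for i, p in demand_points.items() for j, q in facilities.items()}

# Choice set M_i: all facilities within the threshold distance of node i.
M = {i: [j for j in facilities if d[i, j] <= delta] for i in demand_points}
print(M)
```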
In a real-world application, the coefficients \(\beta ^{\text {inc}}, \beta ^{\text {dist}}_s\) of (28) and (29) have to be estimated using empirical choice data (i.e., discrete choice analysis). Large companies can easily afford a comprehensive empirical study to appropriately estimate the coefficients of the utility functions. Here, we cannot obtain such estimates, hence we rely on parameter estimates from other empirical studies. Suarez et al. (2004) provide coefficient estimates for a shopping center choice model. They distinguish between two different segments of customers (target group and others) and estimate coefficients of the distance between the customers’ location and the shopping center for both customer segments. Here, we employ these coefficient estimates, given as
This indicates that the main customers (\(s=1\), population aged between 15 and 25) are less sensitive to distance than other customers. Goldman (1976) provides empirical evidence on the relationship between income and the propensity of shopping at a specific facility. Based on Fotheringham and Trew (1993), we might consider
Now, we are able to compute the expected patronage for each existing facility using (21) and hence the total expected market share of the firm as
We consider this as the base scenario. Figure 2b displays the result. We know that, on average, a customer is assumed to make five shopping visits a year. This yields 41 million customers over all existing facilities and a total expected market share of 11.12 %. The expected market share is below the reference value of 12.5 %. This is reasonable, because we do not consider online purchases and there might be some inconsistencies close to the border of Germany due to transnational purchases of customers. On average, a customer spends 80 Euro per visit yielding an annual revenue of 3.28 billion Euro. This is close to the reference value of 3.7 billion Euro. We conclude that our demand model makes predictions fairly well.
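As a quick check of the figures quoted above, the revenue follows from the number of customers and the average spending per visit, and both model outputs reach roughly the same fraction of their reference values:

\[
41 \times 10^{6} \times 80\ \text{Euro} = 3.28 \times 10^{9}\ \text{Euro}, \qquad \frac{3.28}{3.7} \approx 0.89, \qquad \frac{11.12}{12.5} \approx 0.89 .
\]

That is, both the predicted revenue and the predicted market share fall about 11 % short of the reference values.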
Since our parameters do not stem from a dedicated study on furniture store customer behavior in Germany, we first investigate the sensitivity of the solution to parameter variations. The locational decision variables \(y_j\) are fixed to one for the already existing facility locations (i.e., \(j<47\)). We solve our model of Sect. 3 for various parameter settings and for different distance thresholds \(\delta \) of (27). We are interested in how the market share MS depends on \(K^+\). We have implemented our model in GAMS 23.7 and we use CPLEX 12.2 on a 64-bit Windows Server 2008 with 4 Intel Xeon 2.4 GHz processors and 24 GB RAM for all studies. All problems considered in this section are solved to optimality within minutes. The results of Fig. 3 show a piecewise linear increase of the market share in \(K^+\). The slope is nearly 0.35, indicating that with each additional facility, the total market share of the firm increases by 0.35 percentage points. Note that the underlying function is not necessarily concave. The sensitivity analysis indicates that the market share is insensitive to the distance threshold for \(\delta > 50\) and to the weight of the income \(\beta ^{\text {inc}}\). In contrast, the scale of the market share heavily depends on the distance parameters (\(\beta _s^{\text {dist}}\)). This finding stresses the need for firms to employ estimates based on dedicated choice studies (see Street and Burgess 2007; Müller et al. 2008; Louviere et al. 2000 for how to design studies and experiments for discrete choice analysis).
Based on the (linear) relationship between MS and \(K^+\), the firm’s management is able to identify a specific number of new facilities to be located. The optimal locations and the expected (annual) patronage of the new facilities can be displayed in maps to support the decision making of the firm’s management. Figure 4 exemplifies a market expansion with 5 and 10 new facilities. In a real-world management application, one usually has to account for locally varying locational (and maybe operational) cost. In such a situation, one would be interested in the relationship between cost (or budget) and market share. The firm is further interested in the impact of segmentation of its customers (see Sect. 2.2). Therefore, we consider the following example that extends Example 1.
Example 2
We expect that the more the two segments differ, the larger is the predictive bias of the MNL and thus the larger is the bias of the objective function value if segmentation is neglected. Due to the specification of the deterministic part of utility in (29), the difference in choice probabilities between the two segments corresponds to the difference between \(\beta ^{\text {dist}}_{s=1}\) and \(\beta ^{\text {dist}}_{s=2}\). To evaluate the impact of neglected segmentation, we first consider \(\beta ^{\text {dist}}_{s=1} =\beta ^{\text {dist}}_{s=2} = \beta ^{\text {dist}}\) with \(\beta ^{\text {dist}} = (\beta ^{\text {dist}}_{s=1} + \beta ^{\text {dist}}_{s=2})/2\) in (29). This corresponds to a simple average of utilities as described in approach 1 of Example 1 (Sect. 2.2). The corresponding solution in terms of selected locations is denoted by \(\overline{J} = \left\{ j \in J \mid y^*_j=1 \right\} \). Based on \(\overline{J}\), we compute the MNL choice probabilities using segmentation, i.e., we use \(\beta ^{\text {dist}}_{s=1}\) and \(\beta ^{\text {dist}}_{s=2}\) instead of \(\beta ^{\text {dist}}\) in (29). The corresponding objective function value is denoted as \(\overline{F}\) and the corresponding market share is given by MS(\(\overline{F}\)).
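The evaluation procedure of this example can be sketched on a toy instance. The sketch assumes a utility of the form \(\beta ^{\text {dist}}_s d_{ij}\) with an outside option of utility 0, uses hypothetical distances and segment sizes, and replaces the mixed-integer program by brute-force enumeration; none of this is the paper's data or implementation.

```python
import math
from itertools import combinations

# Hypothetical instance: distances (km) from 3 demand nodes to 4 potential sites.
d = [[5, 20, 40, 60], [30, 10, 25, 50], [60, 35, 8, 15]]
h = [[20, 100], [15, 90], [10, 70]]        # customers per node and segment (s=1, s=2)
beta_seg = [-1.0, -0.1]                    # segment-specific distance coefficients
beta_avg = sum(beta_seg) / len(beta_seg)   # simple average, segmentation neglected
J, K = range(4), 2
V_OUT = 0.0   # assumed utility of the outside option (competitors / no choice)

def patronage(open_sites, betas):
    """Expected MNL patronage for open sites under per-segment coefficients."""
    total = 0.0
    for i, row in enumerate(d):
        for s, beta in enumerate(betas):
            own = sum(math.exp(beta * row[j]) for j in open_sites)
            total += h[i][s] * own / (own + math.exp(V_OUT))
    return total

def solve(betas):
    """Brute-force stand-in for the mixed-integer program."""
    return max(combinations(J, K), key=lambda c: patronage(c, betas))

J_bar = solve([beta_avg, beta_avg])         # locations chosen without segmentation
F_bar = patronage(J_bar, beta_seg)          # evaluated with the segment coefficients
F_tilde = patronage(solve(beta_seg), beta_seg)   # optimum with segmentation
rel_dev = 100 * (F_bar - F_tilde) / F_tilde      # relative deviation in %
print(J_bar, F_bar, F_tilde, rel_dev)
```

By construction, \(\overline{F}\) cannot exceed \(\widetilde{F}\), so the relative deviation is never positive; how large it becomes depends on the instance.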
We consider \(\beta ^{\text {dist}}_{s} \in \left\{ -1, -0.1, -0.01, -0.001, -0.0001\right\} \), \(\beta ^{\text {inc}}=0.015\), and \(\delta = 150\). Further, we consider two scenarios: \(K^+ = 5\) and \(K^+ = 10\). The results are given in Fig. 5. The patterns for the total deviation \(\overline{F} - \widetilde{F}\), relative deviation \(100 \times (\overline{F} - \widetilde{F})/\widetilde{F}\), and the deviation of the market shares \(\text {MS}(\overline{F}) - \text {MS}(\widetilde{F})\) are similar. The most eye-catching bias occurs if \(\beta ^{\text {dist}}_{s=1} = -1\). Consider, for example, \(\beta ^{\text {dist}}_{s=1} = -1\) and \(\beta ^{\text {dist}}_{s=2} = -0.1\), i.e., segment \(s=1\) evaluates each additional kilometer ten times as negatively as segment \(s=2\) (i.e., \(\beta ^{\text {dist}}_{s=1}/\beta ^{\text {dist}}_{s=2} = 10\)). If segmentation is neglected, the corresponding distance coefficient is \(\beta ^{\text {dist}} = -0.55\). As a consequence, a large part of the customers (recall that \(\sum _i \widetilde{h}_{i, s=1} / \sum _i \widetilde{h}_{i, s=2} =0.163\)) evaluates distance more than five times as negatively as would be the case with segmentation. Accordingly, the corresponding deviation is remarkable (\(-\)8.9 % for \(K^+=5\) and \(-\)12.5 % for \(K^+=10\)). The asymmetric pattern in Fig. 5 is due to the uneven distribution of population over the two segments (the population of segment 2 is larger than the population of segment 1): the more the true coefficient of the large part of the population (segment 2) deviates from the average coefficient, the larger the expected predictive error. In contrast, a large deviation of the true coefficient of segment 1 affects only a small part of the population, and the corresponding expected predictive error is comparatively small. The extent of the error also depends heavily on the scale of the coefficients.
Consider, for example, \(\beta ^{\text {dist}}_{s=1} = -1\) and \(\beta ^{\text {dist}}_{s=2} = -0.1\). The corresponding ratio is 10 and the expected error for \(K^+ = 5\) is \(-\)8.88 %. Now, for \(\beta ^{\text {dist}}_{s=1} = -0.1\) and \(\beta ^{\text {dist}}_{s=2} = -0.01\) the corresponding ratio is 10 again. However, the corresponding error is only \(-\)0.18 %. This pattern is due to the nonlinear relationship between distance (deterministic utility) and the choice probabilities (i.e., an s-shaped probability function). As the coefficients (weighting of travel distance) get larger (i.e., approach 0), the probabilities of choosing to patronize a facility approach the largest possible value. For these values of the deterministic utility, the difference in the corresponding choice probabilities between the two segments becomes small.
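The scale effect can be made concrete with a deliberately simplified sketch. It assumes a binary logit choice between two facilities at hypothetical distances of 10 and 12 km and a distance-only utility, so it does not reproduce the setting of Example 2 or the utility function (29). It only shows that the same coefficient ratio of 10 produces very different gaps in choice probabilities at different coefficient scales.

```python
import math


def choice_probability(beta, d_near, d_far):
    """Binary logit probability of choosing the nearer of two facilities.

    Assumption: deterministic utility is beta * distance (beta < 0).
    """
    return 1.0 / (1.0 + math.exp(beta * (d_far - d_near)))


def segment_gap(beta_1, beta_2, d_near=10.0, d_far=12.0):
    """Absolute difference in choice probabilities between two segments."""
    return abs(
        choice_probability(beta_1, d_near, d_far)
        - choice_probability(beta_2, d_near, d_far)
    )


# Same ratio beta_1 / beta_2 = 10, but different scales of the coefficients.
gap_large_scale = segment_gap(-1.0, -0.1)
gap_small_scale = segment_gap(-0.1, -0.01)
print(gap_large_scale, gap_small_scale)
```

In this toy setting both segments become nearly indifferent between the two facilities as the coefficients approach 0, so neglecting segmentation changes the predicted shares only slightly.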
The bias found in our study is comparable to those reported in studies on spatial aggregation (Andersson et al. 1998; Daskin et al. 1989; Current and Schilling 1987; Murray and Gottsegen 1997). In the literature, ratios of segment-specific coefficients larger than 50 are reported (Müller et al. 2012; Koppelman and Bhat 2006, pp 133–134). However, the difference between the segment-specific distance coefficients used in our application is small. We have considered parameter settings that yield a ratio \(\beta ^{\text {dist}}_{s=1}/\beta ^{\text {dist}}_{s=2} = 0.91\) (see Fig. 3). As a consequence, the expected bias is below 1 % if we neglect segmentation in our application. Nevertheless, the consideration of segments yields valuable insights, because the utility function (29) and the corresponding coefficients are arbitrarily chosen. As stated before, for a real application, the company is expected to specify utility functions and estimate the corresponding coefficients on unique choice data. The firm may use such a numerical study to make assumptions about worst-case scenarios.
Summary
By an intelligible example, we demonstrate that the independence from irrelevant alternatives (IIA) property of the MNL may yield false predictions. This finding is well founded on empirical studies. When the MNL is used in a mathematical program to incorporate customer choice behavior, the model outcomes are very likely to be biased as well. Although the MNL is founded on individual choice behavior, in facility location planning we are interested in the share of customers of a demand point patronizing a certain facility. If we assume the customers of a demand point are homogeneous, i.e., they exhibit the same observable characteristics, then there is no need for segmentation. If we assume the customers to be heterogeneous, then segmentation of the customers according to their characteristics (income and age, for example) should be employed. By proper segmentation, we are able to reduce the predictive bias of the MNL in terms of market shares.
In this contribution, we present a model formulation for the maximum capture problem that explicitly allows for customer segmentation using the MNL to find optimal shopping facility locations. Moreover, we propose an intelligible approach to derive a lower bound for our model. Extensive computational studies show the impact of proper segmentation as well as the efficiency of our approach: using aggregate customer characteristics instead of proper segmentation may bias the objective function value by more than 15 % relative to the optimal objective function value. Our lower bound is found in \(<\)1 s and deviates \(<\)1 % from the optimal solution. Problems with 2 segments, 50 potential locations and 500 demand points can be solved to a gap \(<\)8 % within 1 h using GAMS/CPLEX. Based on our numerical studies concerning the quality of the lower bound, it is reasonable to assume that the true gap is considerably smaller than 8 %. We apply our approach in an illustrative case example of a globally operating furniture store company that intends to increase its market share in Germany by branch expansion. This problem can be solved to optimality within a few minutes. Our example shows how the novel approach can be used for management decision support.
Based on our findings, several possible directions for future research appear. It is of interest to find analytical bounds on the bias of the objective function value due to missing segmentation under various segmentation patterns and specifications of utility. Further, the explicit consideration of substitution patterns, i.e., correlation between facility locations, is a very important issue to be analyzed. Efficient solution methods are necessary to account for larger problem sets. Finally, our approach is useful to other areas of operations research, for example assortment optimization (Kök and Fisher 2007).
References
Andersson, G., R.L. Francis, T. Normark, and B.M. Rayco. 1998. Aggregation method experimentation for large-scale network location problems. Location Science 6(4): 25–39.
Aros-Vera, F., V. Marianov, and J.E. Mitchell. 2013. p-Hub approach for the optimal park-and-ride facility location problem. European Journal of Operational Research 226(2): 277–285.
Ben-Akiva, M., and S. Lerman. 1985. Discrete choice analysis, theory and applications to travel demand. Cambridge: MIT Press.
Benati, S. 1999. The maximum capture problem with heterogeneous customers. Computers and Operations Research 26(14): 1351–1367.
Benati, S., and P. Hansen. 2002. The maximum capture problem with random utilities: problem formulation and algorithms. European Journal of Operational Research 143(3): 518–530.
Bhat, C.R., and J. Guo. 2004. A mixed spatially correlated logit model: formulation and application to residential choice modeling. Transportation Research Part B Methodological 38(2): 147–168.
Casado, E., and J.C. Ferrer. 2013. Consumer price sensitivity in the retail industry: latitude of acceptance with heterogeneous demand. European Journal of Operational Research 228(2): 418–426.
Current, J.R., and D.A. Schilling. 1987. Elimination of source A and B errors in p-median location problems. Geographical Analysis 19(2): 95–110.
Daskin, M., A. Haghani, M. Khanal, and C. Malandraki. 1989. Aggregation effects in maximum covering models. Annals of Operations Research 18(1): 113–139.
Dellaert, B.G., T.A. Arentze, M. Bierlaire, A.W. Borgers, and H.J. Timmermans. 1998. Investigating consumers’ tendency to combine multiple shopping purposes and destinations. Journal of Marketing Research 35(2): 177–188.
Fotheringham, S., and R. Trew. 1993. Chain image and store-choice modeling: the effects of income and race. Environment and Planning A 25(2): 179–196.
Goldman, A. 1976. Do lower-income consumers have a more restricted shopping scope? Journal of Marketing 40(1): 46–54.
Haase, K. 2009. Discrete location planning. Tech. Rep. WP0907. Institute for Transport and Logistics Studies, University of Sydney.
Haase, K., and S. Müller. 2013. Management of school locations allowing for free school choice. Omega 41(5): 847–855.
Haase, K., and S. Müller. 2014a. A comparison of linear reformulations for multinomial logit choice probabilities in facility location models. European Journal of Operational Research 232(3): 689–691.
Haase, K., and S. Müller. 2014b. Insights into clients’ choice on preventive health care facility location planning. OR Spectrum. doi:10.1007/s00291-014-0367-6
Haynes, K., and S. Fotheringham. 1990. The impact of space on the application of discrete choice models. The Review of Regional Studies 20(1): 39–49.
Hunt, L., B. Boots, and P. Kanaroglou. 2004. Spatial choice modelling: new opportunities to incorporate space into substitution patterns. Progress in Human Geography 28(6): 746–766.
Kök, G., and M.L. Fisher. 2007. Demand estimation and assortment optimization under substitution: methodology and application. Operations Research 55(6): 1001–1021.
Koppelman, F.S., and C. Bhat. 2006. A self instructing course in mode choice modeling: multinomial and nested logit models. Prepared for US Department of Transportation Federal Transit Administration.
Louviere, J., D. Hensher, and J. Swait. 2000. Stated choice methods: analysis and applications. Cambridge: Cambridge University Press.
Marianov, V., M. Ríos, and M.J. Icaza. 2008. Facility location for market capture when users rank facilities by shorter travel and waiting times. European Journal of Operational Research 191(1): 30–42.
McFadden, D. 1973. Conditional logit analysis of qualitative choice behaviour. In Frontiers of econometrics, ed. P. Zarembka, 105–142. New York: Academic Press.
McFadden, D. 2001. Economic choices. American Economic Review 91(3): 351–378.
Müller, S., S. Tscharaktschiew, and K. Haase. 2008. Travel-to-school mode choice modelling and patterns of school choice in urban areas. Journal of Transport Geography 16(5): 342–357.
Müller, S., K. Haase, and S. Kless. 2009. A multiperiod school location planning approach with free school choice. Environment and Planning A 41(12): 2929–2945.
Müller, S., K. Haase, and F. Seidel. 2012. Exposing unobserved spatial similarity: evidence from German school choice data. Geographical Analysis 44(1): 65–86.
Murray, A.T., and J.M. Gottsegen. 1997. The influence of data aggregation on the stability of p-median location model solutions. Geographical Analysis 29(3): 200–213.
de Palma, A., V. Ginsburgh, M. Labbe, and J.F. Thisse. 1989. Competitive location with random utilities. Transportation Science 23(4): 244–252.
Ray, P. 1973. Independence of irrelevant alternatives. Econometrica 41(5): 987–991.
Sener, I., R. Pendyala, and C. Bhat. 2011. Accommodating spatial correlation across choice alternatives in discrete choice models: an application to modeling residential location choice behavior. Journal of Transport Geography 19(2): 294–303.
Serra, D., and R. Colome. 2001. Consumer choice and optimal locations models: formulations and heuristics. Papers in Regional Science 80(4): 439–464.
Sheppard, E. 1978. Theoretical underpinnings of the gravity hypothesis. Geographical Analysis 10(4): 386–402.
Street, D., and L. Burgess. 2007. The construction of optimal stated choice experiments. New York: Wiley.
Suarez, A., I.R. del Bosque, J.M. Rodriguez-Poo, and I. Moral. 2004. Accounting for heterogeneity in shopping centre choice models. Journal of Retailing and Consumer Services 11(2): 119–129.
Swait, J. 2001. Choice set generation within the generalized extreme value family of discrete choice models. Transportation Research Part B Methodological 35(7): 643–666.
Timmermans, H., A. Borgers, and P. van der Waerden. 1992. Mother logit analysis of substitution effects in consumer shopping destination choice. Journal of Business Research 23(2): 311–323.
Train, K.E. 2009. Discrete choice methods with simulation. Cambridge: Cambridge University Press.
Zhang, Y., O. Berman, and V. Verter. 2012. The impact of client choice on preventive healthcare facility network design. OR Spectrum 34(2): 349–370.
Acknowledgments
The very helpful comments and suggestions of three anonymous reviewers are gratefully acknowledged. They made significant contributions in order to improve the paper. We further thank the editors for helpful suggestions. Finally, we thank Sonja Bröning for copy editing. However, the responsibility for any remaining error is with the authors.
Additional information
Responsible editor: Karl Inderfurth (Operations and Information Systems).
Appendix
In this section, we provide numerical examples to validate and test the mathematical formulation of Sect. 3. We assume \(M_i = M\), \(S_i = S\) and \(J_i = J \ \forall \ i \in I\). For given \(I\), \(M\), and \(J\), we generate longitude and latitude coordinates using a random uniform distribution in the interval \(\left[ 0,100\right] \). We set the maximum computational time to 1 h if not stated otherwise. Further, we assume that demand is completely satisfied, i.e., a no-choice alternative does not exist. To generate the demand \(\widetilde{h}_{is}\), we first generate a population \(\text {Pop}_i\) for each demand node \(i \in I\) using a random uniform distribution in the interval \(\left[ 0,10\right] \) weighted by the ratio \(\left| M\right| /\left| I\right| \). Further, we generate weights \(\omega _{is} \ \forall \ i \in I, s\in S\) using a random uniform distribution in the interval \(\left[ 0,1\right] \). Then, \(\widetilde{h}_{is} = \text {Pop}_i \cdot \omega _{is}/\sum _{s' \in S} \omega _{is'}\). Let the utility function be
with \(t_{ij}\) as the travel time between \(i \in I\) and \(j \in J\), computed as the rectangular distance between \(i \in I\) and \(j \in J\) divided by 60. All other parameters of Sect. 3 can be easily derived. In the following, we consider several numerical examples to test our mathematical formulation.
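The instance generation described above can be sketched as follows. This is a minimal illustration, not the authors' generator: the function and parameter names are hypothetical, "weighted by the ratio \(\left| M\right| /\left| I\right| \)" is interpreted as a multiplication by that ratio, and the fixed seed only serves reproducibility of the sketch.

```python
import random


def generate_instance(n_demand, n_facilities, n_segments, seed=0):
    """Generate a random test instance along the lines described in the text.

    Assumption: n_facilities plays the role of |M| (all facility locations)
    and the population is multiplied by |M| / |I|.
    """
    rng = random.Random(seed)
    # Coordinates drawn uniformly from [0, 100] in both dimensions.
    demand_xy = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n_demand)]
    facility_xy = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n_facilities)]

    # Population per demand node: uniform on [0, 10], scaled by |M| / |I|.
    scale = n_facilities / n_demand
    population = [rng.uniform(0, 10) * scale for _ in range(n_demand)]

    # Split each node's population over segments by normalized random weights.
    demand = []
    for pop in population:
        weights = [rng.uniform(0, 1) for _ in range(n_segments)]
        total = sum(weights)
        demand.append([pop * w / total for w in weights])

    # Travel time: rectangular (Manhattan) distance divided by 60.
    travel_time = [
        [(abs(xi - xj) + abs(yi - yj)) / 60 for (xj, yj) in facility_xy]
        for (xi, yi) in demand_xy
    ]
    return population, demand, travel_time


population, demand, travel_time = generate_instance(n_demand=50, n_facilities=25, n_segments=2)
print(len(demand), len(travel_time[0]))
```

By construction, the segment demands of each node sum to the node's population, which gives a quick consistency check for generated instances.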
Example 3
In this study, we are interested in the additional burden due to the number of segments. We set \(\left| I\right| =50\), \(\left| J\right| =20\), \(K \in \left\{ 5, 10, 15\right\} \), \(\left| S\right| \in \left\{ 1,2, \ldots , 5\right\} \), \(\left| M \setminus J\right| =K\), and draw \(\beta _s\) from a uniform distribution in the interval \(\left[ -2, -0.5\right] \). Note that in applications the number of segments will be small due to data availability. See Ben-Akiva and Lerman (1985), pp 148–150 for an illustrative case study. For each combination of \(K\) and \(\left| S\right| \), we compute ten randomly generated instances. Figure 6 displays the results. We observe that the computational effort increases with the number of segments. How fast the computational effort increases in \(\left| S\right| \) seems to depend on the ratio \(K/\left| J\right| \). If only a few locations have to be selected (\(K=5\)) or many locations have to be selected (\(K=15\)), the computational effort is small compared to the situation where 50 % of the potential locations have to be selected (\(K=10\)).
Example 4
In this example, we investigate the impact of the number of competing facility locations \(\left| M \setminus J\right| \) and the number of facilities to be located \(K\). We consider \(\left| I\right| =100\), \(\left| J\right| =20\), \(\left| S\right| =2\), \(\beta _{s=1} = -1\), and \(\beta _{s=2} = -0.5\) for nine different problem sets with ten instances each. The results are given in Table 3. The market share of the considered firm declines in the number of competing facilities. The smaller \(K\), the stronger the decline of the market share in the number of competitors (nearly 50 % decline for \(K=5\) compared to somewhat more than 30 % for \(K=15\)). If the number of established facilities and the number of competing facilities are equal, then market shares are nearly the same (especially if many facilities are established). This study confirms the findings of Example 3 concerning the ratio \(K/\left| J\right| \) and the corresponding computational effort. Further, the study shows an interesting pattern: there seems to be a positive relationship between the market share and the computational effort (the larger the market share, the more CPU time is needed).
Example 5
Now we are interested in the efficiency of the lower bound described in Sect. 3.2. We consider four problem sets with \(\left| J\right| \in \left\{ 20,30\right\} \) and \(K \in \left\{ 5,10\right\} \). For each problem set, 10 random instances are generated. Further, we set \(\left| I\right| =50\), \(\left| M \setminus J\right| = K\), \(\left| S\right| =2\), \(\beta _{s=1} = -1\), and \(\beta _{s=2} = -0.5\). For each instance, we solve our model with and without the lower bound (22) and with and without the objective cuts OC1 (23) and OC2 (26). Table 4 displays the results. For all instances, CPLEX found the optimal solution within 1 h computational time. However, for larger problem sets (\(K>5\)), we are able to prove optimality within 1 h only if we use the lower bound LB. Using LB, we are able to decrease the computational effort remarkably (at least 20 times faster). The lower bound is found in \(<\)1 s and LB deviates \(<\)1 % from the optimal solution. In a small numerical example, Benati and Hansen (2002) show that they find the optimal solution to their problem by variable neighborhood search in \(<\)1 s for problem sets up to \(\left| J\right| =50\) and \(K<10\). Concerning the objective cuts OC1 (23) and OC2 (26), we observe a benefit only for small problem sets (\(\left| J\right| =20\), \(K=5\)). Unfortunately, for larger problem sets, the computational effort increases (up to 2.5 times slower). Possibly, this is due to degeneracy of the LP relaxation when the objective cuts are used. This finding is in line with the results of Benati and Hansen (2002). They report that their upper bound based on submodular maximization, which is comparable to our objective cuts, does not perform as well as the bound provided by concave relaxation. In our study, we find no remarkable difference in performance between OC1 and OC2.
Example 6
The objective of this numerical example is to determine up to what problem size we are able to solve our problem to (or close to) optimality. We consider \(\left| I\right| \in \left\{ 100, 250, 500\right\} \), \(\left| J\right| \in \left\{ 10, 25, 50\right\} \), \(K=\left\lceil \left| J\right| /2\right\rceil \), \(\left| M \setminus J\right| =\left\lceil 2/3 \cdot \left| J\right| \right\rceil \), \(\beta _{s=1} = -1\), and \(\beta _{s=2} = -0.5\). For each of the nine problem sets, we solve ten instances. The results are given in Table 5. Small-sized problem sets (\(\left| J\right| =10\)) can be easily solved to optimality. Medium-sized problem sets (\(\left| J\right| =25\)) can be solved up to a gap of \(<\)6 % in 1 h. For large problem sets (\(\left| J\right| =50\)), the gap becomes disappointing if we only use the lower bound (22). In contrast, if we use the lower bound (22) and the objective cut OC1 (23), we are able to reduce the gap to somewhat more than 7 % within 1 h. Taking into account the good quality of the lower bound (see Example 5) and the observation that most of the time is needed to prove optimality, we may assume that the “true” gap is even smaller. Note that Benati and Hansen (2002) made the same observation.
Rights and permissions
This article is published under license to BioMed Central Ltd. Open Access: This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
About this article
Cite this article
Müller, S., Haase, K. Customer segmentation in retail facility location planning. Bus Res 7, 235–261 (2014). https://doi.org/10.1007/s40685-014-0008-6
Keywords
 Multinomial logit model
 Facility location
 Maximum capture problem
 Customer choice
 Heterogeneous demand
 Substitution patterns
JEL Classification
 C61
 C35
 C44
 R32