1 Introduction

In this paper, we consider a situation where companies (retail store chains, for example) compete for market share. Suppose, for example, that a firm wants to locate new shops in a geographical market. The only decision under the firm's control is where to locate the new facilities. The way customers make their choices has to be taken into account as well (Serra and Colome 2001). The reaction of possible competitors (price, locations) is not considered here.

We discuss a model, based on the maximum capture problem, for the optimal location of \(K\) facilities. Customers’ choices are modeled according to a specific discrete choice model, namely the multinomial logit model (MNL). Other demand models (the Huff model, for example) might be used instead of the MNL, and our approach is valid for such models as well; however, we do not consider them here. In general, discrete choice models are the workhorse for the analysis of individual choice behavior (McFadden 1973, 2001). In the literature, we find several applications of discrete choice models to spatial choice situations (Timmermans et al. 1992; Dellaert et al. 1998). In spite of their long-term and widespread use, we find only a few references in the operations research literature on facility location that account for discrete choice models. One reason may be the mathematical sophistication of the choice models. For example, de Palma et al. (1989), Benati (1999) and Marianov et al. (2008) discuss non-linear model formulations for discrete locational decisions. To the best of our knowledge, Benati and Hansen (2002) were the first to propose a linear reformulation of the non-linear MNL. Their approach results in a hyperbolic sum integer problem. Haase (2009) uses the constant substitution patterns of the MNL to find a linear integer reformulation. Aros-Vera et al. (2013) apply this approach to the planning of park-and-ride facilities. Finally, Zhang et al. (2012) propose an alternative approach similar to that of Benati and Hansen (2002). Haase and Müller (2014a) show that a variant of the model of Haase (2009) seems to be superior to the formulations of Benati and Hansen (2002) and Zhang et al. (2012).

The MNL exhibits the well-known independence from irrelevant alternatives (IIA) property. Roughly speaking, this property implies that each choice alternative (facility location) is an equal substitute for every other alternative. Unfortunately, there is empirical evidence that this core property is unlikely to hold in a spatial choice context (Bhat and Guo 2004; Hunt et al. 2004). The linear reformulations of the MNL already introduced in the literature are all based on the assumption that customers of a given demand point are homogeneous in their observable characteristics (age and income, for example). In this contribution, we show that, if customers of a given demand point are partitioned into homogeneous subgroups according to their characteristics, the predictive bias due to the IIA might be reduced (Sect. 2). Of course, simply considering average characteristics is not sufficient, as the following illustrative example shows (see Fig. 1).

Consider a country with only two regions (1 and 2) and a firm selling rice seeds to farmers. Farmers are assumed to stock up on seeds at a facility of the firm. There are two potential facility locations A and B (there are no competitors). Region 1 contains location A and region 2 contains location B. Farmers located in region 1 buy rice seeds only in A, while those of region 2 buy only in B. Region 1 contains 49 farmers and region 2 contains 50 farmers. Now assume that the climate in region 1 is hot and humid, while the climate of region 2 is arid (both regions might be separated by mountains). Since we expect that all of the farmers of region 1 buy rice seeds, but none of the farmers of region 2 would do so, we end up with a choice probability of buying rice seeds of 49/99 ≈ 0.495 if we consider the population average. Now, assume the task of the firm is to select the facility location that maximizes the expected number of rice seed customers. Based on the average, we would select location B (in region 2), because 0.495 × 50 > 0.495 × 49. However, the true sales are 0, because none of the farmers located in region 2 buys rice seeds, while the farmers of region 1 would only patronize a facility located in A. If the firm considers segment-specific choice probabilities instead (1 for farmers of region 1 and 0 for farmers of region 2), the optimal solution would be facility location A with an expected number of 49 rice seed customers. As a result, the expected bias, i.e., the relative deviation between the two solutions, is 100 %. We learn from this example that simply considering average customer characteristics (instead of proper segmentation) may yield remarkably biased predictive outcomes. In other words, if customer characteristics are considered, it is advisable to employ segmentation instead of the averages of customer characteristics.
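The arithmetic of the rice seed example can be retraced with a few lines of Python. This is only an illustrative sketch; the variable names are our own, and the numbers are those stated in the example.

```python
# Rice seed example: population-average vs. segment-specific choice probabilities.
farmers = {"A": 49, "B": 50}        # farmers per region, keyed by the local facility
p_segment = {"A": 1.0, "B": 0.0}    # true probability of buying rice seeds per region

# Population-average buying probability (what an aggregate model would use).
total = sum(farmers.values())
p_avg = sum(farmers[j] * p_segment[j] for j in farmers) / total

# Expected number of customers per location under each model.
expected_avg = {j: p_avg * farmers[j] for j in farmers}
expected_seg = {j: p_segment[j] * farmers[j] for j in farmers}

choice_avg = max(expected_avg, key=expected_avg.get)  # location chosen with averages
choice_seg = max(expected_seg, key=expected_seg.get)  # location chosen with segments

# Realized customers are governed by the segment-specific probabilities.
true_sales_avg = expected_seg[choice_avg]
true_sales_seg = expected_seg[choice_seg]
relative_loss = (true_sales_seg - true_sales_avg) / true_sales_seg
```

With the averaged probability the firm picks location B and realizes no sales, whereas the segment-specific probabilities lead to location A with 49 customers, so `relative_loss` equals 1 (100 %).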

Fig. 1 The rice seed example

In this paper, we present a model formulation that accounts for customer segmentation within a mixed-integer program and makes it possible to represent customer choice behavior by an MNL that includes customer characteristics (Sect. 3.1). Moreover, we present a simple lower bound and objective cuts for our problem (Sect. 3.2). We demonstrate the usefulness of our approach in extensive numerical studies (Appendix). Finally, we present an illustrative case example to show how our approach might be applied to support decision making for the management of a globally operating furniture store retail chain (Sect. 4).

2 A probabilistic choice model

Let us consider the following problem statement:

Find \(K\) facility locations from all potential locations \(J\) such that the total patronage for the \(K\) facilities is maximized.

First, we define the sets

\(I\) :

demand nodes representing zones, like census blocks etc., that contain the customers,

\(M_i\) :

locations (existing and potential ones) from which the customers located in \(i \in I\) choose exactly one location. \(M_i\) may include a no-choice alternative, indicating that customers might not patronize any facility. Hence, the no-choice alternative (a dummy facility, for example) reflects the proportion of customers who do not consume (services or products) at any facility. We might consider the special case \(M_i = M \ \forall \ i \in I\).

\(J\) :

potential locations for the facilities a decision maker (a firm, for example) has to decide on: \(J \subseteq \bigcup _{i \in I} M_i\). Note that \(M_i \setminus J\) may include facility locations of competitors and/or the no-choice alternative. That is, \(\left\{ M_i \setminus J \right\} \) comprises locations that cannot be influenced by the decision maker. Further, \(J_i = M_i \cap J\).

\(R_i\) :

is a set of choice alternatives faced by the customers of \(i \in I\) that denotes the number, type, and/or the amount of purchases conducted by the customers. Hence, the choice set faced by customers located in \(i \in I\) is \(\left\{ M_i \times R_i\right\} \). As an example, consider a customer located in a given demand node \(i=1\) who chooses to make a purchase of €10, €20, or €30 at any opened facility within a given time period. So \(R_1 = \left\{ 10, 20, 30\right\} \). Let us further assume there are only two facilities, i.e., \(M_1 = \left\{ A, B\right\} \); then the choice set is \(\left\{ (A,10), (A,20), (A,30), (B,10), (B,20), (B,30)\right\} \). A choice of \((A, 20)\) means that the customer chooses to make a purchase of €20 at facility \(A\). Note that the choice set must be exhaustive and the choice alternatives have to be mutually exclusive. Roughly speaking, all alternatives the customers actually face have to be included in the choice set. The generation of \(\left\{ M_i\times R_i\right\} \) is a sophisticated issue. We refer to Swait (2001) for further details.
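As a small illustration, the choice set \(\left\{ M_1 \times R_1\right\} \) of the example above is simply a Cartesian product. The following Python sketch (identifier names are ours) enumerates it.

```python
from itertools import product

M_1 = ["A", "B"]      # facilities available to demand node i = 1
R_1 = [10, 20, 30]    # purchase amounts in euros

# Every (facility, purchase) pair is one mutually exclusive alternative.
choice_set = list(product(M_1, R_1))
```

The resulting list contains \(\left| M_1\right| \cdot \left| R_1\right| = 6\) alternatives, from `("A", 10)` to `("B", 30)`.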

We consider the parameters

\(h_i\) :

number of customers located in node \(i \in I\), and

\(v_{ijr}\) :

as the deterministic utility of customers located in \(i \in I\) patronizing \(j \in M_i\) making a purchase denoted by \(r \in R_i\). This could be a measure of generalized cost etc.

\(K\) :

number of facilities to be located, with \(0 < K < \left| J\right| \).

Further, we define the binary decision variable

\(y_j\) :

= 1, if location \(j \in J\) provides a facility (0, otherwise), and

the non-negative variable

\(x_{ijr}\) :

as the choice probability of customers of node \(i \in I\) who make a purchase denoted by \(r \in R_i\) at a facility located at \(j \in J_i\). If we assume that the choice probability is given by the MNL, \(x_{ijr}\) is defined as

$$\begin{aligned} x_{ijr} = \frac{\mathrm{e}^{v_{ijr}}y_j}{\sum _{o \in R_i}\left( \sum _{m \in M_i \setminus J}\mathrm{e}^{v_{imo}} + \sum _{m\in J_i} \mathrm{e}^{v_{imo}}y_m\right) } \quad \forall \ i \in I, j \in J_i, r \in R_i. \end{aligned}$$
(1)

Note, if \(M_i \setminus J \ne \emptyset \), then \(\sum _{j \in J_i} \sum _{r \in R_i} x_{ijr} < 1\) for all \(i \in I\). Now the problem can be modeled as a mixed-integer non-linear program:

$$\begin{aligned} \text {Maximize} \ \sum _{i \in I} \sum _{r \in R_i} \sum _{j \in J_i} f(i, j, r)x_{ijr} \end{aligned}$$
(2)

subject to (1) and

$$\begin{aligned} \sum _{j \in J} y_j= \ K&\end{aligned}$$
(3)
$$\begin{aligned} y_{j}\in \ \left\{ 0,1\right\} \quad \forall \ j \in J. \end{aligned}$$
(4)

Demand is determined by \(f(i, j, r)x_{ijr}\) with \(f(i, j, r)\) as a function denoting the consumption. We denote \(F\) as the objective function value of (2). In the literature, we find exact linear reformulations of (1) such that (2)–(4) can be modeled as a mixed-integer program: Haase (2009) and Aros-Vera et al. (2013) employ specific properties of the MNL, while Zhang et al. (2012) propose an approach based on variable substitution similar to Benati and Hansen (2002). In Sect. 3, we present a modified reformulation of Haase (2009). First, we focus on important properties of (1) in the following subsections.

In the following, we assume \(\left| R_i\right| = 1 \ \forall \ i \in I\), which simplifies \(v_{ijr}\), (1), and (2). Of course, all formulations of the subsequent sections are valid for \(\left| R_i\right| > 1 \ \forall \ i \in I\) as well.
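To make (1)–(4) concrete for \(\left| R_i\right| = 1\), the following Python sketch evaluates the MNL choice probabilities for a given set of open facilities and solves a tiny instance by complete enumeration. It is not the solution approach of this paper; the data are hypothetical, we assume \(J_i = J\) and \(f(i,j,r) = h_i\), and the function names are our own.

```python
from itertools import combinations
from math import exp

def mnl_shares(v_i, J_i, competitors, open_set):
    """Equation (1) with |R_i| = 1: choice probability of each j in J_i for one node."""
    denom = (sum(exp(v_i[m]) for m in competitors)
             + sum(exp(v_i[m]) for m in J_i if m in open_set))
    return {j: (exp(v_i[j]) / denom if j in open_set else 0.0) for j in J_i}

def best_locations(h, v, J, competitors, K):
    """Enumerate all K-subsets of J (constraints (3)-(4)) and maximize objective (2)."""
    best_value, best_set = float("-inf"), None
    for subset in combinations(sorted(J), K):
        open_set = set(subset)
        value = sum(h[i] * sum(mnl_shares(v[i], J, competitors, open_set).values())
                    for i in h)
        if value > best_value:
            best_value, best_set = value, open_set
    return best_set, best_value

# Hypothetical instance: two demand nodes, three potential locations, one competitor.
h = {1: 100, 2: 80}
v = {1: {"A": -1.0, "B": -2.0, "C": -3.0, "comp": -1.5},
     2: {"A": -3.0, "B": -1.0, "C": -2.0, "comp": -1.5}}
J = {"A", "B", "C"}
best_set, best_value = best_locations(h, v, J, ["comp"], K=2)
```

Complete enumeration is only viable for very small \(\left| J\right| \) and serves here to illustrate the non-linear structure of (1).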

2.1 The independence from irrelevant alternatives property

The IIA property is well known in discrete (locational) choice literature (Ray 1973; Sheppard 1978; McFadden 2001; Sener et al. 2011). One outcome of the IIA is that the ratio of choice probabilities of two alternatives (i.e., facility locations) remains constant no matter whether other alternatives are available or not (constant substitution pattern). That is, the probability of patronizing a facility located in \(j\) relative to a facility located in \(m\) is independent of the existence and attributes of any other facility. Consider two arbitrary but existing facility locations \(j, m \in M_i\) to be given. Then, according to (1), the ratio of the choice probabilities \(x_{ij}\) and \(x_{im}\) is

$$\begin{aligned} \frac{x_{ij}}{x_{im}} = \frac{\mathrm{e}^{v_{ij}}}{\mathrm{e}^{v_{im}}} = \mathrm{e}^{v_{ij} - v_{im}} \quad \forall \ i \in I. \end{aligned}$$
(5)

The IIA property of (5) implies that a new facility or change in the attractiveness of an existing facility other than \(m\) or \(j\) will draw patronage from competing facilities in direct proportion to their choice probabilities. In contrast, in applications, it is extremely unlikely that this property holds (Haynes and Fotheringham 1990; Müller et al. 2012; Hunt et al. 2004). In situations when the IIA property is not valid we should consider discrete choice models other than MNL (mixed logit or nested logit, for example). See Train (2009) for further reading. Müller et al. (2009), Haase (2009) and Haase and Müller (2013) propose approximate approaches that are able to incorporate a large class of discrete choice models into mathematical programs.
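The constant substitution pattern of (5) is easy to verify numerically. The following Python sketch uses hypothetical utilities and compares the ratio \(x_{ij}/x_{im}\) with and without a third alternative.

```python
from math import exp, isclose

def mnl(v):
    """MNL choice probabilities for a dict of deterministic utilities."""
    denom = sum(exp(u) for u in v.values())
    return {j: exp(u) / denom for j, u in v.items()}

v_all = {"j": -1.0, "m": -2.0, "other": -0.5}   # hypothetical utilities v_ij
v_without = {k: u for k, u in v_all.items() if k != "other"}

x_all, x_without = mnl(v_all), mnl(v_without)

# Ratio of choice probabilities of j and m, with and without the third alternative.
ratio_all = x_all["j"] / x_all["m"]
ratio_without = x_without["j"] / x_without["m"]
```

Both ratios equal \(\mathrm{e}^{v_{ij} - v_{im}} = \mathrm{e}^{1}\), although the individual probabilities change when the third alternative is removed.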

2.2 Aggregation issues

The MNL and hence (1) is based on the theory of utility-maximizing behavior of individuals. That is, each individual chooses the location that maximizes his or her utility. Given our problem statement of Sect. 2 and the corresponding model (2)–(4), we are interested in aggregate measures (market shares, total patronage etc.) instead of individual choice probabilities. Data on customer demand are usually given as an aggregate measure (number of customers, for example). The question then arises of how to compute the choice probability of all customers (individuals) located in a given demand point \(i \in I\). The answer depends on the specification of the utility \(v_{ij}\). If \(v_{ij}\) does not contain characteristics of the customers (age, income, and so forth), then the choice probability \(x_{ij}\) applies to all customers in \(i \in I\) in the same way and thus (2) is a proper formulation. In contrast, the incorporation of customer characteristics in \(v_{ij}\) will improve the accuracy of \(x_{ij}\) (Koppelman and Bhat 2006, pp 21–23 and pp 41–46). However, aggregation is more tedious in such a case.

Example 1

For simplicity, we consider only one demand node \(i = i'\). Consider \(J=M_{i'}=\left\{ A,B,C\right\} \). Further, we assume \(i'\) contains two customers \(n\in \left\{ 1,2\right\} \). Let the deterministic utility function for customer \(n\) be given as

$$\begin{aligned} v_{nj} = -g_{i'j} / q_n\quad \forall \ j \in J, \end{aligned}$$
(6)

with \(g_{i'j}\) as the cost for a trip from \(i'\) to \(j\) and \(q_n\) as the income of customer \(n\). The higher the income, the smaller the impact of travel cost (Casado and Ferrer 2013). Now, there are basically two ways of computing \(x_{i'j}\):

  1. we use the average income of \(n=1\) and \(n=2\) (i.e., the average income of demand node \(i'\)) denoted by \(\overline{q}_{i'} = (q_1 + q_2)/2\) to compute \(\overline{v}_{i'j}\) and thus \(\overline{x}_{i'j}\), or

  2. we first compute the choice probabilities for each customer \(x_{nj}\) and then we determine the average choice probability of customers located in \(i'\) as \(\tilde{x}_{i'j} = (x_{n=1,j} + x_{n=2,j})/2\).

In general, approach 1 is expected to be inaccurate compared to approach 2 because of the non-linear relationship between \(x_{i'j}\) and \(v_{i'j}\) in Equation (1). Consider the values given in Table 1. As expected, \(\overline{x}_{i'j}\) determined by approach 1 and \(\tilde{x}_{i'j}\) determined by approach 2 are different. As shown by Train (2009, pp 29–32), approach 2 should be preferred. In addition, we observe an interesting pattern if we apply customer characteristics in an appropriate way: the ratio of the average choice probabilities \(\tilde{x}_{i'A}/\tilde{x}_{i'C}\) depends on the existence of facility location \(B\) (non-constant substitution pattern). Although the IIA property does apply to each customer \(n\), it does not apply to the population of \(i'\) as a whole. The key point is that there are two distinct segments of the population (high and low income) with different choice probabilities. To see this, we compare two different solutions to (2)–(4), namely solution I (all locations are selected) and solution II (location B is not selected). The customer with low income (\(n=2\)) considers location A to be a better substitute for B than C. In contrast, for customer \(n=1\) (high income), locations A and C are more or less equal substitutes for location B. This pattern is due to the different evaluation of travel cost by the two segments (i.e., customers).

Table 1 Aggregation, choice probabilities and the IIA property
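The two aggregation approaches of Example 1 can be sketched in Python as follows. The trip costs and incomes below are hypothetical and are not the values of Table 1; they are chosen only to reproduce the qualitative pattern discussed above (customer \(n=1\) with high income, customer \(n=2\) with low income).

```python
from math import exp, isclose

def mnl(v):
    """MNL choice probabilities for a dict of deterministic utilities."""
    denom = sum(exp(u) for u in v.values())
    return {j: exp(u) / denom for j, u in v.items()}

g = {"A": 2.0, "B": 4.0, "C": 8.0}   # hypothetical trip costs g_{i'j}
q = {1: 4.0, 2: 1.0}                 # hypothetical incomes q_n

def individual_probs(locations):
    """Choice probabilities x_nj of each customer n with utility (6)."""
    return {n: mnl({j: -g[j] / q[n] for j in locations}) for n in q}

def average_probs(locations):
    """Approach 2: average of the individual choice probabilities."""
    probs = individual_probs(locations)
    return {j: sum(probs[n][j] for n in q) / len(q) for j in locations}

# Approach 1: plug the average income into the utility function.
q_bar = sum(q.values()) / len(q)
x_bar = mnl({j: -g[j] / q_bar for j in g})

# Approach 2 for solution I (A, B, C open) and solution II (B not selected).
x_tilde_I = average_probs(["A", "B", "C"])
x_tilde_II = average_probs(["A", "C"])

ratio_I = x_tilde_I["A"] / x_tilde_I["C"]
ratio_II = x_tilde_II["A"] / x_tilde_II["C"]
```

For these assumed values, `x_bar` and `x_tilde_I` differ, and `ratio_I` differs from `ratio_II`, while the ratio of A to C for each individual customer remains constant.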

There are two lessons learned so far. First, the more customer characteristics are included in \(v_{ij}\) in an appropriate way, the better the forecast properties of the MNL and hence of \(x_{ij}\). Second, by applying segmentation to our model (2)–(4) as outlined in approach 2 of Example 1, we are able to reduce the bias of \(x_{ij}\) and \(F\) due to the IIA of (5) to some extent. In applications, one would be interested in how to classify customers and how many customer segments are appropriate. Of course, segmentation makes sense only if the deterministic part of utility contains factors that vary over decision makers. Usually, such factors are socio-economic factors like age, gender, income, occupation, car ownership, and so forth. In empirical studies, socio-economic factors that are continuous measures (age and income, for example) are usually treated as categorical measures. For example, a respondent is asked whether his/her age is (a) below 20 years, (b) between 20 and 40 years, (c) between 40 and 60 years, or (d) older than 60 years. Now consider a deterministic utility function with only two socio-economic factors: gender and age. Gender, of course, consists of only two categories: female and male. So, we end up with eight customer segments: the four age levels for each of the two genders. Considering many socio-economic factors with many levels yields a large number of segments. How many segments are appropriate and tractable cannot be said in the abstract. It rather depends on the application, in particular on the empirically specified choice model. See Ben-Akiva and Lerman (1985), pp 131–153 for a detailed discussion of aggregation and segmentation.
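The number of segments is the product of the numbers of levels of all socio-economic factors. A short Python sketch for the gender and age example above (the labels are our own):

```python
from itertools import product
from math import prod

factors = {
    "gender": ["female", "male"],
    "age": ["<20", "20-40", "40-60", ">60"],
}

# One segment per combination of factor levels.
segments = list(product(*factors.values()))
n_segments = prod(len(levels) for levels in factors.values())
```

Adding a further factor with, say, three levels would already triple the number of segments, which illustrates why the number of factors and levels has to be chosen with tractability in mind.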

3 A probabilistic choice model with customer segmentation

In Sect. 2, we have demonstrated that the IIA may yield biased values of \(x_{ij}\) of (1) and hence a biased objective function value \(F\) of (2). Moreover, a partition of the population of a demand point \(i \in I\) into homogeneous sub-populations (i.e., segmentation) enables us to reduce the bias due to the IIA. In this section, we propose how to explicitly account for segments of customers (heterogeneous customer demand) in a linear mixed-integer model formulation of (2)–(4).

3.1 Mathematical formulation

In addition to the definitions of Sect. 2, we consider the set

\(S_i\) :

segments of the customers located in demand node \(i \in I\); for example high and low income or male and female or a combination of income and gender.

Next, we denote the parameters

\(\widetilde{h}_{is}\) :

number of customers according to segment \(s \in S_i\) located in node \(i \in I\),

\(\widetilde{v}_{isj}\) :

as the deterministic utility of customers of segment \(s \in S_i\) located in \(i \in I\) patronizing \(j \in M_i\),

\(\pi _{isj}\) :

choice probability of customers of segment \(s \in S_i\) at node \(i \in I\) who access service at a facility located at \(j \in J_i\) given that all \(m \in J\) are established, i.e., \(\pi _{isj} = \mathrm{e}^{\widetilde{v}_{isj}}/ \sum _{m \in M_i}\mathrm{e}^{\widetilde{v}_{ism}}\),

\(\varphi _{isj}\) :

choice probability of customers of segment \(s \in S_i\) at node \(i \in I\) who access service at a facility located at \(j \in J_i\) given that \(j \in J_i\) is the only facility location established, i.e., \(\varphi _{isj} = \mathrm{e}^{\widetilde{v}_{isj}}/ (\mathrm{e}^{\widetilde{v}_{isj}} + \sum _{m \in M_i \setminus J}\mathrm{e}^{\widetilde{v}_{ism}})\), and

\(\zeta _{is}\) :

cumulative choice probability of customers of segment \(s \in S_i\) at node \(i \in I\) who access service at competing facilities given that all potential facilities \(j \in J\) are located, i.e., \(\zeta _{is} = \sum _{l\in M_i \setminus J} (\mathrm{e}^{\widetilde{v}_{isl}} / \sum _{m \in M_i}\mathrm{e}^{\widetilde{v}_{ism}})\). Therefore,

$$\begin{aligned} \zeta _{is} + \sum _{j \in J_i} \pi _{isj} = 1 \quad \forall \ i \in I, s \in S_i. \end{aligned}$$

Finally, we define the non-negative variables

\(\widetilde{x}_{isj}\) :

as the MNL choice probability of customers of segment \(s \in S_i\) at node \(i \in I\) who access service at a facility located at \(j \in J_i\), and

\(z_{is}\) :

as the cumulative choice probability of customers of segment \(s \in S_i\) at node \(i \in I\) who do not access any facility of the considered firm.

Then, our model according to the problem statement of Sect. 2 is

$$\begin{aligned} \text {maximize} \ \sum _{i \in I} \sum _{s \in S_i} \widetilde{h}_{is} \sum _{j \in J_i} \widetilde{x}_{isj} \end{aligned}$$
(7)

subject to

$$\begin{aligned} z_{is} + \sum _{j \in J_i} \widetilde{x}_{isj}= \ 1 \quad \forall \ i \in I, s \in S_i \end{aligned}$$
(8)
$$\begin{aligned} \widetilde{x}_{isj} - \varphi _{isj}y_j\le \ 0 \quad \forall \ i \in I, s \in S_i, j \in J_i \end{aligned}$$
(9)
$$\begin{aligned} \widetilde{x}_{isj} - \pi _{isj}y_j\ge \ 0 \quad \forall \ i \in I, s \in S_i, j \in J_i \end{aligned}$$
(10)
$$\begin{aligned} \widetilde{x}_{isj} - \frac{\pi _{isj}}{\zeta _{is}}z_{is}\le \ 0 \quad \forall \ i \in I, s \in S_i, j \in J_i \end{aligned}$$
(11)
$$\begin{aligned} \sum _{j \in J} y_j&= K \quad \end{aligned}$$
(12)
$$\begin{aligned} \widetilde{x}_{isj}&\ge 0 \quad \forall \ i \in I, s \in S_i, j \in J_i \end{aligned}$$
(13)
$$\begin{aligned} z_{is}&\ge 0 \quad \forall i \in I, s \in S_i \end{aligned}$$
(14)
$$\begin{aligned} y_{j}&\in \left\{ 0,1\right\} \quad \forall \ j \in J. \end{aligned}$$
(15)

We denote \(\widetilde{F}\) as the objective function value of (7). Consider a given combination of \(i \in I, \ s \in S_i,\) and \(j \in J_i\). For convenience, we assume for a moment that \(\left| M\right| =2\) and \(\left| J\right| =1\) with \(M =\left\{ j, k\right\} \) and \(J=\left\{ j\right\} \), and accordingly \(M_i=M\) and \(J_i=J\). Now, if \(y_j=0\), then \(\widetilde{x}_{isj}=0\) because of (9) and further \(z_{is}=1\) because of (8). If \(y_j = 1\), then (11) holds with equality, i.e., \(\widetilde{x}_{isj} = z_{is}\cdot \pi _{isj}/ \zeta _{is}\), because (7) is maximized and \(z_{is}\cdot \pi _{isj}/ \zeta _{is} \le \varphi _{isj}\). Due to (8) and substitution, we get the correct choice probabilities \(\widetilde{x}_{isj}= \mathrm{e}^{\widetilde{v}_{isj}}/\left( \mathrm{e}^{\widetilde{v}_{isj}}+\mathrm{e}^{\widetilde{v}_{isk}}\right) \) with \(k\) indicating the facility location of the competitor. Of course, these relationships are valid for \(\left| M\right| >2\) and \(\left| J\right| >1\) as well. Therefore, constraints (8)–(11) together with (7) yield the MNL choice probabilities. For more details, we refer to Haase (2009) and Aros-Vera et al. (2013). Using \(\varphi _{isj}\) in (9) and \(\pi _{isj}\) in (10) yields bounds on \(\widetilde{x}_{isj}\) that are tighter than simply using \(0 \le \widetilde{x}_{isj} \le y_j\). In contrast to Aros-Vera et al. (2013), we do not consider redundant constraints in our model: Using (11) yields \(\left| I\right| \cdot \left| S_i\right| \cdot \left| J_i\right| \) constraints instead of \(\left| I\right| \cdot \left| S_i\right| \cdot \left| J_i\right| ^2\) constraints.
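The claim that the MNL choice probabilities are feasible for (8)–(11) can be checked numerically for a single pair \((i, s)\). The Python sketch below uses hypothetical utilities and our own function names, and it assumes \(M_i \setminus J \ne \emptyset \) so that \(\zeta _{is} > 0\). It computes \(\pi _{isj}\), \(\varphi _{isj}\) and \(\zeta _{is}\), evaluates the MNL for a given \(y\), and tests the constraints.

```python
from math import exp

def segment_parameters(v, J_i, competitors):
    """pi, phi and zeta of Sect. 3.1 for one (i, s) pair."""
    comp = sum(exp(v[m]) for m in competitors)       # mass of M_i \ J
    total = comp + sum(exp(v[j]) for j in J_i)       # mass of all of M_i
    pi = {j: exp(v[j]) / total for j in J_i}
    phi = {j: exp(v[j]) / (exp(v[j]) + comp) for j in J_i}
    zeta = comp / total
    return pi, phi, zeta

def mnl_given_y(v, J_i, competitors, y):
    """MNL probabilities x_j of the firm's facilities and z (no facility of the firm)."""
    comp = sum(exp(v[m]) for m in competitors)
    denom = comp + sum(exp(v[j]) * y[j] for j in J_i)
    x = {j: exp(v[j]) * y[j] / denom for j in J_i}
    return x, comp / denom

def satisfies_constraints(x, z, y, pi, phi, zeta, tol=1e-9):
    ok = abs(z + sum(x.values()) - 1.0) <= tol            # (8)
    for j in x:
        ok = ok and x[j] <= phi[j] * y[j] + tol           # (9)
        ok = ok and x[j] >= pi[j] * y[j] - tol            # (10)
        ok = ok and x[j] <= pi[j] / zeta * z + tol        # (11)
    return ok

v = {"A": -1.0, "B": -2.0, "C": -0.5, "comp": -1.5}   # hypothetical utilities
J_i = ["A", "B", "C"]
pi, phi, zeta = segment_parameters(v, J_i, ["comp"])
y = {"A": 1, "B": 0, "C": 1}
x, z = mnl_given_y(v, J_i, ["comp"], y)
feasible = satisfies_constraints(x, z, y, pi, phi, zeta)
```

For every open location, (11) is satisfied with equality, which is exactly the constant substitution pattern between a facility of the firm and the alternatives in \(M_i \setminus J\).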

3.2 Lower bound and objective cuts

To derive a simple lower bound for \(\widetilde{F}\) of (7), we consider the binary variable \(w_{mj}\). Further, we define the non-negative variable

$$\begin{aligned} Q = \sum _{i \in I} \sum _{s \in S_i}\sum _{m \in J} \sum _{j \in J \setminus \left\{ m\right\} } \pi _{isj}w_{mj}. \end{aligned}$$
(16)

If we minimize \(Q\) subject to (16) and

$$\begin{aligned} \sum _{j \in J \setminus \left\{ m\right\} } w_{mj}= \ K - 1 \quad \forall \ m \in J \end{aligned}$$
(17)
$$\begin{aligned} w_{mj}\in \ \left\{ 0,1\right\} \quad \forall \ m,j \in J \end{aligned}$$
(18)

the quantity

$$\begin{aligned} a_m = \sum _{i \in I} \sum _{s \in S_i} \widetilde{h}_{is} \frac{\pi _{ism}}{\sum _{j \in J} \pi _{isj} w_{mj}^*} \quad \forall \ m \in J, \end{aligned}$$
(19)

denotes the maximum attractiveness of facility location \(m \in J\) with \(w_{mj}^*\) indicating that \(j\) belongs to the \(K-1\) most attractive facility locations compared to \(m\). If we maximize \(Q\) subject to (16)–(18), then the quantity

$$\begin{aligned} b_m = \sum _{i \in I} \sum _{s \in S_i} \widetilde{h}_{is} \frac{\pi _{ism}}{\sum _{j \in J} \pi _{isj} w_{mj}^*} \quad \forall \ m \in J, \end{aligned}$$
(20)

denotes the minimum attractiveness of facility location \(m \in J\) with \(w_{mj}^*\) indicating that \(j\) belongs to the \(K-1\) least attractive facility locations compared to \(m\). To derive a lower bound, we choose the \(K\) locations \(j \in J\) with the largest values of \(b_j\). Denote this set as \(\tilde{J}\). Accordingly, \(\tilde{J}_i = \tilde{J} \cap M_i\). Now compute the lower bound as:

$$\begin{aligned} LB = \sum _{i \in I} \sum _{s \in S_i} \widetilde{h}_{is} \sum _{j \in \tilde{J}_i} \frac{\mathrm{e}^{\widetilde{v}_{isj}}}{\sum _{m \in M_i \setminus J}{\mathrm e}^{\widetilde{v}_{ism}} + \sum _{m\in \tilde{J}_i} \mathrm{e}^{\widetilde{v}_{ism}}}. \end{aligned}$$
(21)
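Expression (21) is the objective value of the feasible solution \(\tilde{J}\), so any set of \(K\) locations yields a valid lower bound. The Python sketch below evaluates (21) on hypothetical data with \(M_i = M\). As a loudly stated assumption, it does not compute \(b_j\); instead it ranks locations by their stand-alone capture, which is a simple stand-in and not the ranking proposed above. A complete enumeration is included only to confirm that the bound does not exceed the optimum.

```python
from itertools import combinations
from math import exp

def capture(h, v, competitors, open_set):
    """Expected patronage of the firm if exactly open_set is open; (21) for open_set = J-tilde."""
    total = 0.0
    for key, h_is in h.items():                      # key = (i, s)
        comp = sum(exp(v[key][m]) for m in competitors)
        own = sum(exp(v[key][j]) for j in open_set)
        total += h_is * own / (comp + own)
    return total

# Hypothetical data: two demand nodes with two segments each, one competitor.
h = {(1, 1): 30, (1, 2): 70, (2, 1): 20, (2, 2): 60}
v = {
    (1, 1): {"A": -0.5, "B": -1.5, "C": -2.5, "D": -3.0, "comp": -1.0},
    (1, 2): {"A": -0.8, "B": -2.0, "C": -3.0, "D": -3.5, "comp": -1.0},
    (2, 1): {"A": -2.5, "B": -1.5, "C": -0.5, "D": -1.0, "comp": -1.0},
    (2, 2): {"A": -3.0, "B": -2.0, "C": -0.8, "D": -1.2, "comp": -1.0},
}
J, K = ["A", "B", "C", "D"], 2

# Stand-in ranking (assumption): stand-alone capture of each location instead of b_j.
score = {j: capture(h, v, ["comp"], [j]) for j in J}
J_tilde = sorted(J, key=score.get, reverse=True)[:K]
LB = capture(h, v, ["comp"], J_tilde)

# Complete enumeration, only to check the bound on this tiny instance.
optimum = max(capture(h, v, ["comp"], list(s)) for s in combinations(J, K))
```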

Finally, we add

$$\begin{aligned}&\widetilde{F} \ge LB \end{aligned}$$
(22)
$$\begin{aligned}&\widetilde{F} \le LB - \sum _{j \in \tilde{J}} b_j \left( 1-y_j\right) + \sum _{j \in J \setminus \tilde{J}} a_j y_j \end{aligned}$$
(23)

to our model (7)–(15) to account for a lower bound (LB) (22) and an objective cut OC1 (23). A lower bound for problems with capacities is presented in Haase and Müller (2014b). Now, we might define the quantities

$$\begin{aligned} \alpha _j = - LB \ + \sum _{i \in I} \sum _{s \in S_i} \widetilde{h}_{is} \sum _{l \in \tilde{J}_i \cup \left\{ j\right\} } \frac{\mathrm{e}^{\widetilde{v}_{isl}}}{\sum _{m \in M_i \setminus J}{\mathrm e}^{\widetilde{v}_{ism}} + \sum _{m\in \tilde{J}_i \cup \left\{ j\right\} } \mathrm{e}^{\widetilde{v}_{ism}}} \quad \forall \ j \in J \setminus \tilde{J} \end{aligned}$$
(24)

and

$$\begin{aligned} \gamma _j \ = \ \sum _{i \in I} \sum _{s \in S_i} \widetilde{h}_{is} \left( \sum _{l \in J \setminus \left\{ j\right\} } \frac{\mathrm{e}^{\widetilde{v}_{isl}}}{\sum _{m \in M_i \setminus J}\mathrm{e}^{\widetilde{v}_{ism}} + \sum _{m \in J \setminus \left\{ j\right\} } \mathrm{e}^{\widetilde{v}_{ism}}} - \sum _{l \in J} \pi _{isl} \right) \quad \forall \ j \in \tilde{J}. \end{aligned}$$
(25)

Based on Benati and Hansen (2002), we can define a second objective cut OC2 alternatively to (23)

$$\begin{aligned} \widetilde{F} \ \le LB + \sum _{j \in J \setminus \tilde{J}} \alpha _j y_j + \sum _{j \in \tilde{J}} \gamma _j \left( 1 - y_j\right) . \end{aligned}$$
(26)

Note, \(\gamma _j\) in (25) is negative for all \(j \in \tilde{J}\) by construction.
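The quantities (24) and (25) and the cut (26) can be illustrated on a small hypothetical instance with \(M_i = M\) and \(J_i = J\). In the Python sketch below, \(\gamma _j\) is computed as the capture with all locations except \(j\) open minus the capture with all locations open, which is our reading of (25); the set \(\tilde{J}\) is fixed arbitrarily for illustration, and all names are our own.

```python
from itertools import combinations
from math import exp

def capture(h, v, competitors, open_set):
    """Expected patronage of the firm if exactly open_set is open."""
    total = 0.0
    for key, h_is in h.items():                      # key = (i, s)
        comp = sum(exp(v[key][m]) for m in competitors)
        own = sum(exp(v[key][j]) for j in open_set)
        total += h_is * own / (comp + own)
    return total

# Hypothetical data: two demand nodes with two segments each, one competitor.
h = {(1, 1): 30, (1, 2): 70, (2, 1): 20, (2, 2): 60}
v = {
    (1, 1): {"A": -0.5, "B": -1.5, "C": -2.5, "D": -3.0, "comp": -1.0},
    (1, 2): {"A": -0.8, "B": -2.0, "C": -3.0, "D": -3.5, "comp": -1.0},
    (2, 1): {"A": -2.5, "B": -1.5, "C": -0.5, "D": -1.0, "comp": -1.0},
    (2, 2): {"A": -3.0, "B": -2.0, "C": -0.8, "D": -1.2, "comp": -1.0},
}
J, K, comp = ["A", "B", "C", "D"], 2, ["comp"]

J_tilde = ["A", "C"]                     # assumed incumbent set, chosen for illustration
LB = capture(h, v, comp, J_tilde)        # (21)

# (24): gain from adding j to J-tilde; (25): loss from removing j from the full set J.
alpha = {j: capture(h, v, comp, J_tilde + [j]) - LB for j in J if j not in J_tilde}
full = capture(h, v, comp, J)
gamma = {j: capture(h, v, comp, [l for l in J if l != j]) - full for j in J_tilde}

def oc2_bound(open_set):
    """Right-hand side of (26) for the solution y with y_j = 1 iff j in open_set."""
    return (LB + sum(a for j, a in alpha.items() if j in open_set)
            + sum(c for j, c in gamma.items() if j not in open_set))

# Check on this instance that no feasible solution violates the cut.
cut_holds = all(capture(h, v, comp, list(s)) <= oc2_bound(set(s)) + 1e-9
                for s in combinations(J, K))
```

On this instance, all \(\gamma _j\) are negative, all \(\alpha _j\) are non-negative, and the right-hand side of (26) is an upper bound on the capture of every feasible solution.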

We are interested in the impact of the number of segments, the lower bound, the objective cuts, and the number of competitors on the solution and on the solvability of our approach. The corresponding numerical examples can be found in the Appendix. The major findings of these numerical examples are that (1) segmentation has significant impact on the computational effort, (2) the lower bound (22) provides a quite good solution (it deviates \(<\)1 % from the optimal solution), and (3) the use of the objective cut OC1 (23) is particularly appealing if we do not expect to find an optimal solution within a given time. Further, we solve problem sets with 2 segments, 500 demand points and 10 potential locations to optimality in 1 h computation time. If we consider 50 potential locations, the gap reported by CPLEX is \(<\)8 % in 1 h.

4 Illustrative case example: furniture store location in Germany

In this section, we apply our model of Sect. 3.1 to a hypothetical but still realistic branch extension of a large furniture store company in Germany. Figure 2a shows the already existing facility locations and the potential facility locations of the considered firm, as well as the locations of the main competitors in the market. The firm already runs 46 stores in the year 2012 with a market share of 12.5 % and 46 million customers yielding 3.7 billion Euro revenue. The firm aims to massively expand in the market in the near future. It is intended to establish 5–15 new facilities by 2020. The task is to find the optimal locations for a given number of new facilities (\(K^+\)) from 50 potential facility locations and the corresponding expected market share of the firm.

We consider the centroids of the 415 German “Kreise” (municipalities) as demand points. The locations of the facilities (existing, potential, and competitors) are given by longitude and latitude coordinates. The Euclidean distance in kilometers between a demand point \(i \in I\) and a facility location \(j \in M\) is denoted by \(d_{ij}\). The choice set for each demand node \(i \in I\) is defined by

$$\begin{aligned} M_i = \left\{ j \in M \left| d_{ij} \le \delta \right. \right\} \end{aligned}$$
(27)

with \(M\) as the set of all facility locations and \(\delta \) as a threshold distance. If \(\pi _{isj}<0.00001\), then we remove \(j\) from \(M_i\). There exist 101 facility locations of the competitors. Thus, \(\left| M\right| =197\). Customers do not consider facilities located farther away than \(\delta \) as a conceivable alternative. Since the main customers of the firm are aged between 15 and 25, we consider two distinct segments of customers: \(\widetilde{h}_{i,s=1}\) as the number of customers aged between 15 and 25 and \(\widetilde{h}_{i,s=2}\) as the number of customers of all other ages. The deterministic part of utility (see Sects. 2, 3.1) is given as

$$\begin{aligned} \widetilde{v}_{is,j=0}&= \beta ^{\text {inc}} \cdot \text {INC}_i \quad \forall \ i \in I, s \in S, \end{aligned}$$
(28)
$$\begin{aligned} \widetilde{v}_{isj}&= \beta ^{\text {dist}}_s \cdot d_{ij} \quad \forall i \in I, s \in S, j \in M_i, j > 0, \end{aligned}$$
(29)

with \(\text {INC}_i\) as the average annual disposable income of the population located in \(i \in I\) in 1,000 Euro. Total population and \(\text {INC}_i\) are given in Fig. 2a. The ratio \(\sum _i \widetilde{h}_{i, s=1} / \sum _i \widetilde{h}_{i, s=2} =0.163\). Coefficients \(\beta ^{\text {inc}}\) and \(\beta ^{\text {dist}}_s\) are the utility contributions per unit of the corresponding attribute (income and distance). Equation (28) denotes the utility of not choosing any of the facility locations of the firm (potential and existing) or the competitors. Roughly speaking, \(j=0\) denotes a dummy facility absorbing all demand not satisfied by the facilities of the firm or the competitors. The dummy facility \(j=0\) comprises the utility of customers either to patronize a small, local furniture store or not to consume furniture at all. Note that (28) and (29) are rather simplistic specifications of utility to make the application more comprehensible.

In a real-world application, the coefficients \(\beta ^{\text {inc}}, \beta ^{\text {dist}}_s\) of (28) and (29) have to be estimated using empirical choice data (i.e., discrete choice analysis). Large companies can easily afford a comprehensive empirical study to appropriately estimate the coefficients of the utility functions. Here, we cannot obtain such estimates; hence, we rely on parameter estimates from other empirical studies. Suarez et al. (2004) provide coefficient estimates for a shopping center choice model. They distinguish between two different segments of customers (target group and others) and estimate coefficients of the distance between the customers' location and the shopping center for both customer segments. Here, we employ these coefficient estimates, given as

$$\begin{aligned} \beta ^{\text {dist}}_{s=1}&=\ -0.078 \\ \beta ^{\text {dist}}_{s=2}&= -0.088. \end{aligned}$$

This indicates that the main customers (\(s=1\), population aged between 15 and 25) are less sensitive to distance than other customers. Goldman (1976) provides empirical evidence on the relationship between income and the propensity of shopping at a specific facility. Based on Fotheringham and Trew (1993), we might consider

$$\begin{aligned} \beta ^{\text {inc}} = -0.015. \end{aligned}$$

Now, we are able to compute the expected patronage for each existing facility using (21) and hence the total expected market share of the firm as

$$\begin{aligned} \text {MS} = \widetilde{F} / \sum _{i \in I} \sum _{s \in S_i} \widetilde{h}_{is}. \end{aligned}$$
(30)

We consider this as the base scenario. Figure 2b displays the result. We assume that, on average, a customer makes five shopping visits a year. This yields 41 million customers over all existing facilities and a total expected market share of 11.12 %. The expected market share is below the reference value of 12.5 %. This is reasonable, because we do not consider online purchases and there might be some inconsistencies close to the border of Germany due to transnational purchases of customers. On average, a customer spends 80 Euro per visit, yielding an annual revenue of 3.28 billion Euro. This is close to the reference value of 3.7 billion Euro. We conclude that our demand model predicts fairly well.
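
As a quick plausibility check, the revenue figure and the ratios to the reference values follow directly from the numbers reported above. We read the 41 million as customer visits per year, and the rounding is ours:

$$\begin{aligned} 41 \times 10^{6} \times 80 \ \text {Euro}&= 3.28 \times 10^{9} \ \text {Euro}, \\ 3.28 / 3.7&\approx 0.89, \\ 11.12 / 12.5&\approx 0.89. \end{aligned}$$

That is, both the predicted revenue and the predicted market share reach roughly 89 % of their respective reference values.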

Since our parameters do not stem from a unique study on furniture store customer behavior in Germany, we first investigate the sensitivity of the solution to parameter variations. The locational decision variables \(y_j\) are fixed to one for the already existing facility locations (i.e., \(j<47\)). We solve our model of Sect. 3 for various parameter settings and for different distance thresholds \(\delta \) of (27). We are interested in the dependence of MS on \(K^+\). We have implemented our model in GAMS 23.7 and we use CPLEX 12.2 on a 64-bit Windows Server 2008 with 4 Intel Xeon 2.4 GHz processors and 24 GB RAM for all studies. All problems considered in this section are solved to optimality within minutes. The results of Fig. 3 show a piecewise linear increase of the market share in \(K^+\). The slope is nearly 0.35, indicating that with each additional facility, the total market share of the firm increases by 0.35 percentage points. Note that the underlying function is not necessarily concave. The sensitivity analysis indicates that the market share is independent of the distance threshold for \(\delta > 50\) and of the weight of the income \(\beta ^{\text {inc}}\). In contrast, the scale of the market share heavily depends on the distance parameters (\(\beta _s^{\text {dist}}\)). This finding stresses the need for firms to employ estimates based on unique choice studies (see Street and Burgess 2007; Müller et al. 2008; Louviere et al. 2000 for how to design studies and experiments for discrete choice analysis).
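
Taking the reported slope at face value, the gain in market share for a given number of new facilities can be approximated as follows. This is only a back-of-the-envelope linearization for illustration, not an additional model result:

$$\begin{aligned} \Delta \text {MS}(K^+)&\approx 0.35 \, K^+ \ \text {percentage points}, \\ \Delta \text {MS}(5)&\approx 1.75, \qquad \Delta \text {MS}(10) \approx 3.5. \end{aligned}$$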

Based on the (linear) relationship between MS and \(K^+\), the firm's management can identify a specific number of new facilities to be located. The optimal locations and the expected (annual) patronage of the new facilities can be displayed in maps and support the decision making of the firm's management. Figure 4 exemplifies a market expansion with 5 and 10 new facilities. In a real-world management application, one usually has to account for locally varying locational (and maybe operational) costs. In such a situation, one would be interested in the relationship between cost (or budget) and market share. The firm is further interested in the impact of segmentation of its customers (see Sect. 2.2). Therefore, we consider the following example that extends Example 1.

Fig. 2

Existing facilities and potential locations of the firm as well as facilities of the competitors (a). Expected customers in base scenario (b)

Fig. 3

Results of sensitivity analysis for \(\delta , \beta _s^{\text {dist}}, \beta ^{\text {inc}}\), and market share (MS). \(K\) of (12) is given by \(46 + K^+\) (46 facilities are already in the market)

Fig. 4

Results for two possible scenarios. Parameter settings used: \(\delta = 150\), \(\beta ^{\text {dist}}_{s=1}= -0.078 \), \(\beta ^{\text {dist}}_{s=2}= -0.088\), and \(\beta ^{\text {inc}}= -0.015\). The newly located facilities are labeled, e.g., P16, P22, P25, P36, and P50 in panel (a), where “P” denotes that the corresponding location is selected from the set of potential locations. Non-labeled locations are already existing facilities

Example 2

We expect that the more the two segments differ, the larger the predictive bias of the MNL is, and thus the larger the bias of the objective function value is if segmentation is neglected. Due to the specification of the deterministic part of utility in (29), the difference in choice probabilities between the two segments corresponds to the difference between \(\beta ^{\text {dist}}_{s=1}\) and \(\beta ^{\text {dist}}_{s=2}\). To evaluate the impact of neglected segmentation, we first consider \(\beta ^{\text {dist}}_{s=1} =\beta ^{\text {dist}}_{s=2} = \beta ^{\text {dist}}\) with \(\beta ^{\text {dist}} = (\beta ^{\text {dist}}_{s=1} + \beta ^{\text {dist}}_{s=2})/2\) in (29). This corresponds to a simple average of utilities as described in (1) of Sect. 2.2. The corresponding solution in terms of selected locations is denoted by \(\overline{J} = \left\{ j \in J\left| y^*_j=1\right. \right\} \). Based on \(\overline{J}\), we compute the MNL choice probabilities using segmentation, i.e., we use \(\beta ^{\text {dist}}_{s=1}\) and \(\beta ^{\text {dist}}_{s=2}\) instead of \(\beta ^{\text {dist}}\) in (29). The corresponding objective function value is denoted as \(\overline{F}\) and the corresponding market share is given by MS(\(\overline{F}\)).
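
The evaluation procedure described above can be sketched on a toy instance. The following Python snippet is a minimal illustration under our own assumptions, not the model of Sect. 3: utility is simply \(\beta ^{\text {dist}}_s\) times distance, each demand point faces one rival facility, there is no no-choice alternative, and locations are found by brute-force enumeration rather than by a mixed-integer program. All function names and data are hypothetical. We take \(\widetilde{F}\) to be the optimum under segment-specific coefficients.

```python
import math
from itertools import combinations


def captured_demand(open_sites, demand, dist, rival_dist, betas):
    """Expected patronage of the firm under a segment-specific MNL.

    Toy assumption: utility = betas[s] * distance; each demand point chooses
    among the firm's open sites and a single rival facility.
    """
    total = 0.0
    for i, segments in enumerate(demand):
        for s, customers in enumerate(segments):
            firm = sum(math.exp(betas[s] * dist[i][j]) for j in open_sites)
            rival = math.exp(betas[s] * rival_dist[i])
            total += customers * firm / (firm + rival)
    return total


def best_sites(k, demand, dist, rival_dist, betas):
    """Brute-force search over all k-subsets of candidate sites."""
    n_sites = len(dist[0])
    return max(
        combinations(range(n_sites), k),
        key=lambda sites: captured_demand(sites, demand, dist, rival_dist, betas),
    )


def segmentation_bias(k, demand, dist, rival_dist, betas):
    """Return (F_bar, F_tilde) for the toy instance.

    F_bar:   locations chosen with the averaged coefficient, then
             re-evaluated with the segment-specific coefficients.
    F_tilde: locations chosen and evaluated with segment-specific coefficients.
    """
    pooled = [sum(betas) / len(betas)] * len(betas)
    sites_pooled = best_sites(k, demand, dist, rival_dist, pooled)
    f_bar = captured_demand(sites_pooled, demand, dist, rival_dist, betas)
    sites_segmented = best_sites(k, demand, dist, rival_dist, betas)
    f_tilde = captured_demand(sites_segmented, demand, dist, rival_dist, betas)
    return f_bar, f_tilde


# Hypothetical data: 3 demand points, 2 segments, 3 candidate sites.
demand = [[10, 60], [15, 90], [5, 40]]          # customers per segment
dist = [[2, 20, 35], [25, 3, 30], [40, 30, 4]]  # km to candidate sites
rival_dist = [10, 12, 8]                        # km to the rival facility

f_bar, f_tilde = segmentation_bias(1, demand, dist, rival_dist, betas=[-1.0, -0.1])
print("total deviation F_bar - F_tilde:", f_bar - f_tilde)
```

By construction, the deviation printed at the end cannot be positive, because the segment-specific solution is optimal for the segment-specific objective.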

We consider \(\beta ^{\text {dist}}_{s} \in \left\{ -1, -0.1, -0.01, -0.001, -0.0001 \right\} \), \(\beta ^{\text {inc}}=-0.015\), and \(\delta = 150\). Further, we consider two scenarios: \(K^+ = 5\) and \(K^+ = 10\). The results are given in Fig. 5. The patterns for the total deviation \(\overline{F}- \widetilde{F}\), relative deviation \(100 \times (\overline{F}- \widetilde{F})/\widetilde{F}\), and the deviation of the market shares \(\text {MS}(\overline{F})-\text {MS}(\widetilde{F})\) are similar. The most eye-catching bias occurs if \(\beta ^{\text {dist}}_{s=1} = -1\). Consider, for example, \(\beta ^{\text {dist}}_{s=1} = -1\) and \(\beta ^{\text {dist}}_{s=2} = -0.1\), i.e., segment \(s=1\) evaluates each additional kilometer ten times as negatively as segment \(s=2\) (i.e., \(\beta ^{\text {dist}}_{s=1}/\beta ^{\text {dist}}_{s=2} = 10\)). If segmentation is neglected, the corresponding distance coefficient is \(\beta ^{\text {dist}} =-0.55\). As a consequence, the larger part of the customers, segment 2 (recall that \(\sum _i \widetilde{h}_{i, s=1} / \sum _i \widetilde{h}_{i, s=2} =0.163\)), is modeled as evaluating distance more than five times as negatively as it would with segmentation. Accordingly, the corresponding deviation is considerable (\(-\)8.9 % for \(K^+=5\) and \(-\)12.5 % for \(K^+=10\)). The asymmetric pattern in Fig. 5 is due to the uneven distribution of population over the two segments (the population of segment 2 is larger than the population of segment 1): the more the true coefficient of the large part of the population (segment 2) deviates from the average coefficient, the larger the expected predictive error is. In contrast, a large deviation of the true coefficient of segment 1 has an impact only on a small part of the population and the corresponding expected predictive error is comparably small. The extent of the error also heavily depends on the scale of the coefficients.
Consider, for example, \(\beta ^{\text {dist}}_{s=1} = -1\) and \(\beta ^{\text {dist}}_{s=2} = -0.1\). The corresponding ratio is 10 and the expected error for \(K^+ = 5\) is \(-\)8.88 %. Now, for \(\beta ^{\text {dist}}_{s=1} = -0.1\) and \(\beta ^{\text {dist}}_{s=2} = -0.01\), the corresponding ratio is 10 again. However, the corresponding error is only \(-\)0.18 %. This pattern is due to the non-linear relationship between distance (deterministic utility) and the choice probabilities (i.e., an S-shaped probability function). As the coefficients (weighting of travel distance) get larger (i.e., approach 0), the probabilities of choosing to patronize a facility approach the largest possible value. For these values of the deterministic utility, the difference in the corresponding choice probabilities between the two segments becomes small.
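
The scale effect can be illustrated with a deliberately simplified binary logit. The sketch below is not the model used in our study: it assumes a single facility at a fixed, hypothetical distance of 10 km, an outside option with utility 0, and segment weights derived from the population ratio 0.163. The function names are ours. It compares the choice share predicted with the averaged coefficient against the population-weighted share predicted with segment-specific coefficients.

```python
import math


def logit_share(beta, distance_km):
    """Binary logit: probability of choosing one facility with utility
    beta * distance over an outside option with utility 0 (assumption)."""
    return 1.0 / (1.0 + math.exp(-beta * distance_km))


def aggregation_bias(beta_1, beta_2, distance_km, ratio_1_to_2=0.163):
    """Pooled share minus segmented share for two customer segments.

    ratio_1_to_2 is the size of segment 1 relative to segment 2.
    """
    w1 = ratio_1_to_2 / (1.0 + ratio_1_to_2)  # population weight of segment 1
    w2 = 1.0 - w1                             # population weight of segment 2
    segmented = (w1 * logit_share(beta_1, distance_km)
                 + w2 * logit_share(beta_2, distance_km))
    beta_avg = (beta_1 + beta_2) / 2.0        # simple average, as in the text
    pooled = logit_share(beta_avg, distance_km)
    return pooled - segmented


# Same coefficient ratio of 10, but different scales.
for b1, b2 in [(-1.0, -0.1), (-0.1, -0.01), (-0.01, -0.001)]:
    print(b1, b2, aggregation_bias(b1, b2, distance_km=10.0))
```

In this toy setting the absolute bias shrinks as the coefficients approach 0, although their ratio stays at 10, which mirrors the qualitative pattern described above. The magnitudes are not comparable to the values in Fig. 5.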

The bias found in our study is comparable to that reported in studies on spatial aggregation (Andersson et al. 1998; Daskin et al. 1989; Current and Schilling 1987; Murray and Gottsegen 1997). In the literature, ratios of segment-specific coefficients larger than 50 are reported (Müller et al. 2012; Koppelman and Bhat 2006, pp 133–134). However, the difference between the segment-specific distance coefficients used in our application is small. We have considered parameter settings that yield a ratio \(\beta ^{\text {dist}}_{s=1}/\beta ^{\text {dist}}_{s=2} = 0.91\) (see Fig. 3). As a consequence, the expected bias is below 1 % if we neglect segmentation in our application. Nevertheless, the consideration of segments yields valuable insights, because the utility function (29) and the corresponding coefficients are arbitrarily chosen. As stated before, for a real application, the company is expected to specify utility functions and estimate the corresponding coefficients from unique choice data. The firm may use such a numerical study to make assumptions about worst-case scenarios.

Fig. 5

Results of Example 2: bias of the objective function due to neglected segmentation. “Kplus” corresponds to \(K^+\). The values of \(\overline{F}- \widetilde{F}\) are given in million customers. The numerical values corresponding to this figure are given in the Appendix (see Table 2)

5 Summary

Using a simple example, we demonstrate that the independence from irrelevant alternatives (IIA) property of the MNL may yield false predictions. This finding is well supported by empirical studies. When the MNL is used in a mathematical program to incorporate customer choice behavior, the model outcomes are very likely to be biased as well. Although the MNL is founded on individual choice behavior, in facility location planning we are interested in the share of customers of a demand point patronizing a certain facility. If we assume the customers of a demand point are homogeneous, i.e., they exhibit the same observable characteristics, then there is no need for segmentation. If we assume the customers to be heterogeneous, then segmentation of the customers according to their characteristics (income and age, for example) should be employed. By proper segmentation, we are able to reduce the predictive bias of the MNL in terms of market shares.

In this contribution, we present a model formulation for the maximum capture problem that explicitly allows for customer segmentation using the MNL to find optimal shopping facility locations. Moreover, we propose an intelligible approach to derive a lower bound for our model. Extensive computational studies show the impact of proper segmentation as well as the efficiency of our approach: using aggregate customer characteristics instead of proper segmentation may yield a predictive bias of the objective function value of more than 15 % relative to the optimal objective function value. Our lower bound is found in \(<\)1 s and deviates \(<\)1 % from the optimal solution. Problems with 2 segments, 50 potential locations and 500 demand points can be solved to a gap \(<\)8 % within 1 h using GAMS/CPLEX. Based on our numerical studies concerning the quality of the lower bound, it is reasonable to assume that the true gap is considerably smaller than 8 %. We apply our approach in an illustrative case example of a globally operating furniture store company that intends to increase its market share in Germany by branch expansion. This problem can be solved to optimality within a few minutes. Our example shows how the novel approach can be used for management decision support.

Based on our findings, several possible directions for future research appear. It is of interest to derive analytical bounds on the bias of the objective function value due to missing segmentation under various segmentation patterns and specifications of utility. Further, the explicit consideration of substitution patterns, i.e., correlation between facility locations, is a very important issue to be analyzed. Efficient solution methods are necessary to handle larger problem instances. Finally, our approach is useful in other areas of operations research, for example, assortment optimization (Kök and Fisher 2007).