# A strong Lagrangian relaxation for general discrete-choice network revenue management

## Abstract

Discrete-choice network revenue management (DC-NRM) captures both customer behavior and the resource-usage interaction of products, and is appropriate for airline and hotel revenue management, dynamic sales of bundles in advertising, and dynamic assortment optimization in retail. The state-space of the DC-NRM stochastic dynamic program explodes and approximation methods such as the choice deterministic linear program, the affine, and the piecewise-linear approximations have been proposed to approximate it in practice. The affine relaxation (and thereby, its generalization, the piecewise-linear approximation) is intractable even for the simplest choice models such as the multinomial logit (MNL) choice model with a single segment. In this paper we propose a new Lagrangian relaxation method for DC-NRM based on an extended set of multipliers. An attractive feature of our method is that the number of constraints in our formulation scales linearly with the resource capacities. While the number of constraints in our formulation is an order of magnitude smaller than that of the piecewise-linear approximation (polynomial vs exponential), it obtains a bound that is as tight as the piecewise-linear bound. If we assume that the consideration sets of the different customer segments are small in size (a reasonable modeling tradeoff in many practical applications), our method is an indirect way to obtain the piecewise-linear approximation on large problems effectively. Our results are not specific to a particular functional form (such as MNL), but hold for any discrete-choice model of demand. We show by numerical experiments that our Lagrangian relaxation method can provide substantial improvements over existing benchmark methods, both in terms of tighter upper bounds, as well as revenues from policies based on the relaxation.

## Keywords

Dynamic programming approximations, Transportation, Revenue management, Choice models

## 1 Introduction and literature review

In industries such as hotels, advertising, and airlines, the products consume one or multiple resources (for example, multi-night stays, bundles of advertising time-slots, multi-leg itineraries) and each product has a specific price, set based on the price sensitivity of the customer segment it is aimed at, the market conditions, and the product characteristics. The resources are perishable; for instance, for an airline, seats that are empty at the moment of departure earn no revenue, so the inventory “perishes” at departure. The firm’s revenue management function is to decide, at every point in time during the sale period, what products to open for sale; the tradeoff is between selling too much at too low a price early and running out of capacity, and rejecting too many low-valuation customers and ending up with unsold inventory. Network revenue management (NRM) is control based on the demands for the entire network. Chapter 3 of Talluri and van Ryzin [23] contains the necessary background on NRM.

NRM incorporating more realistic models of customer behavior, in which customers choose from a set of offered products, was initiated in Talluri and van Ryzin [22] for the single-resource problem. Discrete-choice models are parsimonious as they model the probability of purchase as a function of product characteristics, such as price and restrictions, reducing the number of parameters to estimate. The subsequent optimization problem can be formulated as a stochastic dynamic program that we call the discrete-choice NRM (DC-NRM) dynamic program.

The DC-NRM dynamic program is computationally intractable and hence many approximation methods have been proposed, starting with Gallego et al. [7] and Liu and van Ryzin [15] who formulate the choice deterministic linear program (*CDLP*). Zhang and Adelman [26] propose an affine approximation to the value function, while Meissner and Strauss [16] propose a piecewise-linear approximation. Kunnumkal and Topaloglu [14] use Lagrangian relaxation ideas to come up with a separable approximation. All of the above mentioned approximation methods obtain upper bounds on the value function, with the piecewise-linear approximation obtaining the tightest upper bound.

Unfortunately, most of these approximation methods themselves are intractable, even for simple choice models. Liu and van Ryzin [15] show that *CDLP* is tractable for the multinomial logit (MNL) model provided the subset of products of interest to the different customer segments (consideration sets) are disjoint. However, *CDLP* is NP-complete when the segment consideration sets overlap even under the MNL model with two customer segments (Bront et al. [5], Rusmevichientong et al. [18]). Kunnumkal and Talluri [11] show that the affine approximation of Zhang and Adelman [26] is NP-hard for the MNL model with even a single segment, which implies a similar hardness result for the piecewise-linear approximation as well. These negative results show us the limits of tractability even for the simplest choice model such as the MNL model.

One approach to obtain a tractable model is to approximate the underlying discrete choice model with a simplified model that limits the number of choices. For example, Chen and de Mello [6] consider a buy-up/buy-down model which is tractable and can be solved efficiently. In this paper, we take a different approach to tractability; we work with the underlying discrete-choice model but come up with a tractable approximation to the value function of the DC-NRM dynamic program. In particular, we propose a new Lagrangian relaxation method for DC-NRM based on an extended set of multipliers. The number of constraints in our formulation scales linearly with the resource capacities, making it a tractable alternative for large networks. On the other hand, we show that our extended Lagrangian relaxation obtains a bound that is as tight as the piecewise-linear bound. The biggest practical impact of our work is in showing that the complexity of the Lagrangian relaxation method (in terms of the number of constraints in the linear programming formulation of the problem) scales linearly with the resource capacities, while that of the piecewise-linear approximation is exponential—yet they arrive at the same value function approximation. If we assume that the consideration sets of the different customer segments are small in size—a reasonable modeling tradeoff in many practical applications as we argue next—our method is able to solve relatively large problems, which would have been impossible if one were to try solving the piecewise-linear approximation directly.

Since we work with general discrete-choice models and show that our Lagrangian relaxation is equivalent to the piecewise-linear relaxation, all the negative complexity results for the MNL model carry over to the Lagrangian relaxation as well. The great advantage, however, is that the number of constraints in the Lagrangian relaxation depends directly on the size of the consideration sets of the different segments. These fortunately are quite small in practice: in the airline setting for instance, a segment’s consideration set consists of choices (on one airline) for travel between an origin and destination, and typically there are only a few alternatives on a given date (Talluri [20]). For hotels, as the product consists of a multi-night stay and most customers arrive with a fixed duration of stay in mind, the consideration set consists of the types of rooms and products, usually not a very large number.

Research in the marketing area also gives evidence supporting that customers have relatively small consideration sets: Hauser and Wernerfelt [9] report average consideration set sizes of 3 brands for deodorants, 4 brands for shampoos, 2.2 brands for air fresheners, 4 brands for laundry detergents and 4 brands for coffees. (Note that the study is for brands rather than choices of sizes or colors.) Another line of marketing research finds great value in deliberately limiting customer choices to a small number (Iyengar and Lepper [10]). Assuming small consideration set sizes, Talluri [21] and Meissner et al. [17] study tractable approximations to *CDLP* for DC-NRM. While our segment-based relaxation has the same underlying motivation, the development is quite different as we work with the Lagrangian relaxation of the DC-NRM dynamic program.

Our final contribution in this paper is showing by numerical experiments that our Lagrangian relaxation method can provide substantial improvements over existing methods.

Our work builds on previous research on Lagrangian relaxations for NRM. Topaloglu [24] was the first paper to propose a time-based Lagrangian relaxation approach for the perfect segmentation case (also called the independent-class assumption). The Lagrangian relaxation of Topaloglu [24] associates Lagrange multipliers with each product in each time period, and Kunnumkal and Talluri [13] show that it obtains an upper bound that coincides with the piecewise-linear bound. In contrast, for DC-NRM, we give an example which illustrates that the approach proposed by Topaloglu [24] can be weaker than the piecewise-linear approximation. This motivates our approach that associates Lagrange multipliers with every offer set, and obtains a bound that is as tight as the piecewise-linear bound. We note that the Lagrangian relaxation we propose is different from that in Topaloglu [24] and consequently has different structural properties. Moreover, an appealing feature of our method is that it extends very naturally to the case where there are multiple customer segments each interested in a small subset of the products. We build on our Lagrangian relaxation idea and propose a segment-based approximation that is a tractable alternative when the consideration sets of the different segments are small in size.

In parallel to a working version of this paper, Vossen and Zhang [25] study the properties of approximate linear programming methods for NRM and show that the affine and piecewise-linear formulations can be significantly reduced in size. Their reduced formulation of the piecewise-linear program for DC-NRM can be shown to be equivalent to our Lagrangian relaxation method (Sect. 5.4 in Vossen and Zhang [25]). We note however that our starting point is a Lagrangian relaxation of the DC-NRM dynamic program and our proof technique is quite different. Moreover, we show that it is not possible to recover the piecewise-linear approximation upper bound for DC-NRM by using product-specific multipliers as in the perfect segmentation case. Consequently, the expanded form, using offer-set specific multipliers, is the best we can hope for. Finally, even though the number of constraints of our Lagrangian relaxation (and the reduced piecewise-linear approximation) is linear in the number of resources, it is still exponential in the number of products. We propose a new segment-based Lagrangian relaxation that remains tractable provided the consideration sets of the different customer segments are small in size.

1. We propose a new Lagrangian relaxation method for DC-NRM based on an extended set of Lagrange multipliers.

2. The number of constraints in our Lagrangian relaxation (in the linear programming formulation of the problem) scales linearly with the resource capacities making it a tractable alternative for large networks. Another appealing feature of our approach is that it decomposes the network problem into a number of single resource problems, which can be potentially solved in a distributed fashion.

3. We show that our Lagrangian relaxation obtains an upper bound on the value function as tight as the piecewise-linear bound. The number of constraints in the Lagrangian relaxation scales linearly with the resource capacities, while the piecewise-linear approximation is exponential, hence leading to a substantial reduction in running time in practice. This result also implies that the Lagrangian relaxation bound is stronger than the affine approximation bound (hence also *CDLP*) that was not known previously. It should be noted that our results are not specific to a particular functional form (such as MNL), but hold at a full level of generality, for any discrete-choice model of demand.

4. We show how our ideas lead to a tractable method for DC-NRM (for any discrete-choice model) when the consideration sets of the different segments are small in size.

5. Our numerical experiments indicate that our Lagrangian relaxation can obtain significantly tighter upper bounds than the benchmark methods including the *CDLP* and the affine approximation. We also extract policies from our formulation and show by numerical experiments that it can lead to notable improvements in revenue over existing methods.

The remainder of the paper is organized as follows. Section 2 presents the problem formulation. Section 3 reviews *CDLP*, the affine and the piecewise-linear approximation methods. In Sect. 4 we describe the Lagrangian relaxation approach and show that it obtains a bound that is as tight as the piecewise-linear bound; we defer the formal proof to Sect. 5. Section 6 describes the segment-based Lagrangian relaxation that is tractable under the assumption of small consideration sets. In Sect. 7 we perform numerical experiments to compare the performance of the Lagrangian relaxation approach with benchmark solution methods.

## 2 Problem formulation

A product is a specification of a price and the set of resources that it consumes. For example, for an airline a product is an itinerary-fare class combination, where an itinerary is a combination of flight legs making up a passenger’s journey; for a hotel, a product is a multi-night stay for a particular room type at a certain price point.

Our model is discrete-time with \(\tau \) intervals, indexed by *t*. The booking horizon begins at time \(t=1\) and ends at \(t=\tau \); all the resources perish instantaneously at time \(\tau +1\). We make the standard assumption that the time intervals are fine enough so that the probability of more than one customer arriving in any single time period is negligible. We index resources by *i*, products by *j*, and time periods by *t*. We refer to the set of all resources as \({\mathcal {I}}\) and the set of all products as \({\mathcal {J}}\). We let \(f_j\) denote the revenue associated with product *j* and \({\mathcal {I}}_j \subset {\mathcal {I}}\) the subset of resources it uses. Similarly, we let \({\mathcal {J}}_i \subset {\mathcal {J}}\) denote the subset of products that use resource *i*.

We use superscripts on vectors to index the vectors and subscripts to indicate their components. For example, we write the resource capacity vector associated with time period *t* as \(\varvec{r}^t\) and use \(r^t_i\) to represent the capacity on resource *i* in time period *t*. Therefore, \(\varvec{r}^1 = \left[ r^1_i\right] \) represents the initial capacities on the resources and \(\varvec{r}^t = \left[ r^{t}_{i}\right] \) denotes the remaining capacities on the resources at the beginning of time period *t*. The remaining capacity \(r^{t}_{i}\) takes values in the set \({\mathcal {R}}_i =\left\{ 0, \ldots , r^1_i\right\} \) and \({\mathcal {R}} = \prod _i {\mathcal {R}}_i\) represents the state space.

### 2.1 Demand model

In each time period the firm offers a subset *S* of its products for sale, called the *offer set*. A customer arrives with probability \(\alpha \) and given an offer set *S*, an arriving customer purchases a product *j* in the set *S* or decides not to purchase. The no-purchase option is indexed by 0 and is always present for the customer. We let \(P_j(S)\) denote the probability that the firm sells product *j* given that a customer arrives and the offer set is *S*. Clearly, \(P_j(S) = 0\) if \(j \notin S\). The probability of no sale given a customer arrival is \(P_0(S) = 1 - \sum _{j \in S} P_j(S)\). We assume that the choice probabilities are given by an oracle, as the model represents a general discrete-choice model; they could conceivably be calculated by a simple formula as in the case of the multinomial logit (MNL) model (Ben-Akiva and Lerman [3]).
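As an illustration of one possible choice oracle, the following minimal sketch computes \(P_j(S)\) under a single-segment MNL model. The function name, the preference weights and the no-purchase weight are hypothetical and chosen only for this example; the paper itself allows any discrete-choice model.

```python
from typing import Dict, FrozenSet


def mnl_choice_probabilities(offer_set: FrozenSet[int],
                             weights: Dict[int, float],
                             no_purchase_weight: float = 1.0) -> Dict[int, float]:
    """Return P_j(S) for each j in S, plus key 0 for the no-purchase option.

    Assumes products are indexed by positive integers so that index 0 can be
    reserved for the no-purchase option, as in the text.
    """
    total = no_purchase_weight + sum(weights[j] for j in offer_set)
    probs = {j: weights[j] / total for j in offer_set}
    probs[0] = no_purchase_weight / total
    return probs


# Hypothetical preference weights for three products.
weights = {1: 2.0, 2: 1.0, 3: 1.0}
probs = mnl_choice_probabilities(frozenset({1, 2}), weights)
```

Products outside the offer set are simply absent from the returned dictionary, which corresponds to \(P_j(S) = 0\) for \(j \notin S\).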

### 2.2 DC-NRM dynamic program

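Let \(V_t(\varvec{r}^t)\) denote the maximum expected revenue that can be obtained from time period *t* onwards, given the remaining capacity vector \(\varvec{r}^t\) at the beginning of time period *t*. Then \(V_t(\varvec{r}^t)\) must satisfy the Bellman equation

\[ V_t(\varvec{r}^t) = \max _{S \subset {\mathcal {Q}}(\varvec{r}^t)} \left\{ \sum _{j \in S} \alpha P_j(S) \Big [ f_j + V_{t+1}\Big ( \varvec{r}^t - \sum _{i \in {\mathcal {I}}_j} \varvec{e}^i \Big ) \Big ] + \big [ \alpha P_0(S) + 1 - \alpha \big ] V_{t+1}(\varvec{r}^t) \right\} , \qquad (1) \]

where \(\varvec{e}^i\) is a vector with a 1 in the *i*th position and 0 elsewhere and \({\mathcal {Q}}(\varvec{r}) = \left\{ j \,|\, r_i \ge 1 \; \forall i \in {\mathcal {I}}_j \right\} \) is the set of products that can be offered given the capacity vector \(\varvec{r}\). The boundary conditions are \(V_{\tau +1}(\varvec{r}) = V_t(\varvec{0}) = 0\) for all \(\varvec{r}\) and for all *t*, where \(\varvec{0}\) is a vector of all zeroes. \(V^{DP}=V_1(\varvec{r}^1)\) denotes the optimal expected total revenue over the booking horizon, given the initial capacity vector \(\varvec{r}^1\).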

For brevity of notation, we assume that \(\alpha = 1\) in the remaining part of the paper. We note that this is without loss of generality since this is equivalent to letting \({{\tilde{P}}}_j(S) = \alpha P_j(S)\) and \({{\tilde{P}}}_0(S) = \alpha P_0(S) + 1 - \alpha \), and working with the choice probabilities \(\big \{ {{\tilde{P}}}_j(S) \,|\, \forall j, S \big \}\).
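To make the recursion concrete, the sketch below evaluates the DC-NRM dynamic program by brute force with \(\alpha = 1\), enumerating every offer set in every state. It is only meant for tiny instances, since the state space and the number of offer sets both explode; the function names, the data layout and the example network are assumptions made for this illustration.

```python
from functools import lru_cache
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Tuple


def solve_dc_nrm(fares: Dict[int, float],
                 uses: Dict[int, Tuple[int, ...]],
                 capacity: Tuple[int, ...],
                 horizon: int,
                 choice_prob: Callable[[FrozenSet[int]], Dict[int, float]]) -> float:
    """Brute-force evaluation of V_1(r^1) with alpha = 1 (tiny instances only)."""
    products = sorted(fares)

    def offer_sets(r):
        # Only products whose resources all have at least one unit left may be offered.
        avail = [j for j in products if all(r[i] >= 1 for i in uses[j])]
        for k in range(len(avail) + 1):
            for combo in combinations(avail, k):
                yield frozenset(combo)

    @lru_cache(maxsize=None)
    def value(t, r):
        if t > horizon:
            return 0.0  # boundary condition: nothing can be earned after the horizon
        stay = value(t + 1, r)
        best = stay  # value of offering the empty set
        for S in offer_sets(r):
            probs = choice_prob(S)
            gain = 0.0
            for j in S:
                nxt = tuple(c - 1 if i in uses[j] else c for i, c in enumerate(r))
                gain += probs[j] * (fares[j] + value(t + 1, nxt) - stay)
            best = max(best, stay + gain)
        return best

    return value(1, tuple(capacity))


def mnl(weights, no_purchase_weight=1.0):
    """Hypothetical MNL oracle returning P_j(S) for the products j in S."""
    def choice_prob(S):
        total = no_purchase_weight + sum(weights[j] for j in S)
        return {j: weights[j] / total for j in S}
    return choice_prob


# Hypothetical two-resource network: products 1 and 2 use one resource each,
# product 3 uses both.
fares = {1: 100.0, 2: 80.0, 3: 150.0}
uses = {1: (0,), 2: (1,), 3: (0, 1)}
v_dp = solve_dc_nrm(fares, uses, (1, 1), 3, mnl({1: 1.0, 2: 1.0, 3: 0.5}))
```

The memoized recursion visits every reachable pair of time period and capacity vector, which is exactly why the approach does not scale and approximations are needed.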

### 2.3 Linear programming formulation of the DC-NRM dynamic program

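The value functions can alternatively be obtained by solving a linear program that has a decision variable \(V_t(\varvec{r})\) for each state vector \(\varvec{r}\) and each time period *t* and is as follows:

\[ \begin{aligned} (DP) \quad V^{DP} = \min _{V} \quad & V_1(\varvec{r}^1) \\ \text{s.t.} \quad & V_t(\varvec{r}) \ge \sum _{j \in S} P_j(S) \Big [ f_j + V_{t+1}\Big ( \varvec{r} - \sum _{i \in {\mathcal {I}}_j} \varvec{e}^i \Big ) - V_{t+1}(\varvec{r}) \Big ] + V_{t+1}(\varvec{r}) \quad \forall \varvec{r} \in {\mathcal {R}},\ S \subset {\mathcal {Q}}(\varvec{r}),\ t, \end{aligned} \]

with \(V_{\tau +1}(\varvec{r}) = 0\) for all \(\varvec{r}\). Both the dynamic program (1) and linear program *DP* are computationally intractable, but the linear program *DP* turns out to be useful in developing value function approximation methods.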

## 3 Value function approximation methods

In this section, we describe three methods in the literature to approximate the DC-NRM dynamic program value function. We begin with the choice-based deterministic linear program and then outline the affine and the piecewise-linear approximations.

### 3.1 Choice-based deterministic linear program (*CDLP*)

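The choice-based deterministic linear program (*CDLP*), proposed in Gallego et al. [7] and Liu and van Ryzin [15], is given by

\[ \begin{aligned} V^{CDLP} = \max _{h} \quad & \sum _{t=1}^{\tau } \sum _{S \subset {\mathcal {J}}} \Big [ \sum _{j \in S} P_j(S) f_j \Big ] h_{S,t} \\ \text{s.t.} \quad & \sum _{t=1}^{\tau } \sum _{S \subset {\mathcal {J}}} \Big [ \sum _{j \in {\mathcal {J}}_i} P_j(S) \Big ] h_{S,t} \le r^1_i \quad \forall i, \\ & \sum _{S \subset {\mathcal {J}}} h_{S,t} = 1 \quad \forall t, \\ & h_{S,t} \ge 0 \quad \forall S, t, \end{aligned} \]

where the decision variable \(h_{S,t}\) can be interpreted as the frequency with which set *S* is offered at time period *t*. The objective function captures the total expected revenue, while the first set of constraints ensures that the total expected capacity consumed on each resource does not exceed its available capacity. The second set of constraints ensures that the total frequencies add up to 1 at each time period.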

Liu and van Ryzin [15] show that the optimal objective function value of *CDLP* gives an upper bound on the optimal expected revenue. That is, \(V_1(\varvec{r}^1) \le V^{CDLP}\). Since *CDLP* has an exponential number of decision variables it has to be solved using column generation. Liu and van Ryzin [15] show that column generation is efficient for the MNL model with a single segment. However, the column generation procedure is intractable in general, NP-complete for MNL with just two segments (Bront et al. [5] and Rusmevichientong et al. [18]).

### 3.2 Affine approximation

The affine approximation of Zhang and Adelman [26] replaces the value function in the linear program *DP* with a function that is affine in the remaining capacities. We refer to the resulting linear program as *AF*.

*AF* gives an upper bound on the optimal expected revenue and this bound is tighter than the *CDLP* bound. While the number of decision variables in *AF* is manageable, the number of constraints is exponential in both the number of resources and the number of products. Vossen and Zhang [25] show that *AF* has a reduced formulation where the number of constraints is exponential only in the number of products. Still, the problem has to be solved by constraint generation and the separation problem is intractable even for the MNL choice model with a single segment (Kunnumkal and Talluri [11]).

### 3.3 Piecewise-linear approximation

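The piecewise-linear approximation of Meissner and Strauss [16] approximates the value function as \(V_t(\varvec{r}) \approx \sum _i v_{i,t}(r_i)\) in the linear program *DP*. We refer to the resulting linear program as *PL*.

While *PL* obtains a tighter upper bound than *CDLP* and *AF*, tractability remains as much, if not more, of an issue for *PL* as well. Note that linear program *PL* has \( {\mathcal {O}}(2^{|{\mathcal {J}}|} \tau \prod _{i } r^1_i)\) constraints, which is exponential both in the number of products and the number of resources. Moreover, the separation problem of this linear program is NP-complete, even for MNL with a single segment; see Kunnumkal and Talluri [11].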

## 4 Lagrangian relaxation using offer-set specific multipliers

In this section, we present our approximation method to obtain an upper bound on the value function of the DC-NRM dynamic program. We first describe our approach and then show that it obtains an upper bound that is as tight as that obtained by *PL*.

For an offer set *S*, we let \({\mathcal {I}}_S = \bigcup _{j \in S} {\mathcal {I}}_j\) denote the set of resources used by the products in *S*. We follow the convention that the empty set does not use any resource. We also let \({\mathcal {C}}_i = \left\{ S \,|\, i \in {\mathcal {I}}_S\right\} \) and note that \(i \in {\mathcal {I}}_S\) if and only if \(S \in {\mathcal {C}}_i\). Therefore, \({\mathcal {C}}_i\) can be roughly interpreted as the collection of offer sets that use resource *i*. Optimality equation (1) can be equivalently written, in an expanded form, as

*S* is offered at time period *t*. In the expanded formulation, we also have resource level decision variables \(h_{i, S, t}\), which can be interpreted as the frequency with which set *S* is offered on resource *i* at time period *t*. Constraints (5) ensure that for each set *S*, the frequencies are identical across the resources used by it, while constraints (6) ensure that a set is offered only if there is sufficient capacity on each resource that it uses. Constraints (7)-(9) ensure that the frequencies are nonnegative and add up to at most 1. We note that although the expanded formulation has a number of redundant decision variables and constraints, they turn out to be useful in the development of our approximation method below.
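The collections \({\mathcal {C}}_i\) of offer sets that use resource *i* can be enumerated directly on a small network. The sketch below is only illustrative: the helper names and the three-product network are assumptions, and full enumeration is exponential in the number of products.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple


def resources_used(offer_set: FrozenSet[int],
                   uses: Dict[int, Tuple[int, ...]]) -> FrozenSet[int]:
    """Resources used by at least one product in the offer set (empty for the empty set)."""
    return frozenset(i for j in offer_set for i in uses[j])


def offer_sets_by_resource(uses: Dict[int, Tuple[int, ...]],
                           resources: Iterable[int]) -> Dict[int, List[FrozenSet[int]]]:
    """For every resource i, list the offer sets S that use resource i."""
    products = sorted(uses)
    all_sets = [frozenset(c)
                for k in range(len(products) + 1)
                for c in combinations(products, k)]
    return {i: [S for S in all_sets if i in resources_used(S, uses)]
            for i in resources}


# Hypothetical network: products 1 and 2 use one resource each, product 3 uses both.
uses = {1: (0,), 2: (1,), 3: (0, 1)}
collections = offer_sets_by_resource(uses, [0, 1])
```

In this example each resource is used by six of the eight possible offer sets; the empty set belongs to neither collection, in line with the convention that it does not use any resource.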

*i* with the boundary condition that \(\vartheta ^\lambda _{i, \tau +1}(\cdot ) = 0\), where the superscript emphasizes the dependence of the value function approximation on the Lagrange multipliers. Collecting the terms involving \(h_{\phi , S, t}\) in the objective function of the relaxed problem, we solve the auxiliary dynamic program

Lemma 1 below shows that the Lagrangian relaxation using offer-set specific multipliers (*LRo*) obtains an upper bound on the optimal expected revenue and that this bound is potentially weaker than the piecewise-linear bound.

### **Lemma 1**

### *Proof*

Appendix. \(\square \)

We introduce some notation for this purpose. Let \(h^\lambda _{i, S, t}(r_i)\) denote an optimal solution to problem (10) where the arguments emphasize the dependence of the optimal solution on the Lagrange multipliers and the remaining capacity on the resource. We define \(h^\lambda _{\phi , S, t}\) in a similar manner for (11). Also, let \(X^\lambda _{i,t}\) denote the random variable which represents the remaining capacity on resource *i* at time period *t* when we offer sets according to the optimal solution to problem (10). We have the following result.

### **Proposition 1**

*Let* \(\lambda \) * and* \({\hat{\lambda }}\) * be two sets of Lagrange multipliers. Then,*

1. \(\vartheta ^{{\hat{\lambda }}}_{i,t}(r_{i}) \ge \vartheta ^{\lambda }_{i,t}\left( r_{i}\right) + \sum _{k=t}^\tau \sum _{S \in {\mathcal {C}}_i} \left\{ \sum _{x=0}^{r_i} \Pr \{ X^\lambda _{i,k} = x \,|\, X^\lambda _{i,t} = r_i \} h^{\lambda }_{i, S, k}(x) \right\} \big [ {\hat{\lambda }}_{i, S, k} - \lambda _{i, S, k} \big ].\)

2. \(\vartheta ^{{\hat{\lambda }}}_{\phi ,t} \ge \vartheta ^{\lambda }_{\phi , t} + \sum _{k=t}^\tau \sum _i \sum _{S \in {\mathcal {C}}_i} h^{\lambda }_{\phi , S, k} [{\hat{\lambda }}_{\phi , S, k} - \lambda _{\phi ,S, k}].\)

### *Proof*

Appendix. \(\square \)

We note that besides showing that \(V^\lambda _1(\varvec{r})\) is a convex function of \(\lambda \), Proposition 1 also gives an explicit expression for its subgradient. This allows us to use subgradient search to find the optimal set of Lagrange multipliers. Proposition 2 below shows that by doing this, we in fact obtain the piecewise-linear bound.
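The subgradient search mentioned above can be sketched generically as follows. This is a toy illustration under stated assumptions: the oracle below is a simple convex piecewise-linear function standing in for \(V^\lambda _1(\varvec{r})\), not the paper's relaxation, and the diminishing step size rule is one common choice rather than the one used in the paper's experiments.

```python
import math
from typing import Callable, List, Sequence, Tuple

Oracle = Callable[[Sequence[float]], Tuple[float, List[float]]]


def subgradient_search(oracle: Oracle,
                       x0: Sequence[float],
                       iterations: int = 500,
                       step_scale: float = 1.0) -> Tuple[List[float], float]:
    """Minimize a convex function given an oracle returning (value, subgradient)."""
    x = list(x0)
    best_x, best_val = list(x), float("inf")
    for n in range(1, iterations + 1):
        val, g = oracle(x)
        if val < best_val:          # subgradient steps need not decrease the value,
            best_val, best_x = val, list(x)  # so keep the best point seen so far
        step = step_scale / math.sqrt(n)     # diminishing step size
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return best_x, best_val


def toy_oracle(x: Sequence[float]) -> Tuple[float, List[float]]:
    # f(x) = |x_0 - 1| + |x_1 + 2|, a hypothetical stand-in for the Lagrangian bound.
    target = (1.0, -2.0)
    diffs = [xi - ti for xi, ti in zip(x, target)]
    value = sum(abs(d) for d in diffs)
    subgrad = [float((d > 0) - (d < 0)) for d in diffs]  # sign(d); 0 is valid at a kink
    return value, subgrad


best_x, best_val = subgradient_search(toy_oracle, [0.0, 0.0])
```

In the setting of the paper, the oracle would instead solve the single-resource problems for the current multipliers and return the subgradient given by Proposition 1.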

### **Proposition 2**

We defer the proof of Proposition 2 to the next section. Here we note some of its implications. Proposition 2 together with the results in Meissner and Strauss [16] implies that \(V^{LRo} \le V^{CDLP}\) and \(V^{LRo} \le V^{AF}\). Therefore, *LRo* obtains an upper bound that is tighter than both *CDLP* and *AF*. It is also quite surprising that the *LRo* bound is as tight as *PL* since the complexity (the number of constraints in the linear programming formulation) of *LRo* is linear in the resource capacities \((\sum _i r^1_i)\), while that of *PL* is exponential \((\prod _i r^1_i)\). For typical DC-NRM instances, the number of constraints in *LRo* can be orders of magnitude smaller than that of *PL*. Moreover, since *LRo* decomposes the network problem into a number of single resource problems that can be solved in parallel, it may also be more suitable for distributed computing techniques.
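To get a feel for the gap between the two capacity terms, the short sketch below evaluates \(\sum _i r^1_i\) and \(\prod _i r^1_i\) for a hypothetical network. It compares only these capacity-dependent growth terms; the actual constraint counts also depend on the number of time periods and offer sets, which are left out here.

```python
from math import prod
from typing import Sequence, Tuple


def capacity_growth_terms(initial_capacities: Sequence[int]) -> Tuple[int, int]:
    """Return (sum of capacities, product of capacities) for the given resources."""
    return sum(initial_capacities), prod(initial_capacities)


# Hypothetical network with 10 resources of capacity 20 each.
linear_term, product_term = capacity_growth_terms([20] * 10)
```

For this instance the sum is 200 while the product is \(20^{10}\), roughly \(10^{13}\).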

In closing this section, we note that while Lagrangian relaxation ideas have been applied to NRM previously, there are some important differences between our approach and previous proposals. Under the independent-class assumption, Topaloglu [24] proposes relaxing the constraints that the same acceptance decisions be made for each product across the resources that it uses. The resulting Lagrangian relaxation associates a multiplier \(\lambda _{i, j, t}\) with each product *j* and each resource \(i \in {\mathcal {I}}_j\). Kunnumkal and Talluri [13] show that the Lagrangian relaxation with product-specific multipliers (*LRp*) turns out to be equivalent to *PL* in the independent-class setting. We give an example in the Appendix which illustrates that the same result fails to hold for DC-NRM. That is, for DC-NRM, *LRp* can be weaker than *PL* (and by Proposition 2, weaker than *LRo* as well). We also note that while *LRp* has a smaller number of multipliers than *LRo*, solving the resource level problem (27) in *LRp* is still intractable for a general discrete-choice model. Consequently, it is not clear whether *LRp* provides any significant computational benefits over *LRo*.

The optimality equation (11) is not redundant for DC-NRM: we show in an example in the Appendix that the optimal multipliers in *LRo* can be such that \(\sum _{i \in {\mathcal {I}}_S} \lambda _{i,S,t} < \sum _{j \in S} P_j(S) f_j\). This is in contrast to the perfect segmentation case where it is known that the optimal Lagrange multipliers can be interpreted as the pro-rated fare of a product on each resource that it uses, and consequently \(\vartheta ^{\lambda }_{\phi , t} = 0\) for all *t*.

## 5 Tightness of the Lagrangian bound

In this section we formally show that the upper bound obtained by *LRo* is as strong as the *PL* bound. We begin with some preliminary results in Sect. 5.1 before proceeding to the proof of Proposition 2 in Sect. 5.2.

### 5.1 Preliminaries

*i*. On the other hand, letting \(R(S) = \sum _{j \in S} P_j(S) f_j\) denote the expected revenue from offering set *S* and \(\lambda _{\phi , S, t} = R(S) - \sum _{i \in {\mathcal {I}}_S} \lambda _{i, S, t}\), we can write optimality equation (11) equivalently as

*LRo* turns out to be useful when we make the connection between the *LRo* and *PL* bounds.

### 5.2 Proof of Proposition 2

*PL* separation problem: For each period *t*, given values of \(\{ v_{i,t}(r_i) \,|\, \forall t, i, r_i \in {\mathcal {R}}_i \}\), to check if

*t* for all \(\varvec{r}\) and \(S \subset {\mathcal {Q}}(\varvec{r})\). Otherwise, we find a violated constraint and add it to the linear program.

We show below that \(\Phi _t(v) = \Pi _t(v)\) and use this result to show that \(V^{PL} \ge V^{LRo}\). We begin with some preliminary results. Lemma 2 shows that \(\Pi _t(v)\) is an upper bound on \(\Phi _t(v)\). This is intuitive if we interpret \(\Pi _t(v)\) as being a relaxation of \(\Phi _t(v)\). We defer the formal proof to the Appendix.

### **Lemma 2**

\(\Phi _t(v) \le \Pi _t(v)\).

It remains to show that \(\Phi _t(v) \ge \Pi _t(v)\). At a high level, this is because the optimal Lagrange multipliers are able to coordinate the offer set decisions across the different resources. As a result, we are able to construct an offer set that is feasible to the maximization problem on the right hand side of (17) from the optimal solution to \(\Pi _t(v)\). The following lemma shows that we can restrict ourselves to sets in \({\mathcal {C}}_i = \{ S \,|\, i \in {\mathcal {I}}_S\}\) when solving the optimization problem for resource *i*.

### **Lemma 3**

### *Proof*

Appendix. \(\square \)

*i*, offer set \(S \in {\mathcal {C}}_i\) and \(r \in \{1, \ldots , r^1_{i} \}\). Let

*i* and offer set *S*. We use the arguments \((\lambda , w)\) to emphasize the dependence of the set of binding constraints on the given solution. Observe that if \(B_{ i, S}(\lambda , w) \) is empty, then \(w_{i,t} > \lambda _{i,S,t} - \sum _{j \in {\mathcal {J}}_i} P_j(S) \psi _{i,t+1}(r) + \Delta _{i,t}(r)\) for all \(r \in \{1, \ldots , r^1_{i} \}\).

From now on, we use the concept of an optimal solution with a minimal set of binding constraints. The linear program \((LP_{\Pi _t(v)})\) has a finite optimal solution, and possibly multiple ones. Any optimal solution has a set of binding constraints among (18)–(22). Given any optimal solution, we can look for another optimal solution whose set of binding constraints is a strict subset of that of the previous optimal solution. If there is no such optimal solution, we say that the solution has a minimal set of binding constraints. Lemmas 4–6 together imply that an optimal solution with a minimal set of binding constraints coordinates the offer set decision across the different resources. Let \(({\hat{\lambda }}, {{\hat{w}}})\) be an optimal solution to \(LP_{\Pi _t(v)}\) with a minimal number of binding constraints.

### **Lemma 4**

*Fix a set S. Either*

(i) \(B_{i,S}({\hat{\lambda }}, {{\hat{w}}})\) *is nonempty for all* \(i \in {\mathcal {I}}_S\) *and* \({{\hat{w}}}_{\phi , t} = {\hat{\lambda }}_{\phi , S, t}\), *or*

(ii) \(B_{i,S}({\hat{\lambda }}, {{\hat{w}}})\) *is empty for all* \(i \in {\mathcal {I}}_S\) *and* \({{\hat{w}}}_{\phi , t} > {\hat{\lambda }}_{\phi , S, t}\).

### *Proof*

Suppose that the statement of the lemma is false. First consider the case where \(B_{i,S}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty but \(B_{l,S}({\hat{\lambda }}, {{\hat{w}}})\) is empty for \(i, l \in {\mathcal {I}}_S\). Let \(\epsilon = \min _{r \in \{1, \ldots , r^1_{l} \}} \{ \xi _{l, S, t}(r) \} > 0\). Let \(({\widetilde{\lambda }}, {{\widetilde{w}}})\) be given by \({\widetilde{\lambda }}= {\hat{\lambda }}- \delta \varvec{e}^{i, S ,t} + \delta \varvec{e}^{l, S, t}\) and \({{\widetilde{w}}}= {{\hat{w}}}\) for some \(\delta \in (0, \epsilon )\), where \(\varvec{e}^{i,j,k}\) is a vector with a 1 in component (*i*, *j*, *k*) and zeroes everywhere else. Note that \(({\widetilde{\lambda }}, {{\widetilde{w}}})\) is identical to \(({\hat{\lambda }}, {{\hat{w}}})\) except that \({\widetilde{\lambda }}_{i,S,t} = {\hat{\lambda }}_{i,S,t} - \delta \) and \({\widetilde{\lambda }}_{l,S,t} = {\hat{\lambda }}_{l,S,t} + \delta \). We show that \(({\widetilde{\lambda }}, {{\widetilde{w}}})\) is an optimal solution with strictly fewer binding constraints, which gives us a contradiction.

*i* and *l* and offer set *S* as all other resources and offer sets continue to have the same \(\lambda \)’s and *w*’s as before. For resource *i*, since \(B_{i,S}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty, there exists \(r \in \{1, \ldots , r^1_{i} \}\) such that \({{\hat{w}}}_{i,t} = {\hat{\lambda }}_{i, S, t} - \sum _{j \in {\mathcal {J}}_i} P_j(S) \psi _{i,t+1}(r) + \Delta _{i,t}(r)\). We have

*i* and offer set *S* decreases by at least one. For resource *l* and offer set *S*, \({{\widetilde{w}}}_{l,t} - [{\widetilde{\lambda }}_{l,S,t} - \sum _{j \in {\mathcal {J}}_l} P_j(S) \psi _{l,t+1}(r) + \Delta _{l,t}(r)] = {{\hat{w}}}_{l,t} - [ {\hat{\lambda }}_{l,S,t} - \sum _{j \in {\mathcal {J}}_l} P_j(S) \psi _{l,t+1}(r) + \Delta _{l,t}(r)] - \delta > 0\), for all \(r \in \{1, \ldots , r^1_{l} \}\), where the inequality follows from the definition of \(\delta \). Therefore, all constraints (19) continue to be nonbinding for resource *l* and offer set *S*. Overall, \(({\widetilde{\lambda }}, {{\widetilde{w}}})\) has strictly fewer binding constraints than \(({\hat{\lambda }}, {{\hat{w}}})\), which gives a contradiction.

The above arguments imply that either \(B_{i,S}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty for all \(i \in {\mathcal {I}}_S\) or \(B_{i,S}({\hat{\lambda }}, {{\hat{w}}})\) is empty for all \(i \in {\mathcal {I}}_S\). Suppose that \(B_{i,S}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty for all \(i \in {\mathcal {I}}_S\) but \({{\hat{w}}}_{\phi , t} > {\hat{\lambda }}_{\phi , S, t}\). In this case, pick a resource \(i \in {\mathcal {I}}_S\) and let \({\widetilde{\lambda }}= {\hat{\lambda }}- \delta \varvec{e}^{i ,S, t} + \delta \varvec{e}^{\phi ,S ,t}\) and \({{\widetilde{w}}}= {{\hat{w}}}\), where \(\delta \in (0, \epsilon )\) and \(\epsilon = {{\hat{w}}}_{\phi , t} - {\hat{\lambda }}_{\phi , S, t}\). It can be verified that the number of binding constraints (19) for resource *i* and offer set *S* strictly decreases from \(({\hat{\lambda }}, {{\hat{w}}})\) to \(({\widetilde{\lambda }}, {{\widetilde{w}}})\), while the number of binding constraints (20) remains unchanged, leading to a contradiction. This proves part (i) of the lemma.

On the other hand, suppose that \(B_{i,S}({\hat{\lambda }}, {{\hat{w}}})\) is empty for all \(i \in {\mathcal {I}}_S\) but \({{\hat{w}}}_{\phi , t} = {\hat{\lambda }}_{\phi , S, t}\). Pick a resource \(i \in {\mathcal {I}}_S\). Since \(B_{i,S}({\hat{\lambda }}, {{\hat{w}}})\) is empty, we have \(\epsilon = \min _{r \in \{1, \ldots , r^1_{i} \}} \{ \xi _{i, S, t}(r) \} > 0\). Let \({\widetilde{\lambda }}= {\hat{\lambda }}+ \delta \varvec{e}^{i, S ,t} - \delta \varvec{e}^{\phi , S, t}\), where \(\delta \in (0, \epsilon )\), and \({{\widetilde{w}}}= {{\hat{w}}}\). It can be verified that the number of binding constraints (19) for resource *i* and offer set *S* is exactly the same for \(({\hat{\lambda }}, {{\hat{w}}})\) and \(({\widetilde{\lambda }}, {{\widetilde{w}}})\), while the number of binding constraints (20) decreases by one (\({{\widetilde{w}}}_{\phi ,t} = {{\hat{w}}}_{\phi , t} = {\hat{\lambda }}_{\phi , S, t} > {\widetilde{\lambda }}_{\phi ,S ,t}\)). As a result, \(({\widetilde{\lambda }}, {{\widetilde{w}}})\) has strictly fewer binding constraints than \(({\hat{\lambda }}, {{\hat{w}}})\), which gives a contradiction. This proves part (ii) of the lemma. \(\square \)

*i*. \(\hat{{\mathcal {C}}}_\phi \) has a similar interpretation, but for constraint (20).

### **Lemma 5**

*Let* \(({\hat{\lambda }}, {{\hat{w}}})\) *be an optimal solution to* \(LP_{\Pi _t(v)}\) *with a minimal number of binding constraints.*

(i) \(\hat{{\mathcal {C}}}_i\) is nonempty for all \(i \in \hat{{\mathcal {I}}}^+\).

(ii) If \(S \in \hat{{\mathcal {C}}}_i\), then \(S \in \hat{{\mathcal {C}}}_\phi \) (note that by definition the empty set does not consume any resources and so \(\emptyset \notin \hat{{\mathcal {C}}}_i\)).

### *Proof*

For part (i), for \(i \in \hat{{\mathcal {I}}}^+\), \({{\hat{w}}}_{i,t} > \Delta _{i,t}(0)\). Since \(({\hat{\lambda }}, {{\hat{w}}})\) is optimal, there exist \(S \in {\mathcal {C}}_i\) and \(r \in \left\{ 1, \ldots , r^1_{i} \right\} \) such that \({{\hat{w}}}_{i,t} = {\hat{\lambda }}_{i, S, t} - \sum _{j \in {\mathcal {J}}_i} P_j(S) \psi _{i, t+1}(r) + \Delta _{i,t}(r) > \Delta _{i,t}(0)\) (otherwise, we can reduce \({{\hat{w}}}_{i,t}\), contradicting optimality). Therefore \(B_{i,S}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty and so \(S \in \hat{{\mathcal {C}}}_i\) and \(\hat{{\mathcal {C}}}_i\) is nonempty.

For part (ii), \(S \in \hat{{\mathcal {C}}}_i\) implies that \(i \in {\mathcal {I}}_S\). So we have a set *S* with \(B_{i,S}({\hat{\lambda }}, {{\hat{w}}})\) nonempty. By Lemma 4, \({{\hat{w}}}_{\phi , t} = {\hat{\lambda }}_{\phi , S, t}\) and so \(S \in \hat{{\mathcal {C}}}_\phi \). \(\square \)

Lemma 6 shows that we can find an offer set for which constraint (19) is binding for all resources *i*. This turns out to be crucial for showing that \(\Phi _t(v) \ge \Pi _t(v)\).

### **Lemma 6**

*Let* \(({\hat{\lambda }}, {{\hat{w}}})\) *be an optimal solution to* \(LP_{\Pi _t(v)}\) *with a minimal number of binding constraints. If* \(\hat{{\mathcal {I}}}^+\) *is nonempty, then* \(\cap _{i \in \hat{{\mathcal {I}}}^+} \hat{{\mathcal {C}}}_i\) *is nonempty.*

### *Proof*

If \(|\hat{{\mathcal {I}}}^+| =1\), then the statement holds trivially by part (i) of Lemma 5. Consider the case \(|\hat{{\mathcal {I}}}^+| >1\). If \(\cap _{i \in \hat{{\mathcal {I}}}^+} \hat{{\mathcal {C}}}_i\) is empty, then this implies the following: Fix a resource \(i \in \hat{{\mathcal {I}}}^+\). Part (i) of Lemma 5 implies that \(\hat{{\mathcal {C}}}_i\) is nonempty. Then for every \(S \in \hat{{\mathcal {C}}}_i\) there exists \(l \in \hat{{\mathcal {I}}}^+\) such that \(S \notin \hat{{\mathcal {C}}}_l\). Note that since \(l \in \hat{{\mathcal {I}}}^+\), \({{\hat{w}}}_{l,t} > \Delta _{l,t}(0)\).

So let \(i \in \hat{{\mathcal {I}}}^+\), \({{\hat{S}}}\in \hat{{\mathcal {C}}}_i\) and \(l \in \hat{{\mathcal {I}}}^+\) with \({{\hat{S}}}\notin \hat{{\mathcal {C}}}_l\). If \({{\hat{S}}}\notin \hat{{\mathcal {C}}}_l\), there are two possibilities. First, \({{\hat{S}}}\in {\mathcal {C}}_l\) but \(B_{l, {{\hat{S}}}}({\hat{\lambda }}, {{\hat{w}}})\) is empty. But since \({{\hat{S}}}\in \hat{{\mathcal {C}}}_i\), \(B_{i, {{\hat{S}}}}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty, which contradicts part (i) of Lemma 4.

We check that \(({\widetilde{\lambda }}, {{\widetilde{w}}})\) is feasible and look at the set of binding constraints associated with this solution. We look at the constraints in \(LP_{\Pi _t(v)}\) one by one and compare the number of binding constraints in \(({\hat{\lambda }}, {{\hat{w}}})\) with the number in \(({\widetilde{\lambda }}, {{\widetilde{w}}})\).

*Constraints* (21) *and* (22)*:* Since \({{\widetilde{w}}}_{\phi ,t} > {{\hat{w}}}_{\phi ,t}\), constraint (21) continues to hold for \(({\widetilde{\lambda }}, {{\widetilde{w}}})\) and the number of binding constraints does not increase. By construction \(({\widetilde{\lambda }}, {{\widetilde{w}}})\) satisfies constraint (22).

*Constraints* (18)*:* Note that we need to check constraints (18) and (19) only for resource *l*. For resource *l*, we have \({{\widetilde{w}}}_{l,t} = {{\hat{w}}}_{l,t} - \delta > \Delta _{l,t}(0)\) and so constraint (18) continues to be nonbinding.

*Constraints* (19)*:* For \(S \in {\mathcal {C}}_l \backslash \hat{{\mathcal {C}}}_l\) and \(r \in \{1, \ldots , r^1_{l} \}\), we have \({{\widetilde{w}}}_{l,t} = {{\hat{w}}}_{l,t} - \delta > {{\hat{w}}}_{l,t} - \xi _{l ,S, t}(r) = {\hat{\lambda }}_{l ,S, t} - \sum _{j \in {\mathcal {J}}_l} P_j(S) \psi _{l,t+1}(r) + \Delta _{l,t}(r) = {\widetilde{\lambda }}_{l, S, t} - \sum _{j \in {\mathcal {J}}_l} P_j(S) \psi _{l,t+1}(r) + \Delta _{l,t}(r)\). Note that the last equality holds by definition of \({\widetilde{\lambda }}\). So constraint (19) remains nonbinding.

For \(S \in \hat{{\mathcal {C}}}_l\) and \(r \in \{1, \ldots , r_{l}^1\} \backslash B_{l,S}({\hat{\lambda }}, {{\hat{w}}})\), \({{\widetilde{w}}}_{l,t} = {{\hat{w}}}_{l,t} - \delta > {\hat{\lambda }}_{l, S, t} - \sum _{j \in {\mathcal {J}}_l} P_j(S) \psi _{l,t+1}(r) + \Delta _{l,t}(r) - \delta = {\widetilde{\lambda }}_{l, S, t} - \sum _{j \in {\mathcal {J}}_l} P_j(S) \psi _{l,t+1}(r) + \Delta _{l,t}(r)\). Therefore, constraint (19) continues to be nonbinding. For \(S \in \hat{{\mathcal {C}}}_l\) and \(r \in B_{l,S}({\hat{\lambda }}, {{\hat{w}}})\), \({{\widetilde{w}}}_{l,t} = {{\hat{w}}}_{l,t} - \delta = {\hat{\lambda }}_{l,S,t} - \delta - \sum _{j \in {\mathcal {J}}_l} P_j(S) \psi _{l,t+1}(r) + \Delta _{l,t}(r) = {\widetilde{\lambda }}_{l,S,t} - \sum _{j \in {\mathcal {J}}_l} P_j(S) \psi _{l,t+1}(r) + \Delta _{l,t}(r)\). So constraints (19) are binding for all such *S* and *r*. Note that these constraints, by definition, were also binding in \(({\hat{\lambda }}, {{\hat{w}}})\). So, \(({\widetilde{\lambda }}, {{\widetilde{w}}})\) satisfies constraints (18) and (19) for resource *l* and the number of binding constraints is exactly the same as in \(({\hat{\lambda }}, {{\hat{w}}})\).

*Constraint* (20)*:* For \(S \in \hat{{\mathcal {C}}}_l\), by definition \(B_{l,S}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty. Part (i) of Lemma 4 implies that \({{\hat{w}}}_{\phi , t} = {\hat{\lambda }}_{\phi , S ,t}\), which means that constraint (20) is binding. We have \({{\widetilde{w}}}_{\phi , t} = {{\hat{w}}}_{\phi ,t} + \delta = {\hat{\lambda }}_{\phi , S, t} + \delta = {\widetilde{\lambda }}_{\phi ,S ,t}\) and so constraint (20) holds and continues to be binding. For \(S \notin \hat{{\mathcal {C}}}_l\), \({\widetilde{\lambda }}_{\phi ,S, t} = {\hat{\lambda }}_{\phi ,S, t}\). Therefore, \({{\widetilde{w}}}_{\phi , t} = {{\hat{w}}}_{\phi ,t} + \delta \ge {\widetilde{\lambda }}_{\phi ,S, t}\). So constraint (20) holds and the number of binding constraints does not increase.

Now we argue that the number of binding constraints (20) strictly decreases from \(({\hat{\lambda }}, {{\hat{w}}})\) to \(({\widetilde{\lambda }}, {{\widetilde{w}}})\). For the set \({{\hat{S}}}\), since \({{\hat{S}}}\in \hat{{\mathcal {C}}}_i\), \(B_{i ,{{\hat{S}}}}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty. By part (i) of Lemma 4, \({{\hat{w}}}_{\phi , t} = {\hat{\lambda }}_{\phi ,{{\hat{S}}}, t}\) and so the constraint is binding in \(({\hat{\lambda }}, {{\hat{w}}})\). But \({{\widetilde{w}}}_{\phi ,t} = {{\hat{w}}}_{\phi ,t} + \delta > {\hat{\lambda }}_{\phi ,{{\hat{S}}}, t} = {\widetilde{\lambda }}_{\phi ,{{\hat{S}}}, t}\) and the constraint is nonbinding in \(({\widetilde{\lambda }}, {{\widetilde{w}}})\). Overall, \(({\widetilde{\lambda }}, {{\widetilde{w}}})\) has strictly fewer binding constraints (20), and they are a subset of the set of binding constraints of \(({\hat{\lambda }}, {{\hat{w}}})\), contradicting minimality.

Since \( {{\hat{w}}}_{\phi , t} + \sum _i {{\hat{w}}}_{i,t} = {{\widetilde{w}}}_{\phi , t} + \sum _i {{\widetilde{w}}}_{i,t}\), \(({\widetilde{\lambda }}, {{\widetilde{w}}})\) is optimal and this gives a contradiction. \(\square \)

We are now ready to show that \(\Phi _t(v) \ge \Pi _t(v)\).

### **Proposition 3**

\(\Phi _t(v) \ge \Pi _t(v)\).

### *Proof*

Let \(({\hat{\lambda }}, {{\hat{w}}})\) be an optimal solution to \(LP_{\Pi _t(v)}\) with a minimal number of binding constraints. We consider two cases.

Case 1: Suppose that \(\hat{{\mathcal {C}}}_i\) is empty for all *i*. This means that for all \(S \in {\mathcal {C}}_i\), \(B_{i,S}({\hat{\lambda }}, {{\hat{w}}})\) is empty. It follows that \({{\hat{w}}}_{i,t} = \Delta _{i,t}(0)\) for all *i* (otherwise we can reduce \({{\hat{w}}}_{i,t}\) contradicting optimality). Part (ii) of Lemma 4 implies that \({{\hat{w}}}_{\phi , t} > {\hat{\lambda }}_{\phi ,S ,t}\) for all *S*. It follows that \({{\hat{w}}}_{\phi , t} = 0\). Therefore, \(\Pi _t(v) = \sum _i \Delta _{i,t}(0)\). Note that \(\varvec{r} = \varvec{0}\) and \(S = \emptyset \) is feasible for \(\Phi _t(v)\) and the objective function value associated with this solution is \(\sum _i \Delta _{i,t}(0)\). Therefore \(\Phi _t(v) \ge \sum _i \Delta _{i,t}(0) = \Pi _t(v)\).

Case 2: Suppose that \(\hat{{\mathcal {C}}}_i\) is nonempty for some *i*. We consider two subcases.

*l* such that \(\hat{{\mathcal {C}}}_l\) is nonempty and choose \({{\hat{S}}}\in \hat{{\mathcal {C}}}_l\) such that \(B_{l, {{\hat{S}}}}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty. By part (i) of Lemma 4, \(B_{i, {{\hat{S}}}}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty for all \(i \in {\mathcal {I}}_{{{\hat{S}}}}\) and \({{\hat{w}}}_{\phi , t} = {\hat{\lambda }}_{\phi , {{\hat{S}}}, t}\). So, for all \(i \in {\mathcal {I}}_{{{\hat{S}}}}\), we have \({{\hat{w}}}_{i,t} = {\hat{\lambda }}_{i ,{{\hat{S}}}, t} - \sum _{j \in {\mathcal {J}}_i} P_j({{\hat{S}}}) \psi _{i,t+1}({{\hat{r}}}_i) + \Delta _{i,t}({{\hat{r}}}_i)\), where \({{\hat{r}}}_i \in B_{i, {{\hat{S}}}}({\hat{\lambda }}, {{\hat{w}}})\). Note that \({{\hat{r}}}_i \ge 1\) for all \(i \in {\mathcal {I}}_{{{\hat{S}}}}\). On the other hand, since \(\hat{{\mathcal {I}}}^+\) is empty, we have \({{\hat{w}}}_{i,t} = \Delta _{i,t}(0)\) for all *i*. In particular, \({{\hat{w}}}_{i,t} = \Delta _{i,t}(0)\) for all \(i \notin {\mathcal {I}}_S\). Putting everything together,

*i* and so \({{\hat{S}}}\subset {\mathcal {Q}}(\varvec{r})\).

*l*, \(B_{l, {{\hat{S}}}}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty. Part (i) of Lemma 4 implies that \(B_{i, {{\hat{S}}}}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty for all \(i \in {\mathcal {I}}_{{{\hat{S}}}}\) and that \({{\hat{w}}}_{\phi ,t} = {\hat{\lambda }}_{\phi ,{{\hat{S}}}, t}\). Since \(B_{i, {{\hat{S}}}}({\hat{\lambda }}, {{\hat{w}}})\) is nonempty for \(i \in {\mathcal {I}}_{{{\hat{S}}}}\), there exists \({{\hat{r}}}_i \in \left\{ 1, \ldots , r^1_{i}\right\} \) such that \({{\hat{w}}}_{i,t} = {\hat{\lambda }}_{i ,{{\hat{S}}}, t} - \sum _{j \in {\mathcal {J}}_i} P_j({{\hat{S}}}) \psi _{i,t+1}({{\hat{r}}}_i) + \Delta _{i,t}({{\hat{r}}}_i)\) (\({{\hat{r}}}_i\) need not be unique; we can pick any \(r \in B_{i, {{\hat{S}}}}({\hat{\lambda }}, {{\hat{w}}})\)). We have

*PL* as \( V^{PL} = \min _{v} \sum _i v_{i,1}(r^1_{i})\) subject to \(0 \ge \Phi _t(v)\) for all *t* with the boundary condition that \(v_{i,\tau +1}(\cdot ) = 0\). Letting \( {\underline{V}} = \min _{v} \sum _i v_{i,1}(r^1_{i}) + \sum _t \Phi _t(v)\) subject to \(0 \ge \Phi _t(v)\) for all *t* with the same boundary condition, it follows that \(V^{PL} \ge {\underline{V}}\). Using the fact that \(\Phi _t(v) = \Pi _t(v)\) together with the linear programming formulation \(LP_{\Pi _t(v)}\) of \(\Pi _t(v)\), we can write \({\underline{V}}\) as

*LRo*. Therefore, \(V^{PL} \ge V^{LRo}\). \(\square \)

## 6 Segment-based Lagrangian relaxation

Although the number of constraints in *LRo* scales linearly with the resource capacities, it is still exponential in the number of products. In this section, we present a tractable approximation that applies to settings where the total demand is comprised of multiple customer segments with small consideration sets. We first describe the demand model and then present the tractable variant of *LRo*, which we refer to as segment-based Lagrangian relaxation (*sLRo*).

### 6.1 Demand model with multiple customer segments

We consider the case where the total demand is comprised of demand from multiple customer segments. Modeling demand in this manner can be a way to capture heterogeneous customer preferences. Moreover, each customer segment is interested only in a small subset of the products (its consideration set). Small consideration sets are a reasonable modeling tradeoff in many situations and also find support in the empirical and behavioral literature. We let \({\mathcal {G}}\) denote the set of customer segments. Customer segment \(g \in {\mathcal {G}}\) has a *consideration set* \({\mathcal {J}}^g \subset {\mathcal {J}}\) of products that it considers for purchase. A segment *g* customer is indifferent to a product outside its consideration set, in the sense that the customer’s choice probabilities are not affected by products offered outside its consideration set. We assume that the consideration sets of the different customer segments are known to the firm by a previous process of estimation and analysis. We also assume that the consideration sets of the different segments are small enough for their power sets to be enumerable.

In each period, we have exactly one customer arrival and an arriving customer belongs to segment *g* with probability \(\alpha ^g\). Since the total arrival rate is 1, we have \(\sum _{g} \alpha ^g = 1\). We let \(P^g_j(S)\) denote the probability that a segment *g* arrival purchases product *j* when *S* is the offer set. Since a segment *g* customer is indifferent to products outside its consideration set, we have \(P_j^g(S) = P_j^g(S \cap {\mathcal {J}}^g) = P_j^g(S^g)\), where \(S^g = S \cap {\mathcal {J}}^g\).
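The arrival model above implies that the aggregate purchase probability mixes the segment-level probabilities, \(P_j(S) = \sum _g \alpha ^g P^g_j(S^g)\). A minimal Python sketch of that mixture follows; the function name, the tuple layout of `segments`, and the toy `uniform_choice` rule are illustrative assumptions rather than anything specified in the paper.

```python
def mixture_choice_prob(j, offer_set, segments):
    """Aggregate probability that an arrival buys product j when offer_set is offered.

    segments: list of (alpha_g, consideration_set_g, choice_fn_g), where
    choice_fn_g(j, S_g) returns the segment-level probability P_j^g(S_g).
    """
    offer = frozenset(offer_set)
    total = 0.0
    for alpha, consideration, choice_fn in segments:
        s_g = offer & frozenset(consideration)  # S^g = S intersected with J^g
        if j in s_g:
            total += alpha * choice_fn(j, s_g)
    return total


def uniform_choice(j, s_g):
    # Hypothetical toy rule: every offered product and the no-purchase
    # option are equally likely. Any discrete-choice model could be used.
    return 1.0 / (len(s_g) + 1)


segments = [
    (0.6, {1, 2}, uniform_choice),  # segment A considers products 1 and 2
    (0.4, {2, 3}, uniform_choice),  # segment B considers products 2 and 3
]

# Product 3 lies outside segment A's consideration set, so adding it to the
# offer set leaves the probability of selling product 1 unchanged.
p_small = mixture_choice_prob(1, {1}, segments)
p_large = mixture_choice_prob(1, {1, 3}, segments)
```

The two calls at the end illustrate the indifference property: only the intersection \(S^g\) enters each segment's choice probabilities.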

### 6.2 Tractable Lagrangian relaxation

*g* on resource *i* at time period *t*. As in *LRo*, we relax the condition that the frequencies be identical across the resources by associating Lagrange multipliers \(\lambda ^g_{i, S^g, t}\) with them and solve the optimality equation

*i*, with the boundary condition that \(\vartheta ^\lambda _{i,\tau +1}(\cdot ) = 0\), where \({\mathcal {G}}_i = \{ g | \exists j \in {\mathcal {J}}^g \hbox { with } j \in {\mathcal {J}}_i\}\) can be interpreted as the set of customer segments that use resource *i*. We modify optimality equation (11) in a similar manner for the case with multiple segments:

*sLRo*). Many of the results from Sect. 4 carry over to *sLRo*. In particular, \(V^\lambda _t(\varvec{r}) = \sum _i \vartheta ^\lambda _{i,t}(r_i) + \vartheta ^\lambda _{\phi , t}\) gives us an upper bound on the value function. We find the tightest upper bound by solving

*sLRo* can be weaker than *LRo* and we can have \(V^{LRo} < V^{sLRo}\). Still, an advantage of *sLRo* is its computational tractability. Note that solving *LRo* requires \({\mathcal {O}}( 2^{|{\mathcal {J}}|} )\) Lagrange multipliers, which quickly becomes intractable. On the other hand, *sLRo* requires \({\mathcal {O}}( \sum _g 2^{ | {\mathcal {J}}^g | })\) Lagrange multipliers, a much more manageable number provided the consideration set for each segment is small in size. Moreover, when the consideration sets of the segments overlap, it is possible to further tighten *sLRo* by adding the product-cut equalities (PC-equalities) described in Meissner et al. [17] and Talluri [21]; we omit the details.
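To give a sense of scale for the two orders of growth, the following Python sketch counts offer-set indices under each relaxation. It is only an illustration: it ignores the per-resource and per-period factors, assumes disjoint consideration sets, and uses a hypothetical instance with 110 segments of two products each (the shape that the hub-and-spoke experiments with ten spokes would have if every origin-destination pair carries one high and one low fare-product).

```python
def offer_set_counts(consideration_sizes):
    """Return (2^{|J|}, sum_g 2^{|J^g|}) for disjoint consideration sets."""
    total_products = sum(consideration_sizes)  # |J| when the sets are disjoint
    lro_count = 2 ** total_products            # offer sets indexed by LRo
    slro_count = sum(2 ** n for n in consideration_sizes)  # indexed by sLRo
    return lro_count, slro_count


n_spokes = 10                                   # assumed instance size
sizes = [2] * (n_spokes * (n_spokes + 1))       # 110 segments, 2 products each
lro_count, slro_count = offer_set_counts(sizes)

print(f"LRo offer sets:  2^{sum(sizes)} (a {len(str(lro_count))}-digit number)")
print(f"sLRo offer sets: {slro_count}")
```

Python integers have arbitrary precision, so the exponential count can be formed exactly even though it could never be enumerated.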

## 7 Computational experiments

In this section, we compare the upper bound and revenue performance of *sLRo* with benchmark solution methods. We consider two choice models in our numerical experiments. The first one is the MNL choice model and the second is the exponomial choice model. We begin by describing the two choice models and then describe the different benchmark solution methods and the experimental setup.

### 7.1 MNL choice model with multiple customer segments

*j* that is in the consideration set of segment *g*. Similarly it associates a preference weight \(\omega ^g_0\) with a segment *g* arrival not purchasing anything. The probability that a segment *g* arrival purchases product *j* when *S* is the offer set is (Ben-Akiva and Lerman [3])
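In its standard form, the MNL purchase probability for an offered product is its preference weight divided by the no-purchase weight plus the weights of all offered products in the consideration set. The Python sketch below implements that textbook form; writing the product weights as a dictionary `weights` (playing the role of \(\omega ^g_j\), a symbol assumed here by analogy with \(\omega ^g_0\)) and the function name are illustrative choices.

```python
def mnl_prob(j, offer_set, weights, no_purchase_weight):
    """Standard MNL probability that a segment-g arrival buys product j.

    weights: dict mapping each product in the segment's consideration set
             to its preference weight (assumed notation: omega^g_j).
    no_purchase_weight: preference weight of buying nothing (omega^g_0).
    """
    # Only offered products inside the consideration set matter (S^g).
    offered = [k for k in offer_set if k in weights]
    if j not in offered:
        return 0.0
    denominator = no_purchase_weight + sum(weights[k] for k in offered)
    return weights[j] / denominator


# Hypothetical weights of the same magnitude as the means used in Sect. 7.4.
weights = {"high": 200.0, "low": 80.0}
p_both = mnl_prob("high", {"high", "low"}, weights, no_purchase_weight=10.0)
p_alone = mnl_prob("high", {"high"}, weights, no_purchase_weight=10.0)
```

Removing the low fare-product from the offer set raises the probability of selling the high fare-product, which is the substitution effect that the offer-set decision exploits.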

### 7.2 Exponomial choice model with multiple customer segments

*g* that are part of the offer set, then the probability that a segment *g* arrival purchases product *j* when *S* is the offer set is (Alptekinoğlu and Semple [2])
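As a hedged illustration, the Python sketch below evaluates exponomial choice probabilities using the closed form from Alptekinoğlu and Semple [2] as we recall it, with the rate parameter of the exponential errors normalized to 1 (an assumption). The alternatives, including the no-purchase option, are sorted by ideal utility in ascending order, and \(G_i = \sum _{k \ge i} (u_k - u_i)\). The function and variable names are illustrative.

```python
import math


def exponomial_probs(utilities):
    """Choice probabilities under the exponomial model (unit rate assumed).

    utilities: dict mapping each available alternative (offered products
               plus the no-purchase option) to its ideal utility.
    """
    items = sorted(utilities.items(), key=lambda kv: kv[1])  # ascending utility
    n = len(items)
    u = [value for _, value in items]
    # G[i] = sum over alternatives ranked at or above i of (u_k - u_i)
    G = [sum(u[k] - u[i] for k in range(i, n)) for i in range(n)]
    probs = {}
    for i, (name, _) in enumerate(items):  # i is the 0-based rank
        p = math.exp(-G[i]) / (n - i)
        # Correction terms contributed by the lower-ranked alternatives.
        p -= sum(math.exp(-G[k]) / ((n - k - 1) * (n - k)) for k in range(i))
        probs[name] = p
    return probs


# Hypothetical two-alternative example: one product and the no-purchase option.
probs = exponomial_probs({"product": 1.0, "no_purchase": 0.0})
```

With two alternatives the lower-utility one is chosen with probability \(e^{-(u_2 - u_1)}/2\), which can be checked directly from the difference of two independent exponential errors.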

### 7.3 Benchmark methods

*Choice Deterministic Linear Program (CDLP)* This is the solution method described in Sect. 3.1. If customer choice is according to the MNL model and the consideration sets are disjoint, then *CDLP* has an equivalent, compact sales-based formulation that is described in Gallego et al. [8]. Therefore, we solve *CDLP* as a compact linear program. On the other hand, if customer choice is according to the exponomial model, to our knowledge there is no compact formulation of *CDLP*. Since *CDLP* has a large number of variables, we use column generation to solve *CDLP*. To our knowledge, there is no efficient method to solve the column generation sub-problem under the exponomial choice model and so we enumerate over the offer sets to solve the column generation sub-problem. The disjoint nature of the consideration sets makes the *CDLP* column generation sub-problem separable by the customer segments and this makes the enumeration procedure manageable. We stop the column generation procedure when we are within 1% of optimality. We obtain a bound on the optimality gap by using the optimal dual variables from the restricted master problem (see, for example, Proposition 3 in Adelman [1]).

*Affine Approximation (AF)* This is the solution method described in Sect. 3.2. We use the reduced formulation described in Vossen and Zhang [25] to solve *AF*. While the number of variables in the reduced formulation is manageable, it still has a large number of constraints. We solve *AF* by generating constraints on the fly and stop when we are within 1% of optimality (Adelman [1]). For the MNL choice model, the separation problem of *AF* can be solved as a mixed-integer linear program (Zhang and Adelman [26]). However, we are not aware of a similar formulation for the exponomial choice model. Solving the separation problem by brute force involves enumerating over all possible offer sets *and* resource levels and becomes intractable.

*Segment-based Lagrangian Relaxation (sLRo)* This is the solution method described in Sect. 6. In our computational experiments, we use subgradient search to solve problem (23). We use a step size of \(250/\sqrt{k}\) at iteration *k* of the subgradient algorithm and run the algorithm for 200 iterations. Although our step-size selection does not guarantee convergence (Bertsekas [4], Chapter 7), it provided good solutions and stable performance in our test problems.
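The step-size rule above can be sketched as a generic subgradient loop in Python. This is not the authors' C++ implementation: the objective here is a hypothetical piecewise-linear toy function standing in for problem (23), and keeping track of the best iterate is an assumption we add because a subgradient step need not decrease the objective.

```python
import math


def subgradient_min(f, subgrad, x0, iters=200, scale=250.0):
    """Minimize a convex function with step size scale / sqrt(k) at iteration k."""
    x = list(x0)
    best_x, best_val = list(x), f(x)
    for k in range(1, iters + 1):
        g = subgrad(x)
        step = scale / math.sqrt(k)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        val = f(x)
        if val < best_val:  # keep the best point seen so far (our assumption)
            best_x, best_val = list(x), val
    return best_x, best_val


# Hypothetical toy objective: f(x) = sum_i |x_i - target_i|.
targets = [1000.0, -500.0]


def f(x):
    return sum(abs(xi - ti) for xi, ti in zip(x, targets))


def subgrad(x):
    # A valid subgradient of |x_i - t_i| is the sign of (x_i - t_i), 0 at the kink.
    return [(xi > ti) - (xi < ti) for xi, ti in zip(x, targets)]


best_x, best_val = subgradient_min(f, subgrad, x0=[0.0, 0.0])
```

Because the steps shrink only like \(1/\sqrt{k}\) and the horizon is fixed at 200 iterations, the final iterate can still oscillate around the minimizer, which is one reason to report the best value found.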

### 7.4 Hub-and-spoke network

We have a network with a single hub serving *N* spokes. There is one flight from each spoke to the hub and one flight from the hub to each spoke, so that there are 2*N* flights in total. Figure 1 shows the structure of the network with \(N=6\). Note that the flight legs correspond to the resources in our DC-NRM formulation.

The total number of fare-products is \(2N(N+1)\). There are 4*N* fare-products connecting hub-to-spoke and spoke-to-hub origin-destination pairs. Of these, half are high fare-products whose revenues are drawn from the Poisson distribution with a mean of 30, while the remaining are low fare-products whose revenues are drawn from the Poisson distribution with a mean of 10. There are \(2N(N-1)\) fare products connecting spoke-to-spoke origin-destination pairs. Half of them are high fare-products whose revenues are drawn from the Poisson distribution with a mean of 300, while the remaining are low fare-products whose revenues are drawn from the Poisson distribution with a mean of 100.
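The product counts can be cross-checked with a few lines of Python. The sketch assumes, as the "half high, half low" description suggests, that every origin-destination pair carries exactly one high and one low fare-product; the function name and dictionary keys are illustrative.

```python
def hub_and_spoke_counts(n_spokes):
    """Count flights, origin-destination pairs and fare-products for N spokes."""
    flights = 2 * n_spokes                    # spoke->hub and hub->spoke legs
    hub_pairs = 2 * n_spokes                  # pairs with the hub as one endpoint
    spoke_pairs = n_spokes * (n_spokes - 1)   # ordered spoke-to-spoke pairs
    hub_products = 2 * hub_pairs              # one high + one low fare (assumed)
    spoke_products = 2 * spoke_pairs
    return {
        "flights": flights,
        "od_pairs": hub_pairs + spoke_pairs,
        "hub_products": hub_products,
        "spoke_products": spoke_products,
        "products": hub_products + spoke_products,
    }


counts = hub_and_spoke_counts(6)  # the network size shown in Figure 1
assert counts["products"] == 2 * 6 * (6 + 1)
```

The identity \(4N + 2N(N-1) = 2N(N+1)\) is what the final assertion verifies for \(N = 6\).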

Each origin-destination pair is associated with a customer segment and a segment is only interested in the fare-products connecting its origin-destination pair. For the MNL choice model, the preference weights of the high fare-products are drawn from the Poisson distribution with a mean of 200, while that of the low fare-products are drawn from the Poisson distribution with a mean of 80. The no-purchase preference weights are drawn from the Poisson distribution with a mean of 10. We remark that we set the problem parameters in the same manner as Meissner and Strauss [16]. For the exponomial choice model, the preference weight (ideal utility) of each product is drawn uniformly from the interval \([-4, 18]\), while the preference weight for the no-purchase option is set to 1. We follow Alptekinoğlu and Semple [2] in setting the parameters of the exponomial choice model.

*g* when there is ample capacity on all the flight legs, the nominal load factor is

We first describe the results for the test problems where choice is according to the MNL model. Table 1 gives the upper bounds obtained by the different solution methods along with their run times. The first column in the table describes the problem characteristics by using \((N, \zeta )\). The second to fourth columns, respectively, give the upper bounds obtained by *CDLP*, *AF* and *sLRo*. The next two columns, respectively, give the percentage gap between the upper bounds obtained by *CDLP* and *AF* with respect to *sLRo*. The last three columns give the CPU seconds required by *CDLP*, *AF* and *sLRo*, respectively. All of our computational experiments are carried out on a Core i7 desktop with 3.4-GHz CPU and 16-GB RAM. We use CPLEX 12.2 to solve the *CDLP* and *AF* linear programs. We implement the subgradient search algorithm to solve *sLRo* in C++. We see that *sLRo* obtains significantly tighter upper bounds than *CDLP* and *AF*. The average gap between the upper bounds obtained by *CDLP* and *sLRo* is 3.88%, while that between *AF* and *sLRo* is 2.64%. In terms of computation time, *CDLP* has a compact linear programming representation for the MNL model and so it solves in a fraction of a second to about 1.5 seconds. The running times of *AF* and *sLRo* are in minutes. However, *sLRo* solves much faster than *AF*, and the differences in run times are more noticeable as the size of the network, as measured by the number of spokes, increases.

*CDLP*, we use \({{\widetilde{V}}}_t(\varvec{r}) = \sum _i {\hat{\mu }}_i r_i\), where \({\hat{\mu }}= \{{\hat{\mu }}_i | \forall i \}\) are the optimal values of the dual variables associated with constraints (3). For *AF*, we use \({{\widetilde{V}}}_t(\varvec{r}) = {\hat{\theta }} _t + \sum _i {\hat{V}}_{i,t} r_i\), where \({\hat{\theta }} = \{ {\hat{\theta }} _t | \forall t\}\), \({\hat{V}}= \{ {\hat{V}}_{i,t} | \forall i, t \}\) is an optimal solution to *AF*. For *sLRo*, we use \({{\widetilde{V}}}_t(\varvec{r}) = \sum _i \vartheta ^{{\hat{\lambda }}}_{i,t}(r_i) + \vartheta ^{{\hat{\lambda }}}_{\phi , t}\), where \({\hat{\lambda }}\) is an optimal solution to problem (23). For each solution method, we use the corresponding value function approximation \({{\widetilde{V}}}_t(\varvec{r})\), and if \(\varvec{r}^t\) is the vector of remaining resource capacities at time period *t*, we use the policy of offering the set that attains the maximum in the optimization problem
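A common way to turn a value function approximation into a control policy is to offer the set that maximizes expected immediate revenue plus the expected change in the approximate value of the remaining capacity. The Python sketch below enumerates offer sets under that standard form; it is an assumption-laden illustration, and the names `fares`, `usage`, `choice_prob` and `value_next` (standing for product revenues, the resources each product consumes, \(P_j(S)\) and \({{\widetilde{V}}}_{t+1}\)) are hypothetical.

```python
from itertools import combinations


def greedy_offer_set(products, fares, usage, capacities, choice_prob, value_next):
    """Pick the offer set maximizing revenue plus change in approximate value."""
    # Only products whose resources all have remaining capacity can be offered.
    feasible = [j for j in products if all(capacities[i] >= 1 for i in usage[j])]
    base = value_next(capacities)
    best_set, best_val = frozenset(), 0.0  # offering nothing yields zero
    for size in range(1, len(feasible) + 1):
        for subset in combinations(feasible, size):
            offer = frozenset(subset)
            val = 0.0
            for j in offer:
                after = dict(capacities)
                for i in usage[j]:
                    after[i] -= 1  # selling j consumes one unit of each leg
                val += choice_prob(j, offer) * (fares[j] + value_next(after) - base)
            if val > best_val:
                best_set, best_val = offer, val
    return best_set, best_val


# Hypothetical single-leg example with an MNL choice rule and a linear
# value approximation (bid price of 12 per remaining seat).
weights = {"high": 200.0, "low": 80.0}


def choice_prob(j, offer):
    return weights[j] / (10.0 + sum(weights[k] for k in offer))


def value_next(capacities):
    return 12.0 * capacities["leg"]


best_set, best_val = greedy_offer_set(
    products=["high", "low"],
    fares={"high": 30.0, "low": 10.0},
    usage={"high": ["leg"], "low": ["leg"]},
    capacities={"leg": 1},
    choice_prob=choice_prob,
    value_next=value_next,
)
```

In this toy instance the low fare falls below the bid price, so the sketch closes it and offers only the high fare-product. Full enumeration is workable only for small consideration sets, in line with the assumption made in Sect. 6.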

*sLRo* generates higher revenues than the corresponding benchmark method at the 95% level and \(\odot \) if the revenue differences are not statistically significant at the 95% level.

*sLRo* on average generates revenues that are about 2% higher than *CDLP* and about 1% higher than *AF*.

*CDLP* and *sLRo* along with their run times. As mentioned, solving the *AF* separation problem becomes intractable for the exponomial choice model and so we do not have *AF* as a benchmark solution method in the table. We see that *sLRo* continues to obtain significantly tighter upper bounds than *CDLP* and the *sLRo* bound is on average about 5% tighter than the *CDLP* bound.

*sLRo* is agnostic to the form of the choice model and its solution time does not change very much compared to the MNL case. However, for the exponomial choice model, *CDLP* has to be solved using column generation and consequently the solution times are larger compared to the MNL case. Still, *CDLP* solves in under a minute even for the exponomial choice model. Table 4 gives the expected revenues obtained by *CDLP* and *sLRo*. The results are similar to the MNL case and we see that *sLRo* obtains significantly higher revenues than *CDLP*, with an average revenue improvement of around 6%.

Table 1: Comparison of the upper bounds on the optimal expected total revenue and the CPU times for the test problems with the MNL choice model

\((N, \zeta )\) | Upper bound: CDLP | Upper bound: AF | Upper bound: sLRo | % gap with sLRo: CDLP | % gap with sLRo: AF | CPU s: CDLP | CPU s: AF | CPU s: sLRo |
---|---|---|---|---|---|---|---|---|
(10, 1.0) | 30,296 | 30,025 | 29,495 | 2.72 | 1.80 | 0.60 | 655.67 | 302.41 |
(10, 1.2) | 29,345 | 29,069 | 28,265 | 3.82 | 2.84 | 0.58 | 954.98 | 259.06 |
(12, 1.0) | 29,863 | 29,543 | 28,988 | 3.02 | 1.91 | 0.97 | 1004.97 | 381.74 |
(12, 1.2) | 29,109 | 28,754 | 27,801 | 4.71 | 3.43 | 0.96 | 1707.46 | 320.48 |
(14, 1.0) | 29,820 | 29,417 | 28,803 | 3.53 | 2.13 | 1.49 | 1800.82 | 467.31 |
(14, 1.2) | 28,946 | 28,468 | 27,442 | 5.48 | 3.74 | 1.49 | 3081.48 | 389.43 |
Avg. | | | | 3.88 | 2.64 | | | |
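The reported gaps are consistent with measuring each benchmark bound relative to the *sLRo* bound, that is, \(100 \times (\text {benchmark} - sLRo)/sLRo\); this reading is inferred from the numbers rather than stated explicitly. A short Python check using the upper bounds of the MNL test problems:

```python
# Upper bounds (CDLP, AF, sLRo) for the MNL test problems, keyed by (N, zeta).
upper_bounds = {
    (10, 1.0): (30296, 30025, 29495),
    (10, 1.2): (29345, 29069, 28265),
    (12, 1.0): (29863, 29543, 28988),
    (12, 1.2): (29109, 28754, 27801),
    (14, 1.0): (29820, 29417, 28803),
    (14, 1.2): (28946, 28468, 27442),
}


def pct_gap(benchmark, slro):
    """Percentage gap of a benchmark bound relative to the sLRo bound."""
    return 100.0 * (benchmark - slro) / slro


cdlp_gaps = [pct_gap(c, s) for c, _, s in upper_bounds.values()]
af_gaps = [pct_gap(a, s) for _, a, s in upper_bounds.values()]
avg_cdlp = sum(cdlp_gaps) / len(cdlp_gaps)
avg_af = sum(af_gaps) / len(af_gaps)

for key, gc, ga in zip(upper_bounds, cdlp_gaps, af_gaps):
    print(key, f"CDLP gap {gc:.2f}%", f"AF gap {ga:.2f}%")
```

Because the bounds are reported as rounded integers, recomputed gaps can differ from the tabulated ones in the second decimal place.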

Table 2: Comparison of the expected revenues for the test problems with the MNL choice model

\((N, \zeta )\) | Expected revenue: CDLP | Expected revenue: AF | Expected revenue: sLRo | % gap with sLRo: CDLP | % gap with sLRo: AF |
---|---|---|---|---|---|
(10, 1.0) | 28,727 | 28,830 | 29,390 | 2.26 \(\checkmark \) | 1.91 \(\checkmark \) |
(10, 1.2) | 27,434 | 27,495 | 27,974 | 1.93 \(\checkmark \) | 1.71 \(\checkmark \) |
(12, 1.0) | 28,196 | 28,405 | 28,803 | 2.11 \(\checkmark \) | 1.38 \(\checkmark \) |
(12, 1.2) | 26,796 | 27,337 | 27,432 | 2.32 \(\checkmark \) | 0.35 \(\odot \) |
(14, 1.0) | 28,161 | 28,527 | 28,551 | 1.37 \(\checkmark \) | 0.09 \(\odot \) |
(14, 1.2) | 26,441 | 26,605 | 26,881 | 1.64 \(\checkmark \) | 1.03 \(\checkmark \) |
Avg. | | | | 1.94 | 1.08 |

Table 3: Comparison of the upper bounds on the optimal expected total revenue and the CPU times for the test problems with the exponomial choice model

\((N, \zeta )\) | Upper bound: CDLP | Upper bound: sLRo | % gap with sLRo: CDLP | CPU s: CDLP | CPU s: sLRo |
---|---|---|---|---|---|
(10, 1.0) | 18,966 | 18,102 | 4.77 | 11.68 | 245.66 |
(10, 1.2) | 18,241 | 17,283 | 5.54 | 15.78 | 210.11 |
(12, 1.0) | 18,024 | 17,246 | 4.51 | 30.04 | 277.27 |
(12, 1.2) | 17,395 | 16,526 | 5.26 | 38.23 | 245.68 |
(14, 1.0) | 20,613 | 20,001 | 3.06 | 35.30 | 330.26 |
(14, 1.2) | 19,930 | 19,001 | 4.89 | 57.93 | 292.71 |
Avg. | | | 4.67 | | |

Table 4: Comparison of the expected revenues for the test problems with the exponomial choice model

\((N, \zeta )\) | Expected revenue: CDLP | Expected revenue: sLRo | % gap with sLRo: CDLP |
---|---|---|---|
(10, 1.0) | 18,932 | 19,756 | 4.17 \(\checkmark \) |
(10, 1.2) | 17,530 | 18,636 | 5.93 \(\checkmark \) |
(12, 1.0) | 16,031 | 17,086 | 6.18 \(\checkmark \) |
(12, 1.2) | 15,079 | 16,332 | 7.67 \(\checkmark \) |
(14, 1.0) | 17,238 | 18,096 | 4.74 \(\checkmark \) |
(14, 1.2) | 15,847 | 17,178 | 7.75 \(\checkmark \) |
Avg. | | | 6.07 |

## 8 Conclusions

In this paper we develop a new Lagrangian relaxation approach for DC-NRM with strong theoretical properties and numerical performance. We show that the Lagrangian relaxation equals the piecewise-linear approximation with the number of constraints scaling linearly with the resource capacities, compared to the exponential number when solving the piecewise-linear approximation directly. We build on these ideas and propose a segment-based relaxation that is tractable when the consideration sets of the different customer segments are small. Our numerical experiments show that the proposed approach can provide significant benefits, both in terms of tighter upper bounds and higher expected revenues. Finally, we note that our results apply at the highest level of generality, for any discrete-choice model of customer demand behavior. An interesting future research direction is to see if more tractable methods can be obtained by specializing to specific choice models such as MNL or nested logit.

## References

- 1. Adelman, D.: Dynamic bid-prices in revenue management. Oper. Res. **55**(4), 647–661 (2007)
- 2. Alptekinoğlu, A., Semple, J.H.: The exponomial choice model: a new alternative for assortment and price optimization. Oper. Res. **64**(1), 79–93 (2016)
- 3. Ben-Akiva, M., Lerman, S.: Discrete-Choice Analysis: Theory and Application to Travel Demand. MIT Press, Cambridge (1985)
- 4. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
- 5. Bront, J.J.M., Méndez-Díaz, I., Vulcano, G.: A column generation algorithm for choice-based network revenue management. Oper. Res. **57**(3), 769–784 (2009)
- 6. Chen, L., Homem de Mello, T.: Mathematical programming models for revenue management under customer choice. Eur. J. Oper. Res. **203**(2), 294–305 (2010)
- 7. Gallego, G., Iyengar, G., Phillips, R., Dubey, A.: Managing flexible products on a network. Tech. Rep. TR-2004-01, Dept of Industrial Engineering, Columbia University, NY (2004)
- 8. Gallego, G., Ratliff, R., Shebalov, S.: A general attraction model and sales-based linear program for network revenue management under customer choice. Oper. Res. **63**(1), 212–232 (2015)
- 9. Hauser, J.R., Wernerfelt, B.: An evaluation cost model of consideration sets. J. Consum. Res. **16**, 393–408 (1990)
- 10. Iyengar, S.S., Lepper, M.: When choice is demotivating: Can one desire too much of a good thing? J. Personal. Soc. Psychol. **76**, 995–1006 (2000)
- 11. Kunnumkal, S., Talluri, K.T.: A new compact linear programming formulation for choice network revenue management. Tech. Rep., Universitat Pompeu Fabra (2012)
- 12. Kunnumkal, S., Talluri, K.: Technical note—a note on relaxations of the choice network revenue management dynamic program. Oper. Res. **64**(1), 158–166 (2016a)
- 13. Kunnumkal, S., Talluri, K.T.: On a piecewise-linear approximation for network revenue management. Math. Oper. Res. **41**(1), 72–91 (2016b)
- 14. Kunnumkal, S., Topaloglu, H.: A new dynamic programming decomposition method for the network revenue management problem with customer choice behavior. Prod. Oper. Manag. **19**(5), 575–590 (2010)
- 15. Liu, Q., van Ryzin, G.J.: On the choice-based linear programming model for network revenue management. Manuf. Serv. Oper. Manag. **10**(2), 288–310 (2008)
- 16. Meissner, J., Strauss, A.K.: Network revenue management with inventory-sensitive bid prices and customer choice. Eur. J. Oper. Res. **216**(2), 459–468 (2012)
- 17. Meissner, J., Strauss, A.K., Talluri, K.T.: An enhanced concave programming method for choice network revenue management. Prod. Oper. Manag. **22**(1), 71–87 (2013)
- 18. Rusmevichientong, P., Shmoys, D., Tong, C., Topaloglu, H.: Assortment optimization under the multinomial logit model with random choice parameters. Prod. Oper. Manag. **23**(11), 2023–2039 (2014)
- 19. Schweitzer, P., Seidmann, A.: Generalized polynomial approximations in markovian decision processes. J. Math. Anal. Appl. **110**, 568–582 (1985)
- 20. Talluri, K.T.: Airline revenue management with passenger routing control: a new model with solution approaches. Int. J. Serv. Technol. Manag. **2**, 102–115 (2001)
- 21. Talluri, K.T.: New formulations for choice network revenue management. INFORMS J. Comput. **26**(2), 401–413 (2014)
- 22. Talluri, K.T., van Ryzin, G.J.: Revenue management under a general discrete choice model of consumer behavior. Manag. Sci. **50**(1), 15–33 (2004a)
- 23. Talluri, K.T., van Ryzin, G.J.: The Theory and Practice of Revenue Management. Kluwer, New York (2004b)
- 24. Topaloglu, H.: Using Lagrangian relaxation to compute capacity-dependent bid prices in network revenue management. Oper. Res. **57**, 637–649 (2009)
- 25. Vossen, T.W.M., Zhang, D.: Reductions of approximate linear program for network revenue management. Oper. Res. **63**(6), 1352–1371 (2015)
- 26. Zhang, D., Adelman, D.: An approximate dynamic programming approach to network revenue management with customer choice. Transp. Sci. **43**(3), 381–394 (2009)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.