1 Introduction and literature review

A feasible and widely used strategy for handling malfunctions in any business process is to wait until the malfunction occurs and then, in reaction to the error, replace the defective part. This strategy comes with considerable losses, since during the repair, functionality is at least reduced and very often suspended.

In aviation, it happens quite often that an aircraft part of varying importance (ranging from passenger seats through less important electronic components to sensors relevant for flight safety) has an unexpected malfunction of a certain criticality. The criticality of parts is defined in so-called minimum equipment lists (MEL), which also specify when a malfunctioning part has to be replaced. Depending on the criticality of the malfunctioning part, the aircraft might be grounded at the airport where it arrives after the malfunction occurs. At certain airports, repairing the aircraft is quite difficult due to a lack of spare parts or qualified personnel. Furthermore, the route assigned to the aircraft might not be easily transferable to another (reserve) aircraft. In this case, the malfunction leads to uncontrolled delay and, in some cases, to flight cancellations. Delays and flight cancellations require complex re-planning with the aim of restoring the feasibility of the flight schedule. As we demonstrate in this paper, the described problems are avoidable to a certain extent.

The underlying methodology is based on recent developments integrating data analysis and optimal decision making, which make it possible to predict maintenance events that may become necessary in the near future. This is called Predictive Maintenance (PM). The idea of PM in this context is that, since a large amount of sensor data is generated for the aircraft under consideration, correlations between the measured sensor data and near-future maintenance needs can be identified. According to [1], PM generates considerable business benefits compared to alternatives like reactive or preventive approaches, provided that it can be integrated into the airline's operational optimization processes.

In this article, we focus on how to use the output of such a PM functionality, assuming that we can access it. To integrate the PM functionality into the aircraft-flight assignment optimization, the prediction of a PM device, which we consider to be a probability distribution governing the failure time of a certain spare part, is translated into a set of disruption scenarios.

Our contributions to the state of the art are the following. First, a mathematical optimization model for Tail Assignment is adapted so that it can optimize the reactions to certain failure scenarios. This results in a large stochastic optimization problem, which seeks a planning Tail Assignment solution and corresponding recovery plans for each failure scenario such that the planning costs plus the expected recovery costs over the failure scenarios are minimized. Second, a solution methodology for the stochastic optimization problem based on the L-shaped method (see [2] for a detailed introduction to stochastic programming) is developed and presented. Due to the structure of the stochastic optimization problem, the L-shaped method has to be adapted to be applicable. We present the details necessary to implement the algorithm and demonstrate its feasibility. Third, we present an empirical study of the practical relevance of the approach, using a set of instances based on a pre-pandemic schedule of a large German airline. It is shown that the proposed algorithm outperforms several heuristics.

The presented approach extends the state of the art. To the best of our knowledge, a solution framework like the one presented here has not appeared in the literature before. The only approach we are aware of that handles recovery decisions at the planning stage is presented in [3]. It computes robust schedules for scenarios consisting of flight leg cancellations and airport closures. The approach is flight string based, and costs for switching between different aircraft or aircraft types are penalized with the same constant. We extend this work in three ways: we show how to handle aircraft-on-ground scenarios, our approach is connection-based and hence easier to implement, and our approach can take individual switching costs into account.

The Tail Assignment (TA) problem has been modeled in a variety of ways. In [4], a graph for modeling flight networks, called the connection network, is introduced. The article claims to address "fleet assignment", but the model has more potential, since it can model individual aircraft and, furthermore, track the routes of the aircraft without any additional effort. The slightly easier assignment of aircraft types (rather than individual aircraft) to flight legs has been modeled in [5]. This article introduces the time-space network, which is also a graph network, but whose structure is based on a time line for every airport. In [6], a model based on indicator variables for the assignment of aircraft (types) to whole sequences of flights, called flight strings, is introduced. This approach has the drawback that its implementation is considerably more challenging than that of the other two, since the number of variables grows exponentially with the problem size, which prohibits handling them explicitly. Instead, a so-called branch-and-price framework is necessary, which combines integer optimization techniques with column generation. Branch-and-price is difficult to realize with many state-of-the-art optimization solvers, like, e.g., Gurobi [7], which are not designed to have variables inserted during the solution process. An interesting approach to scale up solution algorithms to larger problem instances can be found in [8], where the so-called rolling-horizon approach is described: it divides a problem over a long time horizon into smaller, overlapping subproblems and solves them sequentially. There are several attempts to integrate related optimization problems with the Tail Assignment problem, like, e.g., runway scheduling as described in [9], pre-tactical runway scheduling as described in [10], or crew assignment as described in [11].
In [12], fleet assignment is combined with aircraft routing and crew assignment. The large optimization model is solved with the help of Benders Decomposition. In [13], the tail assignment problem is solved at the recovery stage of airline operations, taking into account acceleration of certain flights to improve the reactions to certain disruptions. In [14], the passenger flow is modeled explicitly to find good aircraft assignments.

An approach combining Predictive Maintenance and Tail Assignment is not known to the authors.

In the literature, there are several methodologies for optimizing the Tail Assignment under different kinds of operational uncertainty. In [15], an approach is presented that aims to find the aircraft assignment with the lowest potential for propagated delay. More precisely, the authors assume a probability distribution on each flight leg for the so-called primary delay (delay that originates on the flight leg itself and cannot be avoided). Connections that are too short and follow flight legs that are likely to be (primarily) delayed are assumed to cause so-called propagated delay. The aim is to reduce the probability or the expected value of the propagated delay. Furthermore, [16] presents an approach based on flight strings, in which the expected delay of each flight string is calculated and penalized in the objective function. In [17], several methods to incorporate delay considerations into the tail assignment problem are presented, including extreme value approaches and chance constraints. These approaches have in common that they do not consider re-assignment of aircraft during operations. Instead, they aim for schedules that are as robust as possible from the beginning, under the assumption that the schedules never change. This is reasonable for generating plans that are resilient against delays, which occur frequently. For disruptions that cause severe disturbances, however, it is reasonable to take dynamic re-planning into account when generating optimal plans. The latter is our setting, since the predicted failures occur considerably less frequently but cause more severe disruptions.
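The notion of propagated delay described above can be illustrated with a short sketch. The Python code below is a minimal illustration under our own assumptions, not the method of [15]: a rotation is a list of legs, the slack of a connection is its buffer beyond the minimum turnaround, a leg inherits the part of its predecessor's arrival delay that exceeds this slack, and, for the Monte Carlo estimate, primary delays are drawn from exponential distributions with hypothetical means.

```python
import random


def propagated_delays(primary_delays, slacks):
    """Propagated delay (minutes) of every leg of one aircraft rotation.

    primary_delays[i]: primary delay originating on leg i.
    slacks[i]: buffer between leg i and leg i + 1 beyond the minimum turnaround.
    """
    if len(slacks) != len(primary_delays) - 1:
        raise ValueError("expected one slack value per connection")
    propagated = [0.0]                      # the first leg has no predecessor
    arrival_delay = float(primary_delays[0])
    for slack, primary in zip(slacks, primary_delays[1:]):
        pd = max(0.0, arrival_delay - slack)  # delay not absorbed by the slack
        propagated.append(pd)
        arrival_delay = pd + primary          # total arrival delay of this leg
    return propagated


def expected_propagated_delay(mean_primary, slacks, n_samples=10_000, seed=0):
    """Monte Carlo estimate of the expected total propagated delay.

    Assumption: primary delays are exponential with the given means (0 = none).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        primary = [rng.expovariate(1.0 / m) if m > 0 else 0.0 for m in mean_primary]
        total += sum(propagated_delays(primary, slacks))
    return total / n_samples


print(propagated_delays([30, 0, 10], [20, 5]))  # [0.0, 10.0, 5.0]
```

With a fixed seed, enlarging the slacks can only lower the estimate, which mirrors the intuition that buffers on connections after delay-prone legs absorb propagated delay.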

Another idea to optimize flight schedules under operational uncertainty is presented in [18]. This thesis presents a methodology to maximize the number of swapping opportunities in a given schedule. In case of a severe aircraft disruption, routes with higher priorities are then more likely to be served, while routes with lower priorities are canceled, even if the affected aircraft was originally assigned to the former.

A similar idea based on route prioritization is presented in the thesis [19].

According to [20], the resilience of a flight schedule increases if aircraft rotations are designed such that the flights involving an airport are served by as many aircraft of the same type as possible, while as few aircraft types as possible serve flights at this airport. This concept is known as "station purity". It leads to rotations that can easily be re-planned if an aircraft malfunction occurs. Furthermore, the concept of short cycles is introduced, i.e., the number of flights an aircraft serves until it returns to its assigned hub airport is minimized. The justification is that, in case of a flight cancellation, there is a certain probability that all flights in its cycle have to be canceled; hence, keeping the cycles short mitigates the negative impact of this disruption.
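The two indicators from [20] can be computed from a schedule roughly as sketched below. This Python sketch rests on our own simplifications, not on the exact definitions of [20]: station purity is measured as the number of distinct aircraft types touching each airport (fewer is purer), and a cycle is the number of flights between two consecutive arrivals of an aircraft at its hub. The function names are hypothetical.

```python
from collections import defaultdict


def station_purity(legs):
    """Number of distinct aircraft types serving each airport (lower = purer).

    legs: iterable of (dep_station, arr_station, aircraft_type).
    """
    types_at = defaultdict(set)
    for dep, arr, ac_type in legs:
        types_at[dep].add(ac_type)
        types_at[arr].add(ac_type)
    return {station: len(types) for station, types in types_at.items()}


def cycle_lengths(rotation, hub):
    """Flights per cycle of one aircraft; a cycle ends with an arrival at the hub.

    rotation: ordered list of (dep_station, arr_station). A trailing sequence that
    does not return to the hub within the horizon is reported as an open cycle.
    """
    lengths, count = [], 0
    for _, arr in rotation:
        count += 1
        if arr == hub:
            lengths.append(count)
            count = 0
    if count:
        lengths.append(count)
    return lengths


schedule = [("HUB", "X", "A320"), ("X", "HUB", "A320"), ("HUB", "Y", "A321")]
print(station_purity(schedule))
print(cycle_lengths([("HUB", "X"), ("X", "HUB"), ("HUB", "Y"), ("Y", "Z"), ("Z", "HUB")], "HUB"))
```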

Beyond optimizing indicators of schedule robustness, there are few approaches that explicitly incorporate the reaction to certain disruption scenarios. In [3], a two-stage recoverable robust (or, more precisely, stochastic) optimization model is presented. The Tail Assignment problem is modeled using a flight string based approach following [6]. Depending on a set of first-stage decisions that determine the planned assignment of aircraft to flight strings, decisions to recover from a finite set of disruption scenarios, including flight cancellations and airport closures, are explicitly included in the model. The resulting large optimization model is then solved using the well-known L-shaped method (see [2] for a detailed introduction to modeling and solution methods in stochastic programming).

The paper is structured as follows. In Sect. 2, we present a small example motivating the problem statement. In Sect. 3, the Tail Assignment model we use is presented. In Sect. 4, the information that might be provided by a Predictive Maintenance device is briefly introduced. In Sect. 5, we describe how this information is incorporated into the Tail Assignment model. In Sect. 6, we present our solution approach. In Sect. 7, we present our computational results. In Sect. 8, we discuss our results, give an outlook on possible future research directions and state concluding remarks.

2 An introductory example

In this section, an illustrative example is presented, which demonstrates that an exact solution algorithm for the integration of malfunction prediction into the Tail Assignment optimization process outperforms several intuitive heuristic approaches.

Consider a (part of a) flight schedule consisting of six flights, which are sequentially served by two different aircraft, Aircraft A and Aircraft B. The schedule is summarized in Table 1.

Table 1 Example flight schedule

Now we consider a PM device which tells us that a part of Aircraft A will break after one of its next flights, with an estimated probability of 20% for each of the next two flights; after a malfunction, the part has to be replaced immediately (i.e., at the next arrival station). More precisely, the probability that the part breaks after exactly one flight is 0.2, the probability that the part breaks after exactly two flights is 0.2, and the probability that the part breaks after three or more flights is 0.6. The part can be replaced at the airports HUB1 and HUB2, which takes one additional hour on top of the obligatory 30 min turnaround that an aircraft has to go through after each arrival. If the part breaks immediately before the aircraft arrives at station OUT, the aircraft cannot be repaired and is out of service. In our example, this enforces a cancellation. We penalize a flight cancellation with 180 (monetary units) and each minute of delay with 1 (monetary unit), i.e., we assume that a passenger whose flight is canceled is approximately as dissatisfied as (and has comparable claims for compensation payments to) a passenger with three hours of delay on a short-range flight. Assuming that it is currently 10:30, the operator has the following options to resolve the situation.

  (a) Lean back and see what happens (reactive approach). In this case, the probability is 20% that Aircraft A has a malfunction after the first flight, at the cost of 120 min of delay, since the part is then replaced at HUB1 and flights 2 and 3 are each delayed by 1 h. The probability is 20% that the malfunction occurs after the second flight; then Flight 3 has to be canceled, because the part cannot be replaced at OUT. The penalty for this is 180. The probability is 60% that the malfunction occurs after Flight 3, with a penalty of 0. In total, the expected penalty is 0.2 × 120 + 0.2 × 180 = 60.

  (b) Maintain at the first possible station (preventive approach). In this case, the part is exchanged after the first arrival of Aircraft A, causing a one-hour delay for flights 2 and 3, independently of when the malfunction occurs. The expected penalty is 120.

  (c) Maintain at the first suitable station (passive approach). This approach exchanges the part at the next station where this is possible without the maintenance procedure generating delay. In this example, the part would be exchanged after Flight 3, and the calculation is identical to case (a), i.e., the expected penalty is 60.

  (d) Maintain at the first possible station, and reassign flights afterwards (preventive approach +). This strategy schedules the maintenance event for Aircraft A at the first possible opportunity, i.e., immediately after Flight 1. Afterwards, the aircraft assignment is re-optimized to reduce the penalty. As a result, Aircraft A is assigned to flights 5 and 6, while Aircraft B is assigned to Flight 2 and Flight 3 after Flight 4. The expected penalty is 30, since half an hour of delay is caused in any case.

The best strategy which can be chosen for this example is the following: We re-assign the flights as in case (d), but we do not schedule the maintenance event after Flight 1. Instead, the maintenance event is carried out when the malfunction occurs, leading to an expected penalty of 12, since with 40% probability, we have 30 min delay, and with 60% probability, no penalty occurs.
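The expected penalties of the five strategies can be recomputed directly from the scenario probabilities. The Python sketch below encodes, per strategy, the penalty incurred if the part breaks after the first flight, after the second flight, or later, using exactly the numbers stated above (1 per delay minute, 180 per cancellation); the dictionary keys are illustrative labels only. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction

# Probability that the part of Aircraft A breaks after flight 1, after flight 2,
# or only after flight 3 or later.
SCENARIO_PROBS = {
    "after_1": Fraction(1, 5),
    "after_2": Fraction(1, 5),
    "after_3_or_later": Fraction(3, 5),
}

# Penalty per scenario: delay minutes, or 180 for a cancellation.
STRATEGY_PENALTIES = {
    "(a) reactive": {"after_1": 120, "after_2": 180, "after_3_or_later": 0},
    "(b) preventive": {"after_1": 120, "after_2": 120, "after_3_or_later": 120},
    "(c) passive": {"after_1": 120, "after_2": 180, "after_3_or_later": 0},
    "(d) preventive +": {"after_1": 30, "after_2": 30, "after_3_or_later": 30},
    "optimal": {"after_1": 30, "after_2": 30, "after_3_or_later": 0},
}


def expected_penalty(penalties):
    """Probability-weighted penalty over all malfunction scenarios."""
    return sum((SCENARIO_PROBS[s] * penalties[s] for s in SCENARIO_PROBS), Fraction(0))


for name, penalties in STRATEGY_PENALTIES.items():
    print(f"{name}: {expected_penalty(penalties)}")
```

The loop prints 60, 120, 60, 30 and 12, matching the values derived above.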

The drawback of the optimal strategy is that, in contrast to approaches (a)–(d), solutions following it are not easy to determine, especially if the operator uses decision support software that cannot take uncertain events into account.

The stochastic optimization framework we present in this paper considers the reactions to many possible realizations of the malfunction time of the defective part simultaneously and overcomes the drawbacks of the presented heuristic approaches.

Hence, using this framework, automated optimization software can determine a schedule such that the expected penalty caused by the malfunction is minimized.

In the next section, a basic mathematical optimization model for the Tail Assignment problem is introduced.

3 Basic Tail Assignment modeling

In this section, we describe our model of the TA problem. It is based on [4]. The graph underlying the model is a connection network. We consider the aircraft as initial nodes of this network and the flight legs as inner nodes. The arcs of the network represent either feasible flight connections or connections between an aircraft and a flight leg; the latter indicate that the flight leg is the first leg served by the aircraft. All input data necessary for the optimization problem are summarized in Table 2.

Table 2 Input data of TA problem

The input consists of a set of airports S, a set of aircraft types K, and a schedule of flight legs L. For each aircraft type \(k\in K\), the set \(Q_k\) contains the individual aircraft of this type. The compatibility of aircraft types and flight legs is expressed by the (redundant) sets \(L_k\) and \(K_l\), i.e., the set of flight legs that aircraft type k can serve and the set of aircraft types that can serve flight leg l, respectively.

The set A contains all feasible flight connections, i.e., every pair of flight legs that can be served consecutively by at least one aircraft type. The set A also contains pairs \((q, l)\) whenever leg l can be the first flight leg of aircraft q. The compatibility of flight connections and aircraft types is expressed by the sets \(A_k\): \(A_k\) contains pairs of flight legs \((i, j)\) whenever leg j can be flown immediately after leg i by the same aircraft of type k, and pairs of aircraft and flight leg \((q, l)\) whenever leg l can be the first flight leg of aircraft q of type k.
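A set \(A_k\) of this kind can be built with a few loops. The Python sketch below assumes a single minimum turnaround of 30 min (taken from the example in Sect. 2) as the only feasibility rule for leg-to-leg arcs, and describes each aircraft by a hypothetical start station and availability time; real instances would likely need further rules.

```python
from dataclasses import dataclass

MIN_TURN = 30  # assumption: uniform minimum turnaround in minutes, cf. Sect. 2


@dataclass(frozen=True)
class Leg:
    name: str
    dep_station: str
    arr_station: str
    t_dep: int  # minutes after a reference time
    t_arr: int


@dataclass(frozen=True)
class Aircraft:
    name: str
    fleet: str
    station: str    # hypothetical: location at the start of the horizon
    available: int  # hypothetical: earliest possible departure (minutes)


def connection_arcs(fleet, legs, aircraft, legs_of_fleet):
    """Build A_k for one fleet k: leg-leg arcs (i, j) and aircraft-leg arcs (q, l)."""
    served = [l for l in legs if l.name in legs_of_fleet]  # L_k
    arcs = set()
    for i in served:
        for j in served:
            # j can follow i if it departs where i arrives, after the turnaround
            if i.arr_station == j.dep_station and i.t_arr + MIN_TURN <= j.t_dep:
                arcs.add((i.name, j.name))
    for q in aircraft:
        if q.fleet != fleet:
            continue
        for l in served:
            # l can be the first leg of q if q is at its departure station in time
            if q.station == l.dep_station and q.available <= l.t_dep:
                arcs.add((q.name, l.name))
    return arcs


legs = [
    Leg("F1", "HUB1", "OUT", 600, 660),
    Leg("F2", "OUT", "HUB1", 700, 760),
    Leg("F3", "HUB1", "OUT", 760, 820),  # too tight to follow F2
]
fleet_aircraft = [Aircraft("q1", "A320", "HUB1", 0)]
print(sorted(connection_arcs("A320", legs, fleet_aircraft, {"F1", "F2", "F3"})))
# [('F1', 'F2'), ('q1', 'F1'), ('q1', 'F3')]
```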

Each flight leg l has a departure time \(t^{{\textrm{dep}}}_l\), an arrival time \(t^{{\textrm{arr}}}_l\), a departure station \(s^{{\textrm{dep}}}_l\), an arrival station \(s^{{\textrm{arr}}}_l\), and cancellation costs \(c_l\). The costs of assigning aircraft type k to flight leg l are \(c_{lk}\). Using this input, we can set up optimization problem (1).

$$\min \sum _{l\in L}c_l z_l + \sum \limits _{k\in K} \sum \limits _{l\in L_k} c_{lk} x_{lk} \tag{1a}$$
$$\text{s.t.} \quad \sum \limits _{k\in K_l} x_{lk} + z_l = 1 \qquad \forall l\in L \tag{1b}$$
$$\sum \limits _{a\in A_k: l=a^-} x_{ak} \le x_{lk} \qquad \forall l\in L_k, k\in K \tag{1c}$$
$$\sum \limits _{a\in A_k: l=a^+} x_{ak} = x_{lk} \qquad \forall l\in L_k, k\in K \tag{1d}$$
$$\sum \limits _{a\in A_k: a^-=q} x_{ak} \le 1 \qquad \forall q\in Q_k, k \in K \tag{1e}$$
$$x, z \text{ binary}. \tag{1f}$$

Problem (1) is a Tail Assignment model without maintenance scheduling, formulated over a connection network. It contains connection variables \(x_{ak}\), fleet-flight assignment variables \(x_{lk}\) and cancellation variables \(z_l\). The interpretation of the model variables is documented in Table 3. Here, \(a^-\) denotes the first element of a pair a, and \(a^+\) denotes the second element.

Table 3 Model variables of Problem (1)

The objective function (1a) minimizes the sum of cancellation costs (\(c_l z_l\)) and explicit fleet-flight assignment costs (\(c_{lk} x_{lk}\)). Cancellations are formally allowed, but are not supposed to occur in an optimal planning solution. Constraints (1b) ensure that each flight leg \(l\in L\) is either covered by exactly one aircraft type or canceled (i.e., covered by no fleet). Constraints (1d) ensure that each flight leg served by fleet k has exactly one predecessor, which is either a flight leg or an aircraft of type k, and Constraints (1c) ensure that it has at most one successor connection. Constraints (1e) ensure that each aircraft starts at most one sequence of flight legs.
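To make the role of Constraints (1b)–(1e) concrete, the following Python sketch checks a candidate solution of Problem (1) and evaluates objective (1a). It is an illustrative checker, not a solver; all argument names are our own, and binary variables are represented by the sets of indices at which they equal 1.

```python
def check_model_1(L, K_l, A_k, Q_k, x_l, x_a, z, c_l, c_lk):
    """Check a candidate solution of Problem (1) and evaluate objective (1a).

    Binary variables are index sets: x_l = {(l, k): x_lk = 1},
    x_a = {(a, k): x_ak = 1}, z = {l: z_l = 1}. K_l maps legs to fleets,
    A_k maps fleets to arcs (pairs), Q_k maps fleets to their aircraft.
    Returns (violated constraints, objective value).
    """
    violations = []
    for l in L:  # (1b): exactly one fleet or a cancellation
        if sum((l, k) in x_l for k in K_l[l]) + (l in z) != 1:
            violations.append(("1b", l))
    for k, arcs in A_k.items():
        chosen = [a for a in arcs if (a, k) in x_a]
        for l in L:
            if k not in K_l[l]:
                continue  # only legs in L_k
            served = int((l, k) in x_l)
            if sum(a[0] == l for a in chosen) > served:   # (1c): at most one successor
                violations.append(("1c", l, k))
            if sum(a[1] == l for a in chosen) != served:  # (1d): exactly one predecessor
                violations.append(("1d", l, k))
        for q in Q_k.get(k, []):  # (1e): each aircraft starts at most one sequence
            if sum(a[0] == q for a in chosen) > 1:
                violations.append(("1e", q, k))
    objective = sum(c_l[l] for l in z) + sum(c_lk[lk] for lk in x_l)
    return violations, objective


L = ["F1", "F2"]
K_l = {"F1": ["A320"], "F2": ["A320"]}
A_k = {"A320": {("q1", "F1"), ("q1", "F2"), ("F1", "F2")}}
Q_k = {"A320": ["q1"]}
costs = dict(c_l={"F1": 1000, "F2": 1000}, c_lk={("F1", "A320"): 5, ("F2", "A320"): 7})
plan_x_l = {("F1", "A320"), ("F2", "A320")}
plan_x_a = {(("q1", "F1"), "A320"), (("F1", "F2"), "A320")}
print(check_model_1(L, K_l, A_k, Q_k, plan_x_l, plan_x_a, set(), **costs))  # ([], 12)
```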

Remark

Model (1) does not incorporate flight delays. Two methods for incorporating them are known to the authors. The first one is to introduce copies of every flight leg/aircraft type combination, each one associated with a certain amount of delay. Popular choices are steps of 5, 10, 15 or 30 min with no more than 10 copies per flight leg, to keep the model size acceptably small. We refer the interested reader to, e.g., [16], which describes this modeling technique well, even though it is presented for a flight route based approach. The second one is to introduce explicit continuous variables for the delay of each flight leg. These variables can be linked to the connection variables using linear constraints, following, e.g., the lines of [21]. That article presents an approach for a similar setting, which tracks the time that has passed since the last line maintenance event of a specific aircraft.
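The first technique, delay copies, can be sketched as follows. The Python code below is an illustration under our own assumptions: each copy shifts departure and arrival by a multiple of a fixed step, the undelayed leg counts as the first copy, and a linear delay cost (1 per minute, as in the example of Sect. 2) is attached to each copy. Each copy would then become a separate node of the connection network for the respective aircraft type.

```python
def delayed_copies(leg, t_dep, t_arr, step=15, max_copies=10, cost_per_min=1.0):
    """Copies of a flight leg delayed by 0, step, 2 * step, ... minutes.

    Returns tuples (copy_name, delay, t_dep, t_arr, delay_cost). The undelayed
    leg is the first copy, so at most max_copies nodes are created per leg.
    """
    copies = []
    for n in range(max_copies):
        delay = n * step
        copies.append((f"{leg}+{delay}", delay, t_dep + delay, t_arr + delay, cost_per_min * delay))
    return copies


for copy in delayed_copies("F1", 600, 660, step=15, max_copies=4):
    print(copy)
```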

The approach presented in this article can be transferred quite straightforwardly to Tail Assignment models that take delay into account. To keep the presentation sufficiently simple, we refrain from integrating delay considerations into our optimization models.

Now that the basic modeling approach for the Tail Assignment problem has been introduced, we proceed with a description of how we use the output of a Predictive Maintenance device.

4 Data provided by predictive maintenance devices

In this section, we describe in detail what kind of information we assume to be provided by a Predictive Maintenance (PM) device. PM devices sometimes produce point estimates of the so-called "remaining useful life" of the corresponding part, often with a certain confidence level. It is also plausible that some PM devices deliver confidence bands for the remaining useful life of the part. Both necessarily require estimating quantiles of the random variable "part malfunction time", which is the one we are interested in. The output of a PM device may also reflect uncertainty in the prediction itself: whenever a device has a non-zero false-positive rate, this can be translated into a "disruption" scenario in which no failure occurs, which our framework can take into account directly. The quantiles yield an approximate encoding of the underlying distribution. Hence, as already mentioned in Sect. 1, we consider the output of the device to be an estimated probability distribution, which gives information about the risk of failure of a specific aircraft part. This risk depends on the time that has passed since the PM device estimated the distribution. The distribution is considered to be encoded as a finite collection of quantiles of the distribution function.

Beyond that, no further specifications on the distribution of the part failure time variable are needed. The approach we present is universally compatible with any PM device fulfilling these specifications. Later, in Sect. 7, we will specify which distribution has been used to generate our computational proof of concept.

Formally, a PM device specifies a particular aircraft \({\hat{q}} \in Q_k\) of a fleet \(k\in K\) and a corresponding spare part \({\hat{\pi }} \in \varPi \), where \(\varPi \) denotes the set of spare parts under consideration. Furthermore, the device provides a discrete probability distribution on a finite set of positive reals \(\{t_\xi : \xi \in \varXi \}\), indexed by an arbitrary finite set \(\varXi \). The distribution is defined by probabilities \(p_\xi \in [0, 1], \xi \in \varXi \), with \(\sum _{\xi \in \varXi } p_\xi = 1\).
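One way to turn a quantile encoding into the scenario set \(\{(t_\xi , p_\xi ): \xi \in \varXi \}\) is sketched below in Python. The mapping is our assumption and is not prescribed by any PM device: the probability mass between two consecutive quantile levels is placed on the later quantile time, and the remaining mass above the last quantile becomes a scenario without failure in the planning horizon (time \(+\infty \)). Applied to the introductory example (times measured in flights), it yields the probabilities 0.2, 0.2 and 0.6.

```python
import math


def quantiles_to_scenarios(quantiles, horizon_end=math.inf):
    """Convert quantile pairs (level, time) into scenarios (t_xi, p_xi).

    Levels and times must both be strictly increasing. Assumption: the mass
    between consecutive levels is placed on the upper quantile time; the mass
    above the last level becomes a 'no failure before horizon_end' scenario.
    """
    scenarios = []
    prev_level, prev_time = 0.0, -math.inf
    for level, time in sorted(quantiles):
        if not (prev_level < level <= 1.0 and prev_time < time):
            raise ValueError("levels and times must be strictly increasing")
        scenarios.append((time, level - prev_level))
        prev_level, prev_time = level, time
    if prev_level < 1.0:
        scenarios.append((horizon_end, 1.0 - prev_level))
    return scenarios


# Introductory example: failure after 1 flight (20%) or after 2 flights (20%).
print(quantiles_to_scenarios([(0.2, 1), (0.4, 2)]))
```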

Furthermore, to quantify the effort and duration necessary to carry out repairs, we assume the information summarized in Table 4 to be available. For each spare part \(\pi \in \varPi \), a replacement causes certain costs \(c^{{\textrm{rep}}}_{\pi , s}\), depending on the airport s where it takes place, and takes a certain amount of time \(t^{{\textrm{rep}}}_{\pi , s}\), which also depends on the airport. For all airports at which the replacement cannot be carried out, the replacement time is set to \(+\infty \).

Table 4 Additional input data for part replacements

In the next section, we will describe how this information can be integrated into an optimization procedure for the Tail Assignment problem.

5 A novel model for Tail Assignment with predictive maintenance

We start this section with the description of the adaptations which have to be made to Model (1) if one is interested in incorporating replacement measures and recovering from a certain (deterministic) disruption given an initial assignment of aircraft (types) to flight legs. Thereafter, we will describe how the integrated planning and recovery model can be solved using the L-shaped method, as described in [2].

To incorporate replacement measures, the aircraft \({\hat{q}}\in Q_k\) that is affected by the malfunction prediction is transferred into a separate, artificial, newly generated fleet. We remove \({\hat{q}}\) from \(Q_k\), add a fleet \({\hat{k}}\) to K (with all parameters identical to those of the original fleet k), and set \(Q_{{\hat{k}}}:= \{{\hat{q}}\}\). The interpretation of the artificial fleet \({\hat{k}}\) is "aircraft \({\hat{q}}\) before the repair is carried out". Additionally, we copy this newly created fleet once more to obtain another artificial fleet \({\tilde{k}}\), which has characteristics identical to the original fleet. The interpretation of the artificial fleet \({\tilde{k}}\) is "aircraft \({\hat{q}}\) after the repair is carried out". Furthermore, a set \(A^{{\textrm{rep}}}\) is introduced, containing all flight connections at which the replacement can take place without generating additional delay. Formally, if the aircraft under consideration is \({\hat{q}}\), belonging to fleet k, and the spare part affected by the malfunction prediction is \(\pi \),

$$\begin{aligned} A^{{\textrm{rep}}}:= \left\{ (i, j) \in A_k: t^{{\textrm{arr}}}_i + t^{{\textrm{rep}}}_{\pi , s_{(i, j)}} \le t^{{\textrm{dep}}}_j\right\} , \end{aligned}$$

where \(s_{(i, j)}\) is the airport of connection \((i, j)\), and \(t^{{\textrm{arr}}}_i\) is considered to be 0 for \(i = {\hat{q}}\) and \(\infty \) for \(i\in Q_k {\setminus } \{{\hat{q}}\}\).
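The definition of \(A^{{\textrm{rep}}}\) translates into a simple filter over \(A_k\). In the Python sketch below, the data layout is hypothetical: legs map to (departure time, arrival time, departure station), repair times are given per station for the predicted part \(\pi \), and a missing station means the replacement is impossible (\(+\infty \)). We take the airport of a connection to be the departure station of its second element; following the definition above, the affected aircraft \({\hat{q}}\) is treated as arriving at time 0, and arcs starting at other aircraft are never repair arcs. The repair time of 90 min is an assumption (30 min turnaround plus 1 h replacement, as in Sect. 2).

```python
import math


def repair_arcs(arcs_k, legs, q_hat, t_rep):
    """A^rep: arcs of A_k at which the part can be replaced without extra delay.

    arcs_k: set of pairs (i, j) from A_k; legs: {leg: (t_dep, t_arr, dep_station)};
    q_hat: affected aircraft; t_rep: {station: replacement time}, missing = +inf.
    """
    result = set()
    for i, j in arcs_k:
        t_dep_j, _, station = legs[j]  # airport of the connection
        if i == q_hat:
            t_arr_i = 0                # repair before the first leg of q_hat
        elif i in legs:
            t_arr_i = legs[i][1]
        else:
            continue                   # other aircraft: t_arr = +inf
        if t_arr_i + t_rep.get(station, math.inf) <= t_dep_j:
            result.add((i, j))
    return result


legs = {"F1": (600, 660, "HUB1"), "F2": (700, 760, "OUT"), "F3": (850, 910, "HUB1")}
arcs = {("q1", "F1"), ("q2", "F1"), ("F1", "F2"), ("F2", "F3")}
print(sorted(repair_arcs(arcs, legs, "q1", {"HUB1": 90, "HUB2": 90})))
# [('F2', 'F3'), ('q1', 'F1')]
```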

We note that \(A^{{\textrm{rep}}}\) as defined above is very sensitive to \(t^{{\textrm{rep}}}\). Small changes in \(t^{{\textrm{rep}}}\) can change \(A^{{\textrm{rep}}}\) considerably, leading to completely different solutions in the end. On the one hand, very beneficial repair connections whose duration is only a couple of minutes shorter than \(t^{{\textrm{rep}}}\) may be missed. On the other hand, the definition does not distinguish between very tight repair arcs, whose duration is only slightly longer than \(t^{{\textrm{rep}}}\), and repair arcs with considerably longer duration. The latter are especially preferable if \(t^{{\textrm{rep}}}\) cannot be determined with high precision. To circumvent both effects, we suggest a variant of the definition of \(A^{{\textrm{rep}}}\) that also contains repair connections shorter than the repair time. In addition, connections with a short duration are penalized, which is reflected in the model by increased cost parameters for the respective repair arc variables. Another way to circumvent these effects is to incorporate delay considerations explicitly into the model. For ease of presentation, we refrain from further discussion in this direction.
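The suggested variant can be sketched as follows. Since no particular penalty function is specified here, the Python code below uses a hypothetical one: connections whose slack \(t^{\textrm{dep}}_j - t^{\textrm{arr}}_i - t^{\textrm{rep}}\) is at least minus a tolerance are kept, and each kept arc receives an additional cost that grows linearly as its slack drops below a desired buffer. The data layout follows the same hypothetical conventions as for the strict definition; parameter names and default values are illustrative only.

```python
import math


def soft_repair_arcs(arcs_k, legs, q_hat, t_rep, tolerance=15, buffer=30, rate=2.0):
    """Repair arcs with penalties instead of a hard cut-off at t_rep.

    legs: {leg: (t_dep, t_arr, dep_station)}; t_rep: {station: time}, missing = +inf.
    Returns {arc: extra cost}. Arcs with slack >= -tolerance are kept; the extra
    cost is rate * max(0, buffer - slack), so arcs shorter than the repair time
    (negative slack) and very tight arcs are penalized, roomy arcs are not.
    """
    penalties = {}
    for i, j in arcs_k:
        t_dep_j, _, station = legs[j]
        if i == q_hat:
            t_arr_i = 0
        elif i in legs:
            t_arr_i = legs[i][1]
        else:
            continue  # other aircraft of the fleet never carry a repair arc
        slack = t_dep_j - t_arr_i - t_rep.get(station, math.inf)
        if slack >= -tolerance:
            penalties[(i, j)] = rate * max(0, buffer - slack)
    return penalties


legs = {
    "F1": (600, 660, "HUB1"), "F2": (700, 760, "OUT"),
    "F3": (850, 910, "HUB1"), "F4": (840, 900, "HUB1"),
}
arcs = {("q1", "F1"), ("F1", "F2"), ("F2", "F3"), ("F2", "F4"), ("q2", "F1")}
print(sorted(soft_repair_arcs(arcs, legs, "q1", {"HUB1": 90}).items()))
```

Here the arc to F4, which is 10 min shorter than the repair time, is kept at a higher cost than the exactly fitting arc to F3, while the roomy first arc is not penalized.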

Besides the variable sets for our fleet copies \({\hat{k}}\) and \({\tilde{k}}\), we introduce additional binary variables \(x^{{\textrm{rep}}}_a\) for all \(a \in A^{{\textrm{rep}}}\) to Model (1). These variables equal 1 if and only if flight connection a is served by the affected aircraft, and the replacement of the part for which a malfunction is predicted is taking place at this connection, and 0 otherwise. We modify Constraint (1c) for fleet copy \({\hat{k}}\) and all flight legs \(l \in L_{{\hat{k}}}\) to

$$\sum \limits _{a\in A_{{\hat{k}}}: l=a^-} x_{a{\hat{k}}} + \sum _{a \in A^{{\textrm{rep}}}: l=a^-} x_a^{{\textrm{rep}}} \le x_{l{\hat{k}}}, \tag{1c*}$$

and Constraint (1e) for \({\hat{q}}\) to

$$\sum \limits _{a\in A_{{\hat{k}}}: a^-= {\hat{q}}} x_{a{\hat{k}}} + \sum _{a \in A^{{\textrm{rep}}}: a^- = {\hat{q}}} x_a^{{\textrm{rep}}} \le 1. \tag{1e*}$$

We modify Constraint (1d) for fleet copy \({\tilde{k}}\) and all flight legs \(l\in L_{{\tilde{k}}}\) to

$$\sum \limits _{a\in A_{{\tilde{k}}}: l=a^+} x_{a{\tilde{k}}} + \sum _{a\in A^{{\textrm{rep}}}: l = a^+} x^{{\textrm{rep}}}_a = x_{l{\tilde{k}}}. \tag{1d*}$$

Finally, we fix \(x_{a{\tilde{k}}} = 0\) for all \(a \in A_{{\tilde{k}}}\) with \(a^- = {\hat{q}}\), i.e., the repaired copy \({\tilde{k}}\) can only be entered via a repair arc.

To model the recovery from a certain malfunction event, we introduce an aircraft type swap cost matrix for each flight leg \(l \in L\). More precisely, we introduce parameters \({\tilde{c}}_{lkk'}\ge 0, l\in L, k, k' \in K_l\cup \{\text {canc}\}\), with the following interpretation: if flight leg l has been planned to be served by aircraft type k (or to be canceled, "canc"), then additional costs of \({\tilde{c}}_{lkk'}\) arise if, in the recovery from a malfunction event, aircraft type \(k'\) serves the flight (or the flight is canceled). If, for example, a flight leg is re-planned from a larger aircraft type k to a smaller aircraft type \(k'\), the additional costs could be the estimated costs for re-booking all passengers exceeding the seat capacity of aircraft type \(k'\), minus the difference in operating costs of the two aircraft types on the flight leg, plus a constant penalty for the administrative effort associated with re-planning measures.
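The cost example above can be written down as a small function. The Python sketch below covers only re-planning between two aircraft types (cancellations would need their own rule); seat capacities, operating costs, the re-booking cost per passenger and the administrative penalty are hypothetical inputs, keeping the planned type is assumed to cost nothing, and the result is clipped at zero because \({\tilde{c}}_{lkk'}\ge 0\) is required.

```python
def swap_cost(k, k_new, pax, seats, op_cost, rebook_cost_per_pax, admin_penalty):
    """Sketch of c~_{l k k'} for one flight leg l (planned type k, recovery type k_new).

    seats and op_cost map aircraft types to seat capacity and operating cost on l.
    """
    if k == k_new:
        return 0.0  # assumption: keeping the planned type causes no extra cost
    rebooked = max(0, pax - seats[k_new])   # passengers exceeding the capacity
    saving = op_cost[k] - op_cost[k_new]    # operating cost difference on l
    cost = rebook_cost_per_pax * rebooked - saving + admin_penalty
    return max(0.0, cost)                   # the model requires c~ >= 0


seats = {"A321": 200, "A320": 170}          # hypothetical values
op_cost = {"A321": 9000.0, "A320": 8000.0}
print(swap_cost("A321", "A320", 190, seats, op_cost, 250.0, 500.0))  # 4500.0
```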

As stated in Sect. 4, the output of a PM device is considered to give a distribution on a finite set of times, i.e., we have a finite set of scenarios \(\varXi \) and for every \(\xi \in \varXi \) a particular probability \(p_\xi \) and a time \(t_\xi \). Depending on the scenario, we can define the set of flight legs departing earlier than the scenario time \(t_\xi \). Formally, this is

$$\begin{aligned} L^\xi := \left\{ l\in L: t^{{\textrm{dep}}}_l < t_\xi \right\} . \end{aligned}$$

Furthermore, given a feasible solution \(({\bar{x}}, {\bar{z}})\) of the modified Model (1), we define for each flight leg \(l\in L\) the fleet which is assigned to the flight leg in the planning solution as \(k^*_l\), formally,

$$\begin{aligned} k^*_l:= {\left\{ \begin{array}{ll} k \in K_l: {\bar{x}}_{lk} = 1 &{}\quad \text { if } {\bar{z}}_l = 0\\ \text {canc} &{}\quad \text { if } {\bar{z}}_l = 1. \end{array}\right. } \end{aligned}$$
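Reading off \(k^*_l\) from a planning solution is straightforward. The Python sketch below assumes (our choice of representation) that \({\bar{x}}\) and \({\bar{z}}\) are given as dictionaries of 0/1 values, and raises an error if a leg is neither canceled nor assigned to exactly one fleet, which cannot happen for solutions satisfying (1b).

```python
def planned_fleet(legs, x_bar, z_bar):
    """k*_l for every leg: the fleet serving l in the plan, or 'canc' if canceled.

    x_bar: {(l, k): 0 or 1}; z_bar: {l: 0 or 1} (missing entries count as 0).
    """
    k_star = {}
    for l in legs:
        if z_bar.get(l, 0) == 1:
            k_star[l] = "canc"
            continue
        fleets = [k for (leg, k), value in x_bar.items() if leg == l and value == 1]
        if len(fleets) != 1:
            raise ValueError(f"leg {l} violates (1b)")
        k_star[l] = fleets[0]
    return k_star


print(planned_fleet(["F1", "F2"], {("F1", "A320"): 1, ("F1", "A321"): 0}, {"F2": 1}))
# {'F1': 'A320', 'F2': 'canc'}
```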

Additionally, for each aircraft type \(k\in K\), we define the set \(A^\xi _k\) of connections that may be used if malfunction scenario \(\xi \) occurs. Formally,

$$\begin{aligned} A^\xi _k = {\left\{ \begin{array}{ll} A_k &{}\quad \text { if } k \ne {\hat{k}}\\ \left\{ (i, j) \in A_k: t^{{\textrm{dep}}}_j \le t_\xi \right\} &{}\quad \text { otherwise.} \end{array}\right. } \end{aligned}$$

This set ensures that, after the malfunction has occurred, the affected aircraft cannot be used without a repair, i.e., without switching to the connection network of the repaired copy \({\tilde{k}}\) via a replacement arc.
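Both scenario-dependent sets are simple filters. The Python sketch below follows the two definitions literally (strict inequality for \(L^\xi \), non-strict inequality for \(A^\xi _{{\hat{k}}}\)); the dictionary layout and fleet labels are illustrative assumptions, and arcs are pairs whose second element is always a flight leg.

```python
def legs_before(t_xi, t_dep):
    """L^xi: legs departing strictly before the malfunction time t_xi."""
    return {l for l, t in t_dep.items() if t < t_xi}


def scenario_arcs(A, k_hat, t_xi, t_dep):
    """A^xi_k for every fleet k; only arcs of the unrepaired copy k_hat are cut.

    A: {fleet: set of arcs (i, j)}; t_dep: {leg: departure time}.
    """
    return {
        k: set(arcs) if k != k_hat else {(i, j) for (i, j) in arcs if t_dep[j] <= t_xi}
        for k, arcs in A.items()
    }


t_dep = {"F1": 600, "F2": 700, "F3": 850}
A = {
    "A320": {("q2", "F1"), ("F1", "F2")},
    "k_hat": {("q1", "F1"), ("F1", "F2"), ("F2", "F3")},
}
print(legs_before(700, t_dep))  # {'F1'}
print(scenario_arcs(A, "k_hat", 700, t_dep)["k_hat"])
```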

Using these definitions, we set up the following optimization problem for each scenario \(\xi \), which is the continuous relaxation of a mixed-integer linear program and models the recovery in case the malfunction encoded in \(\xi \) occurs, depending on a planning solution \(({\bar{x}}, {\bar{z}})\).

$$R_\xi ({\bar{x}},{\bar{z}}) := \min \sum _{l\in L} \Big( {\tilde{c}}_{lk^*_l\text {canc}} z^\xi _l + \sum _{k \in K_l} {\tilde{c}}_{lk^*_lk} x^\xi _{lk} \Big) + \sum _{a \in A^{{\textrm{rep}}}} c^{{\textrm{rep}}}_{{\hat{\pi }}, s_a} x_a^{{\textrm{rep}}, \xi } \tag{2a}$$
$$\text{s.t.} \quad \sum _{k \in K_l} x^\xi _{lk} + z^\xi _{l} = 1 \qquad \forall l\in L \tag{2b}$$
$$\sum \limits _{a\in A_k^\xi : l=a^-} x^\xi _{ak} \le x^\xi _{lk} \qquad \forall l\in L_k, k\in K{\setminus } \{{\hat{k}}\} \tag{2c}$$
$$\sum \limits _{a\in A_{{\hat{k}}}^\xi : l=a^-} x^\xi _{a{\hat{k}}} + \sum _{a \in A^{{\textrm{rep}}}: l=a^-} x_a^{{\textrm{rep}},\xi } \le x^\xi _{l{\hat{k}}} \qquad \forall l\in L_{{\hat{k}}} \tag{2d}$$
$$\sum \limits _{a\in A_k^\xi : l=a^+} x^\xi _{ak} = x^\xi _{lk} \qquad \forall l\in L_k, k\in K {\setminus } \{{\tilde{k}}\} \tag{2e}$$
$$\sum \limits _{a\in A_{{\tilde{k}}}^\xi : l=a^+} x^\xi _{a{\tilde{k}}} + \sum _{a \in A^{{\textrm{rep}}}: l=a^+} x_a^{{\textrm{rep}},\xi } = x^\xi _{l{\tilde{k}}} \qquad \forall l\in L_{{\tilde{k}}} \tag{2f}$$
$$\sum \limits _{a\in A_k^\xi : a^-=q} x^\xi _{ak} \le 1 \qquad \forall q\in Q_k, k \in K {\setminus } \{{\hat{k}}, {\tilde{k}}\} \tag{2g}$$
$$\sum \limits _{a\in A_{{\hat{k}}}^\xi : a^-={\hat{q}}} x^\xi _{a{\hat{k}}} + \sum _{a \in A^{{\textrm{rep}}}: a^- = {\hat{q}}} x^{{\textrm{rep}},\xi }_a \le 1 \tag{2h}$$
$$x^\xi _{a{\tilde{k}}} = 0 \qquad \forall a \in A^\xi _{{\tilde{k}}}: a^- = {\hat{q}} \tag{2i}$$
$$x^\xi _{lk} = {\bar{x}}_{lk} \qquad \forall k\in K_l, l \in L^\xi \tag{2j}$$
$$x^\xi , x^{{\textrm{rep}},\xi }, z^\xi \ge 0. \tag{2k}$$

The interpretation of the model is similar to that of Model (1). The objective function (2a) minimizes the sum of the recovery costs and the replacement costs. Constraints (2b) ensure that, in the recovery solution, every flight leg is either served by exactly one aircraft type or canceled. Constraints (2c)–(2i) ensure correct predecessor/successor relationships between the aircraft type/flight leg assignments, whereby aircraft type \({\hat{k}}\) can take an expensive replacement arc to transform into fleet \({\tilde{k}}\). Constraints (2j) ensure that the recovery solution differs from the planning solution only after the malfunction has actually occurred, i.e., only on flight legs not contained in \(L^\xi \).
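To make the objective and the simplest constraints concrete, the following sketch evaluates (2a) for a given, possibly fractional, recovery solution and checks Constraints (2b) and (2j). The flow constraints (2c)–(2i) are deliberately not checked. The dictionary layout, the function names and the replacement-arc label `r1` are hypothetical; the replacement costs are assumed to be precomputed per arc.

```python
def recovery_cost(x, z, x_rep, k_star, c_tilde, c_rep):
    """Objective (2a) for a given (possibly fractional) recovery solution.

    x:       dict (l, k) -> x^xi_lk
    z:       dict l -> z^xi_l
    x_rep:   dict replacement arc a -> x^{rep,xi}_a
    k_star:  dict l -> k*_l (planned fleet or "canc")
    c_tilde: dict (l, planned fleet, recovery fleet or "canc") -> recovery cost
    c_rep:   dict a -> replacement cost of arc a (precomputed c^rep_{pi_hat, s_a})
    """
    cost = sum(c_tilde[l, k_star[l], "canc"] * z_l for l, z_l in z.items())
    cost += sum(c_tilde[l, k_star[l], k] * v for (l, k), v in x.items())
    cost += sum(c_rep[a] * v for a, v in x_rep.items())
    return cost


def satisfies_2b_and_2j(x, z, x_bar, legs, legs_before_xi, tol=1e-9):
    """Check (2b) (serve or cancel every leg) and (2j) (no change on legs in L^xi)."""
    for l in legs:
        served = sum(v for (leg, _), v in x.items() if leg == l)
        if abs(served + z.get(l, 0.0) - 1.0) > tol:
            return False
    for l, k in set(x) | set(x_bar):
        if l in legs_before_xi and abs(x.get((l, k), 0.0) - x_bar.get((l, k), 0.0)) > tol:
            return False
    return True


k_star = {"L1": "A", "L2": "A"}
c_tilde = {("L1", "A", "A"): 0.0, ("L1", "A", "B"): 5.0, ("L1", "A", "canc"): 100.0,
           ("L2", "A", "A"): 0.0, ("L2", "A", "B"): 3.0, ("L2", "A", "canc"): 80.0}
x = {("L1", "A"): 1.0, ("L2", "B"): 1.0}   # L2 is swapped to fleet B
z = {"L1": 0.0, "L2": 0.0}
print(recovery_cost(x, z, {"r1": 1.0}, k_star, c_tilde, {"r1": 40.0}))  # 43.0
```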

6 Innovative solution approach

What remains is to present an appropriate solution procedure. The approach we present is capable of finding a globally optimal solution under some mild assumptions. It is based on a specialization of Benders Decomposition, originally introduced by [22]. For an excellent survey of Benders Decomposition, we refer the interested reader to [23].

Benders Decomposition is often applied to scenario-expanded stochastic programs; in this context, it is also known as the L-shaped method. It is known to be very efficient for such problems, because they decompose into a small master problem and a sub problem with block-diagonal structure. Benders Decomposition makes it possible to solve many small optimization problems (i.e., the master problem and each block of the sub problem) individually, instead of solving one huge optimization problem at once, as would be necessary without a decomposition approach. Among other advantages, this drastically lowers the memory requirements and allows the solution procedure to be parallelized straightforwardly. An excellent description of the L-shaped method and its applications, as well as of stochastic optimization in general, can be found in [2].

Applied to our problem statement, Benders Decomposition seeks a feasible solution to Model (1) that minimizes the planning costs plus the expected recovery costs resulting from the chosen planning solution. To be precise, the objective function is modified to

$$\begin{aligned} \min \sum _{l \in L} \left( c_l z_l + \sum _{k \in K_l} c_{lk} x_{lk}\right) + \sum _{\xi \in \varXi } p_\xi R_{\xi }(x, z). \end{aligned}$$

Since it is very difficult to find a closed analytical expression for \(R_\xi , \xi \in \varXi \), the idea of Benders Decomposition is to generate a sequence of linear functionals underestimating \(R_\xi \), which are derived from solutions of Model (2), referred to in the following as the sub problem. These functionals can easily be incorporated as cutting planes into Model (1), referred to in the following as the master problem. To be precise, in the master objective function, the functions \(R_\xi \) are replaced by additional variables \(\sigma _\xi \ge 0, \xi \in \varXi \), leading to

$$\begin{aligned} \min \sum _{l \in L} \left( c_l z_l + \sum _{k \in K_l} c_{lk} x_{lk}\right) + \sum _{\xi \in \varXi } p_\xi \sigma _\xi . \end{aligned}$$

Linear functionals \(f^\omega _\xi , \omega \in {\mathbb {N}} \) underestimating \(R_\xi \) are generated dynamically during the solution process, and constraints of the form

$$\begin{aligned} \sigma _\xi \ge f^\omega _\xi (x, z) \end{aligned}$$

are added to the master problem. According to the theory of [22], functionals underestimating \(R_\xi \) can be generated with the help of optimal (or at least feasible) solutions of the dual of Model (2), as Lemma 1 states.
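The underestimation property can be illustrated on a deliberately simple recourse problem, which is not the sub problem of this paper: assume \(R(x) = \min \{d y: y \ge h - a^\top x, y \ge 0\}\), a shortfall penalty. Its dual is \(\max \{\pi (h - a^\top {\bar{x}}): 0 \le \pi \le d\}\), and by weak duality every dual-feasible \(\pi \) yields a functional \(\pi h - \pi a^\top x\) that underestimates \(R\) for all \(x\); the optimal \(\pi \) at \({\bar{x}}\) makes it tight there. All names and numbers in the sketch are hypothetical.

```python
from itertools import product


def recourse_value(x, a, h, d):
    """Toy recourse R(x) = min{d*y : y >= h - a.x, y >= 0} = d * max(0, h - a.x)."""
    return d * max(0.0, h - sum(ai * xi for ai, xi in zip(a, x)))


def benders_cut(x_bar, a, h, d):
    """Cut sigma >= const + coef.x built from an optimal dual solution at x_bar.

    The dual of the toy recourse LP is max{pi * (h - a.x_bar) : 0 <= pi <= d}.
    Any dual-feasible pi gives pi * (h - a.x) <= R(x) for every x (weak duality);
    the optimal pi at x_bar makes the cut tight there.
    """
    shortfall = h - sum(ai * xi for ai, xi in zip(a, x_bar))
    pi = d if shortfall > 0 else 0.0
    return pi * h, [-pi * ai for ai in a]


def evaluate_cut(cut, x):
    const, coef = cut
    return const + sum(ci * xi for ci, xi in zip(coef, x))


a, h, d = [3.0, 2.0, 4.0], 6.0, 10.0
cut = benders_cut((1, 0, 0), a, h, d)
for x in product((0, 1), repeat=len(a)):
    # the cut never overestimates the recourse value
    assert evaluate_cut(cut, x) <= recourse_value(x, a, h, d) + 1e-9
print(cut, evaluate_cut(cut, (1, 0, 0)))  # (60.0, [-30.0, -20.0, -40.0]) 30.0
```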

First, we need the following definition of an auxiliary quantity \(\varPsi ^\xi \), which allows the valid underestimating functional for \(R_\xi \) to be stated compactly in Lemma 1.

Definition 1

Let \(({\bar{x}}, {\bar{z}})\) be a feasible solution of Model (1) and let \((\bar{\beta }, \bar{\gamma }, \bar{\varepsilon }, \bar{\eta }, \bar{\zeta }, \bar{\vartheta }, \bar{\varphi }, \bar{\tau }, \bar{\rho }, \bar{\kappa })\) be an (appropriately defined) optimal solution of the dual of Model (2) corresponding to \(({\bar{x}}, {\bar{z}})\) and \(\xi \), i.e., \(\bar{\beta }\) is the vector of dual multipliers for Constraints (2b), \(\bar{\gamma }\) is the vector of dual multipliers for Constraints (2c), etc. Then, with \({\tilde{K}}:= K {{\setminus }}\{{\hat{k}}, {\tilde{k}}\}\), we define for \(l \in L\) and \(k \in K\)

$$\begin{aligned} \tilde{\varPsi }^\xi _{lk}({\bar{x}}, {\bar{z}}):= {\left\{ \begin{array}{ll} \bar{\beta }_l - \bar{\gamma }_{lk} - \bar{\eta }_{lk},&{}\quad ~k\in {\tilde{K}}\\ \bar{\beta }_l - \bar{\varepsilon }_{lk} - \bar{\eta }_{lk},&{}\quad ~k = {\hat{k}}\\ \bar{\beta }_l - \bar{\gamma }_{lk} - \bar{\zeta }_{lk},&{}\quad ~k = {\tilde{k}}\\ \end{array}\right. } \end{aligned}$$

and we define

$$\begin{aligned} \varPsi ^\xi _{lk}({\bar{x}}, {\bar{z}}):= {\left\{ \begin{array}{ll} \max \{0, \tilde{\varPsi }^\xi _{lk}({\bar{x}}, {\bar{z}}) + \bar{\kappa }_{lk}\},&{}\quad l\in L^\xi \\ \max \{0, \tilde{\varPsi }^\xi _{lk}({\bar{x}}, {\bar{z}})\},&{}\quad l\in L{\setminus } L^\xi . \end{array}\right. } \end{aligned}$$

Furthermore, we define

$$\begin{aligned} \varPsi ^\xi _{l, \text {canc}}({\bar{x}},{\bar{z}}):= \max \{0, \bar{\beta }_l\} . \end{aligned}$$
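Definition 1 translates almost literally into code. The sketch below assumes that the dual values have already been extracted from a solver and are stored in dictionaries keyed by leg or by (leg, fleet); the dictionary keys, function names and fleet labels (`kh` for \({\hat{k}}\), `kt` for \({\tilde{k}}\)) are hypothetical.

```python
CANC = "canc"


def psi_tilde(l, k, duals, k_hat, k_tilde):
    """Psi~^xi_lk from Definition 1; duals maps names to dicts of dual values."""
    beta, gamma, eps = duals["beta"], duals["gamma"], duals["eps"]
    eta, zeta = duals["eta"], duals["zeta"]
    if k == k_hat:    # affected fleet: multipliers of (2d) and (2e)
        return beta[l] - eps[l, k] - eta[l, k]
    if k == k_tilde:  # repaired copy: multipliers of (2c) and (2f)
        return beta[l] - gamma[l, k] - zeta[l, k]
    return beta[l] - gamma[l, k] - eta[l, k]  # k in K~: multipliers of (2c) and (2e)


def psi(l, k, duals, k_hat, k_tilde, legs_before_xi):
    """Psi^xi_lk from Definition 1, including the cancellation option k = CANC."""
    if k == CANC:
        return max(0.0, duals["beta"][l])
    value = psi_tilde(l, k, duals, k_hat, k_tilde)
    if l in legs_before_xi:  # legs fixed by (2j) also carry the multiplier kappa_lk
        value += duals["kappa"][l, k]
    return max(0.0, value)


duals = {"beta": {"L1": 5.0},
         "gamma": {("L1", "k1"): 1.0, ("L1", "kt"): 2.0},
         "eps": {("L1", "kh"): 0.5},
         "eta": {("L1", "k1"): 1.5, ("L1", "kh"): 3.0},
         "zeta": {("L1", "kt"): 4.0},
         "kappa": {("L1", "k1"): -1.0}}
print(psi("L1", "k1", duals, "kh", "kt", set()))   # 2.5
print(psi("L1", "k1", duals, "kh", "kt", {"L1"}))  # 1.5
```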

With the help of this definition, we state the following.

Lemma 1

Let \(({\bar{x}}, {\bar{z}})\) be a feasible solution of Model (1) and let \((\bar{\beta }, \bar{\gamma }, \bar{\varepsilon }, \bar{\eta }, \bar{\zeta }, \bar{\vartheta }, \bar{\varphi }, \bar{\tau }, \bar{\rho }, \bar{\kappa })\) be an (appropriately defined) optimal solution of the dual of Model (2) corresponding to \(({\bar{x}}, {\bar{z}})\) and \(\xi \), then,

$$\begin{aligned} R_\xi (x, z)&\ge \sum _{l \in L^\xi }\sum _{k \in K_l} \bar{\kappa }_{lk} x_{lk} \\&\quad +\bar{\tau } + \sum _{k \in K{\setminus } \{{\hat{k}}, {\tilde{k}}\}} \sum _{q\in Q_k} \bar{\varphi }_{qk} + \sum _{l\in L} \bar{\beta }_l\\&\quad + \sum _{l\in L: {\bar{z}}_l = 0} (1 - x_{lk_l^*}) \sum _{k \in K_l \cup \{\text {canc}\}} \varPsi ^\xi _{lk}({\bar{x}}, {\bar{z}})\\&\quad + \sum _{l\in L: {\bar{z}}_l = 1} (1 - z_l) \sum _{k \in K_l \cup \{\text {canc}\}} \varPsi ^\xi _{lk}({\bar{x}}, {\bar{z}}) \end{aligned}$$

for all \((x, z)\) feasible for Model (1), with equality in case \((x, z) = ({\bar{x}}, {\bar{z}})\).
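The right-hand side of Lemma 1 can be evaluated directly once \(\varPsi ^\xi \) is known. In the sketch below, we read \(\bar{\tau }\), \(\bar{\varphi }\) and \(\bar{\kappa }\) as the multipliers of (2h), (2g) and (2j), respectively; this is our inference from the index sets in the lemma. The dual values in the usage example are made up for illustration and do not come from an actual solve; all names are hypothetical.

```python
CANC = "canc"


def lemma1_cut(x, z, *, fleets_of, legs_before_xi, z_bar, k_star,
               kappa, tau, phi, beta, psi, k_hat, k_tilde):
    """Right-hand side of Lemma 1, i.e., the Benders cut f(x, z) <= R_xi(x, z).

    fleets_of: dict l -> K_l; z_bar, k_star: planning solution and k*_l
    kappa, tau, phi, beta: dual values (read as multipliers of (2j), (2h), (2g), (2b))
    psi: dict (l, k) -> Psi^xi_lk for k in K_l and k = CANC (Definition 1)
    """
    value = sum(kappa[l, k] * x.get((l, k), 0.0)
                for l in legs_before_xi for k in fleets_of[l])
    value += tau
    value += sum(v for (q, k), v in phi.items() if k not in (k_hat, k_tilde))
    value += sum(beta.values())
    for l in z_bar:
        psi_sum = sum(psi[l, k] for k in [*fleets_of[l], CANC])
        if z_bar[l] == 0:
            value += (1.0 - x.get((l, k_star[l]), 0.0)) * psi_sum
        else:
            value += (1.0 - z[l]) * psi_sum
    return value


planning = dict(
    fleets_of={"L1": ["kh", "k1"], "L2": ["k1"]}, legs_before_xi={"L1"},
    z_bar={"L1": 0, "L2": 1}, k_star={"L1": "kh", "L2": CANC},
    kappa={("L1", "kh"): 2.0, ("L1", "k1"): -1.0}, tau=0.5,
    phi={("q1", "k1"): 1.0, ("q2", "kh"): 7.0}, beta={"L1": 3.0, "L2": 4.0},
    psi={("L1", "kh"): 1.0, ("L1", "k1"): 2.0, ("L1", CANC): 3.0,
         ("L2", "k1"): 0.5, ("L2", CANC): 4.0},
    k_hat="kh", k_tilde="kt")
# at (x_bar, z_bar) both Psi terms vanish; only the constant and kappa terms remain
print(lemma1_cut({("L1", "kh"): 1.0}, {"L1": 0.0, "L2": 1.0}, **planning))  # 10.5
```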

The technical proof can be found in the “Appendix”. Lemma 1 can be exploited to formulate Algorithm 1, which solves the presented model using an adaptation of Benders Decomposition known as Branch and Benders Cut (B&BC).

B&BC is a variant of Benders Decomposition that can be applied when the master problem is non-convex or has integral variables and is solved with algorithms based on Branch-and-Cut (B&C), see, e.g., [24]. The standard version of Benders Decomposition solves the current master problem to optimality in each iteration before adding a Benders cut, which can be time-consuming and inefficient. B&BC embeds the generation of Benders cuts into the B&C run and generates them at different stages of the B&C algorithm, for example, when new integral solutions are found or when nodes are solved to optimality. This saves computational resources, especially if solving a single sub problem is considerably cheaper than solving the master problem to optimality. A detailed explanation of this approach can be found in [23].

Algorithm 1 takes Model (1) as input, where \(R_\xi \) is encoded with the help of Model (2). The first step is to start a B&C framework to solve Model (1). It is important that the framework allows user-specified cuts to be inserted during the optimization run. The algorithm then repeats the following steps until no violated Benders cut can be identified for the current solution: whenever the framework detects an integral feasible solution of Model (1), Model (2) is solved for each scenario, a corresponding valid underestimating functional is determined, and the corresponding cut is added to Model (1). We note that, in general, every feasible solution of the dual of Model (2) defines a valid Benders cut. Nevertheless, if multiple optimal dual solutions exist, a choice has to be made. Several techniques exist to find the “best” Benders cut, e.g., the strategy of identifying a Pareto-optimal Benders cut as described in [25], or the strategy of choosing a cut implied by a so-called minimal infeasible system as presented in [26]. We used the latter in our computational experiments, since it performed exceptionally well. The last step is to determine an integer feasible solution of Model (2) for each scenario.
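The interplay of branching and lazy cut generation can be illustrated with a self-contained toy in pure Python. The sketch makes strong simplifying assumptions: a depth-first enumeration with a simple bound stands in for a B&C framework, the sub problem is a toy shortfall recourse \(R_s(x) = d_s \max \{0, h_s - a^\top x\}\) instead of Model (2), and no cut-selection rule such as [25] or [26] is needed because the toy dual has a unique optimum whenever a cut is violated. In Gurobi's Python interface, the separation step would typically live in a callback invoked with `where == GRB.Callback.MIPSOL`, reading the candidate via `Model.cbGetSolution` and adding cuts via `Model.cbLazy`, with the `LazyConstraints` parameter set to 1. Function and variable names are hypothetical.

```python
import math


def solve_bbc(c, a, scenarios, tol=1e-9):
    """Toy Branch and Benders Cut on binary x with one sigma variable per scenario.

    Master: min c.x + sum_s p_s * sigma_s with sigma_s >= 0 and sigma_s >= every cut
    of scenario s. Toy recourse: R_s(x) = d_s * max(0, h_s - a.x); scenarios = [(p, h, d)].
    """
    n = len(c)
    cuts = [[] for _ in scenarios]  # per scenario: list of (constant, coefficients)
    best = {"value": math.inf, "x": None}

    def sigma_lb(s, fixed):
        # valid lower bound on sigma_s over all binary completions of 'fixed'
        lb = 0.0
        for const, coef in cuts[s]:
            v = const + sum(coef[i] * fixed[i] for i in range(len(fixed)))
            v += sum(min(0.0, coef[i]) for i in range(len(fixed), n))
            lb = max(lb, v)
        return lb

    def bound(fixed):
        v = sum(c[i] * fixed[i] for i in range(len(fixed)))
        v += sum(min(0.0, c[i]) for i in range(len(fixed), n))
        return v + sum(p * sigma_lb(s, fixed) for s, (p, _, _) in enumerate(scenarios))

    def node(fixed):
        if bound(fixed) >= best["value"] - tol:
            return  # prune: this subtree cannot improve on the incumbent
        if len(fixed) < n:
            node(fixed + [0])  # branch on the next variable
            node(fixed + [1])
            return
        # integral candidate: separate Benders cuts lazily, one per violated scenario
        violated = False
        for s, (_, h, d) in enumerate(scenarios):
            shortfall = h - sum(ai * xi for ai, xi in zip(a, fixed))
            if d * max(0.0, shortfall) > sigma_lb(s, fixed) + tol:
                pi = d  # violation implies shortfall > 0, so the optimal dual is pi = d
                cuts[s].append((pi * h, [-pi * ai for ai in a]))
                violated = True
        if violated:
            node(fixed)  # re-check the same candidate against the new cuts
        else:
            best["value"], best["x"] = bound(fixed), list(fixed)

    node([])
    return best["value"], best["x"], sum(len(cs) for cs in cuts)


value, x, n_cuts = solve_bbc([4.0, 3.0, 5.0], [3.0, 2.0, 4.0],
                             [(0.5, 4.0, 10.0), (0.3, 6.0, 10.0), (0.2, 8.0, 10.0)])
print(value, x, n_cuts)  # 11.0 [1, 0, 1] 3
```

As in the description of Algorithm 1, every accepted incumbent is evaluated with its true recourse costs, so intermediate solutions remain usable if the search is stopped early.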

We note that every integer feasible solution of Model (1) identified by the algorithm can be implemented in practice. Hence, the algorithm generates many potentially usable intermediate results throughout the solution process and remains useful even if it has to be terminated early due to time restrictions.

Algorithm 1 (Branch and Benders Cut)

In the following, we will demonstrate how the presented algorithm performs on real-world based schedules.

7 Computational results

In this section, we present the results of our computational study. All programs have been written in Python, version 3.10. Optimization problems have been solved using the Gurobi optimizer, version 9.5.1, see [7]. The programs have been executed on a node of a high-performance computing cluster with a Xeon E3-1240 v6 CPU (four cores, 3.7 GHz base frequency) and a total memory of 32 GB. All calculations are terminated after 3000 s, returning the best solution found so far, or as soon as a relative optimality gap of \(10^{-4}\) is reached.

The instances have been created based on real-world schedules. Each instance consists of flight schedule data, fleet data, aircraft data and airport data. Each flight leg in the schedule has a departure and arrival station and time as attributes. Furthermore, it has a distance attribute indicating the distance between its origin and destination, a demand attribute indicating how many passengers want to use it, and a compensation attribute indicating the amount of money that has to be paid for each passenger who wants to use the leg but cannot. Each fleet has a capacity indicating how many passengers can be transported, and a field containing the costs incurred per kilometer flown by an aircraft of this fleet. The costs of assigning an aircraft type \(k\) to a specific flight leg \(l\) are calculated as the sum of the passenger dissatisfaction costs, \(\max \{\text {demand}_l - \text {capacity}_k, 0\} \cdot \text {compensation}_l\), and the operational costs, \(\text {distance}_l \cdot \text {kmcosts}_k\). For each aircraft, the airport at which it is located at the beginning of the schedule is given. For each airport, the costs and the replacement time of the spare part subject to the malfunction are defined. We note that there is no reason to consider more than one kind of spare part for the proof of concept presented here, since changing the properties of the spare part does not influence the problem structure. For all instances, we have assumed that the malfunction can occur at any time during the whole schedule, following a discrete uniform probability distribution. To be precise, we first specified the number of scenarios and then determined a set of equidistant points in time, which represent the failure times of the respective scenarios. All scenarios are assigned identical probabilities.
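The cost formula and the scenario construction can be sketched as follows. The class and function names are hypothetical, and the placement of the equidistant failure times is our assumption, since the text does not specify whether the endpoints of the schedule are included: here we use the \(n\) interior points \(t_0 + i\,(t_1 - t_0)/(n+1)\), \(i = 1, \dots , n\).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    distance_km: float
    demand: int
    compensation: float  # paid per passenger who wants to use the leg but cannot


@dataclass(frozen=True)
class Fleet:
    capacity: int
    cost_per_km: float


def assignment_cost(leg: Leg, fleet: Fleet) -> float:
    """c_lk = max(demand_l - capacity_k, 0) * compensation_l + distance_l * kmcosts_k."""
    dissatisfaction = max(leg.demand - fleet.capacity, 0) * leg.compensation
    operational = leg.distance_km * fleet.cost_per_km
    return dissatisfaction + operational


def uniform_scenarios(start: float, end: float, n: int) -> list[tuple[float, float]]:
    """n equidistant failure times, each with probability 1/n, as (p_xi, t_xi) pairs.

    Assumption: interior points start + i * (end - start) / (n + 1), i = 1..n.
    """
    step = (end - start) / (n + 1)
    return [(1.0 / n, start + i * step) for i in range(1, n + 1)]


print(assignment_cost(Leg(500.0, 180, 200.0), Fleet(150, 10.0)))  # 11000.0
print([t for _, t in uniform_scenarios(0.0, 10.0, 4)])             # [2.0, 4.0, 6.0, 8.0]
```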

Remark

We are fully aware that the calculation of assignment costs and the output of the PM device are strongly simplified here. Nevertheless, our methodology remains applicable if the assignment costs and the failure time distribution are calculated using a more complex methodology. Hence, the simplified setting is suitable to demonstrate the capabilities of the approach.

We have created 18 instances with between 26 and 80 flight legs. The schedules are built such that the available aircraft suffice to serve all flights while all of them are necessary, i.e., a fleet with one aircraft fewer would not be able to serve the whole schedule. Additionally, the schedules are set up such that an immediate replacement of the malfunctioning part forces several flight cancellations, i.e., each schedule begins with a number of early parallel flights equal to the total number of aircraft.
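One necessary condition for this tightness can be checked with a simple sweep over the schedule: every leg in the air at a given moment needs its own aircraft, so the peak number of simultaneously operated legs is a lower bound on the number of aircraft required. The sketch ignores turnaround times and stations, does not reproduce how the instances were generated, and uses a hypothetical function name.

```python
def peak_parallel_legs(legs: list[tuple[float, float]]) -> int:
    """Maximum number of legs in the air at the same time.

    legs: (departure time, arrival time) pairs. The peak is a lower bound on the
    number of aircraft needed to operate the schedule.
    """
    events = [(dep, +1) for dep, _ in legs] + [(arr, -1) for _, arr in legs]
    events.sort()  # at equal times, arrivals (-1) are processed before departures (+1)
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


print(peak_parallel_legs([(6.0, 8.0), (6.5, 9.0), (7.0, 10.0), (8.0, 11.0)]))  # 3
```

If the peak equals the number of available aircraft early in the day, as in the instances described above, taking one aircraft out of service for a part replacement at that time necessarily leaves some leg uncovered.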

In addition to the solution algorithm presented in Sect. 6, we have applied four benchmark heuristics, already sketched in the introductory Sect. 1, to solve the problems.

The first heuristic ignores the malfunction prediction and evaluates how the assignment that is optimal for the case without malfunction performs. The second forces the replacement of the affected spare part immediately at the next airport on the flight string of the aircraft where the replacement can be executed, without changing the initial Tail Assignment. The third also forces this immediate replacement, but afterwards reoptimizes the aircraft assignment. The fourth does not change the initial aircraft assignment, but schedules the part replacement in the first time slot along the flight string of the affected aircraft where it is feasible without causing flight cancellations.
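The repair-point selection of the second/third and the fourth heuristic can be sketched as follows. We assume that the flight string of the affected aircraft is given as a sequence of stops, that `repair_time` lists the stations where the spare part is available together with the replacement duration, and that a replacement avoids cancellations exactly when the ground time until the next departure covers it. These modeling choices, all names and the station codes are hypothetical; the reoptimization step of the third heuristic is not shown.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stop:
    """Arrival of the affected aircraft at a station along its flight string."""
    leg: str
    station: str
    arrival: float
    next_departure: Optional[float]  # None if the flight string ends here


def next_repair_stop(stops, t_pred, repair_time):
    """Heuristics 2/3: first stop after t_pred at a station able to do the replacement."""
    for s in stops:
        if s.arrival >= t_pred and s.station in repair_time:
            return s
    return None


def first_slot_without_cancellation(stops, t_pred, repair_time):
    """Heuristic 4: first such stop whose ground time covers the replacement time."""
    for s in stops:
        if s.arrival < t_pred or s.station not in repair_time:
            continue
        ground = math.inf if s.next_departure is None else s.next_departure - s.arrival
        if ground >= repair_time[s.station]:
            return s
    return None


stops = [Stop("L1", "FRA", 8.0, 8.5), Stop("L2", "MUC", 10.0, 10.75),
         Stop("L3", "FRA", 12.0, 15.0), Stop("L4", "HAM", 17.0, None)]
repair_time = {"FRA": 2.0, "HAM": 1.0}
print(next_repair_stop(stops, 7.0, repair_time).leg)                 # L1
print(first_slot_without_cancellation(stops, 7.0, repair_time).leg)  # L3
```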

Figure 1 summarizes the cost performance of Algorithm 1 compared to the best of the benchmark algorithms. The graph is to be read as “for x-axis percent of the instances, at least y-axis percent of the expected costs could be saved compared to the best benchmark”.
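The reading rule of Fig. 1 corresponds to sorting the relative savings in decreasing order: the \(i\) largest savings are all at least as large as the \(i\)-th one. A minimal sketch, with a hypothetical function name and made-up cost values rather than the paper's data:

```python
def savings_curve(alg_costs, bench_costs):
    """Points (x, y): for x percent of the instances, at least y percent was saved.

    Savings are measured relative to the best benchmark's expected costs.
    """
    savings = sorted((100.0 * (b - a) / b for a, b in zip(alg_costs, bench_costs)),
                     reverse=True)
    n = len(savings)
    return [(100.0 * (i + 1) / n, s) for i, s in enumerate(savings)]


print(savings_curve([50.0, 90.0, 30.0, 80.0], [100.0, 100.0, 60.0, 100.0]))
# [(25.0, 50.0), (50.0, 50.0), (75.0, 20.0), (100.0, 10.0)]
```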

All but one instance could be solved to optimality by Algorithm 1. Even for the remaining instance, the best solution found by the algorithm within the time limit is better than the best of the heuristically determined ones.

The results show that the proposed strategy outperforms the benchmark approaches, at least on the instances considered.

Fig. 1

Relative difference of solution values calculated by the novel Algorithm 1 compared with the best benchmark algorithm

We also investigated the runtime behavior of Algorithm 1, which is summarized in Fig. 2. The red dots represent the runtimes of the 17 test instances (out of 18) that could be solved to optimality.

The graph shows that, apart from a few outliers, the runtimes are within 5 min, which is a realistic time frame for an operator to recalculate an aircraft assignment. This suggests that the approach can be applied in practice for schedules of this size.

Fig. 2

Runtime behavior of Algorithm 1

We conclude with Fig. 3, which compares the nominal solution values (i.e., the initial assignment costs, ignoring any replanning that might become necessary) of the stochastic solution calculated by Algorithm 1 and of the solution that is optimal for the case without malfunction, which is calculated by the first benchmark algorithm.

Fig. 3

Relative difference in nominal solution values of planning and stochastic solution

The nominal costs of the solution calculated by this benchmark procedure are a lower bound on the total costs of operating the schedule; this bound is attained, e.g., if the Predictive Maintenance device delivers a false-positive prediction. It therefore makes sense to investigate whether the nominal solution values of the solutions calculated by Algorithm 1 increase too much.

The results imply that the increase in nominal operational costs when implementing the solution proposed by Algorithm 1 is moderate (below 25% for all but two instances) compared with the nominal costs of the solutions calculated by a classical Tail Assignment optimizer.

Compared with the potential expected savings of a stochastic solution, presented in Fig. 1, the increase in operational costs caused by implementing the stochastic solution is low.

To summarize, the computational results presented strongly suggest that the proposed methodology is applicable in practice for appropriate schedule sizes, and that considerable savings in the expected operational costs caused by malfunctions can be achieved with a small increase in the nominal operational costs.

In the next section, we will present a few suggestions for further research based on the results presented here.

8 Summary, outlook

We have presented a basic model for the Tail Assignment problem and a formal framework for the structure of a malfunction prediction. In addition, we have presented a stochastic optimization approach that optimizes an aircraft assignment incorporating a single malfunction prediction such that the assignment costs plus the expected costs to recover from the malfunction are minimized. The computational results show that the approach can be applied to schedules with up to 80 flights within a reasonable amount of time. The results furthermore show that the savings achievable with the proposed methodology are considerable compared to heuristic benchmark approaches that could be applied to react to malfunction predictions.

The presented study, in which malfunction predictions are incorporated into the aircraft assignment process using a stochastic optimization framework, shows a positive result. Nevertheless, several interesting research directions remain open.

In principle, delay considerations can be incorporated. It would be interesting to implement the method for an optimization model that takes delays into account.

To make the approach applicable to larger schedules, it is interesting to investigate how it can be incorporated into a metaheuristic framework such as the rolling-horizon approach presented in [8]. This approach splits optimization problems ranging over a long time horizon into several smaller optimization problems with overlapping time horizons and solves them sequentially. This framework has been shown to produce excellent results in practice while retaining solution quality guarantees.

The approach could further be applied to situations with more than one malfunction prediction; it would then deliver feasible instead of optimal solutions for the problem under investigation, together with upper and lower bounds on its optimal value.

Furthermore, we note that the schedules of the instances investigated are quite tight. It would be interesting to investigate how the approach behaves if the malfunction prediction arrives early enough to fix the problem immediately, or if the schedule is so loose that the aircraft suffering the malfunction is dispensable.

Another interesting direction for further investigation is to test how the approach performs in a practical environment. This includes interactions with, e.g., passenger flows or crew constraints. Moreover, the approach may perform differently on different airline network types. Depending on the outcomes of these experiments, it could be beneficial to incorporate the re-planning of passenger routing or crew assignment decisions directly into the presented framework.