1 Introduction

In many practical applications, the decision maker’s utility function is of a step-wise nature even though the underlying measure depends continuously on the real-valued optimization variables. For example, in radiotherapy one tries to achieve a sufficiently high dose in the target and then, provided this is the case, as little dose in surrounding organs at risk as possible. While the dose values in those nearby organs depend continuously on the administered radiation, from a physician’s point of view they are judged on whether they exceed certain threshold values associated with particular side effects. A dose in an organ might thus be labeled as either “dangerously high”, “acceptable”, or “ideal”. Multi-criteria decision making then deliberates the trade-offs between different organs based on these broad categorizations.

We formalize this observation by defining the following type of multi-criteria optimization problem:

$$\begin{aligned} \mathcal {P}: \quad \min _{\varvec{x}\in M} d_{1}(f_{1}(\varvec{x})),\dots ,d_{k}(f_{k}(\varvec{x})) \end{aligned}$$
(1)

where we assume \(M\subseteq \mathbb {R}^{n}\) to be compact and non-empty, \(f_{1},\dots ,f_{k}: M\rightarrow \mathbb {R}^{}\) to be continuous functions, and \(d_{1},\dots ,d_{k}:\mathbb {R}^{}\rightarrow \mathbb {R}^{}\) to be monotone increasing. We let \(\varvec{f}:= (f_{1},\dots ,f_{k})\) and \(\varvec{d}:=(d_{1}\circ \pi _1,\dots ,d_{k}\circ \pi _{k})\), where \(\pi _{i}\) denotes the projection of a vector in \(\mathbb {R}^{k}\) onto its i-th component. We assume that there is no a priori preference between the objectives.

To establish the discrete nature of our problem, we require the following assumption for the monotone increasing utility functions \(d_{i}\):

Assumption 1.1

(Discrete utility) For \(i \in \{1,\ldots ,k\}\) there is a finite set \(\Lambda _i \subset \mathbb {R}^{}\) such that \(d_{i}: \mathbb {R}^{}\rightarrow \Lambda _i\).
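Such a discrete utility is simply a monotone step function with finitely many values. As an illustration only, the following Python sketch builds such a \(d_i\); the thresholds and labels are hypothetical and not taken from the paper:

```python
import bisect

def make_step_utility(thresholds, labels):
    """Monotone increasing step function d_i mapping R onto a finite set Lambda_i.

    thresholds: sorted cut points t_1 < ... < t_m
    labels:     m + 1 monotone increasing values; labels[j] is returned on the
                j-th interval (t <= t_1, t_1 < t <= t_2, ..., t > t_m)
    """
    assert len(labels) == len(thresholds) + 1

    def d(t):
        # number of thresholds strictly below t gives the interval index
        return labels[bisect.bisect_left(thresholds, t)]

    return d

# hypothetical radiotherapy-style grading of an organ dose:
# 0 = "ideal", 1 = "acceptable", 2 = "dangerously high"
d_oar = make_step_utility([20.0, 30.0], [0, 1, 2])
```

Note the "\(t \le \) threshold" convention matches the utilities used in the examples of Sect. 2.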

In the literature, there is a vast number of general-purpose approaches that can be applied to multi-criteria problems with discrete and continuous decision variables and objectives alike (Steuer and Choo 1983; Benson and Sayin 1997; Das and Dennis 1998; Schandl et al. 2002). While these are applicable to \(\mathcal {P}\), they do not take into account its special structure. An efficient algorithm for \(\mathcal {P}\) should ideally combine certain characteristics of algorithms intended specifically for continuous problems with those specific to discrete problems.

When facing a problem with the structure of \(\mathcal {P}\), the aim is to find the non-dominated solutions with respect to the discrete utility function evaluations \(d_{i}\). However, we also have an underlying continuous problem that we can potentially capitalize on:

$$\begin{aligned} \mathcal {P}^{c}: \quad \min _{\varvec{x}\in M} f_{1}(\varvec{x}),\dots ,f_{k}(\varvec{x}). \end{aligned}$$
(2)

Similar to how non-dominated solutions are obtained in the purely continuous setting, efficient solutions to (2) can be calculated with potentially very efficient general-purpose continuous solvers, such as IPOPT (Wächter and Biegler 2006) or knitro (Byrd et al. 2006). Suitable scalarization methods are weighted sum if the problem is convex, and \(\epsilon \)-constraint (see e.g. Ehrgott 2005), or Pascoletti–Serafini scalarization (Pascoletti and Serafini 1984) in the case of non-convex problems. Particularly suited approaches for approximating the Pareto front of a continuous problem are sandwiching (Serna 2012; Bokrantz and Forsgren 2012) in case the problem is convex, and hypervolume or hyperboxing algorithms (Bringmann and Friedrich 2010; Teichert 2014) if the problem is non-convex.

However, algorithms for continuous problems are approximation algorithms: they aim at calculating a discrete set of non-dominated solutions that represent the Pareto front up to some approximation error measure, often by establishing lower and/or upper bounds on the front (Sayın 2000; Klamroth et al. 2002; Eichfelder 2009). In contrast, for our setting we do not aim at an approximation of the Pareto front, but a full representation. That is, we want to find at least one efficient solution for each non-dominated point in the image space of the discrete utility functions. In this regard, our approach is in line with many algorithmic approaches for calculating the Pareto front of a purely discrete problem (Ulungu and Teghem 1994; Holzmann and Smith 2018).

While it is in fact possible to solve the multi-criteria problem \(\mathcal {P}\) analogously to a purely discrete problem with those techniques, such an approach is unnecessarily computationally expensive, as the more difficult discrete problem is solved repeatedly instead of the potentially simpler continuous problem. On the other hand, one could solve the continuous problem \(\mathcal {P}^{c}\) approximately and, in a second step, infer the discrete front. But then many points on the continuous Pareto front may be calculated that are redundant for the discrete Pareto front of \(\mathcal {P}\). This is why in this paper we aim to combine both worlds. We will present algorithms that avoid solving the discrete problem and rather solve the simpler continuous problem \(\mathcal {P}^{c}\). Simultaneously, the algorithms will incorporate the discrete aspects of \(\mathcal {P}\) to avoid solving redundant continuous problems.

The outline for this paper is as follows. We first investigate the relationship between the solutions of \(\mathcal {P}\) and \(\mathcal {P}^{c}\) (Sect. 2). Based on this theoretical insight, we then propose different algorithms to find all non-dominated points for \(\mathcal {P}\) (Sect. 3). Finally, we demonstrate and evaluate these algorithms using a set of test problems and an example from radiotherapy (Sect. 4). We conclude with a discussion of our results (Sect. 5).

2 Theoretical results

In this section, we investigate the theoretical properties of \(\mathcal {P}\) (1). In particular, we describe the relationship between particular solutions of \(\mathcal {P}\) and their corresponding counterparts with respect to the underlying continuous problem \(\mathcal {P}^{c}\) (2). This is crucial for the development and the analysis of the algorithms described later in this paper. Note that for all observations within this section, Assumption 1.1 is not required; they hold for arbitrary monotone functions.

Before we present the theoretical results, we summarize some notation used throughout this paper. For two vectors \(\varvec{y}^1,\varvec{y}^2\in \mathbb {R}^{k}\) we write \(\varvec{y}^1\leqq \varvec{y}^2\) if the inequality \(y^1_i \le y^2_i\) holds for every component \(i=1,\dots ,k\) (analogously for \(\geqq ,<,>\)). We say that a point \(\varvec{y}\in \mathbb {R}^{k}\) dominates a point \(\varvec{y}'\in \mathbb {R}^{k}\) if:

$$\begin{aligned} y_i \le y_i'&\text { for all } i \in \{1,\dots ,k\}\\ y_i < y_i'&\text { for at least one } i \in \{1,\dots ,k\}. \end{aligned}$$

We say that a point \(\varvec{y}\in \mathbb {R}^{k}\) strictly dominates a point \(\varvec{y}'\in \mathbb {R}^{k}\) if for all \(i \in \{1,\dots ,k\}\) the inequality \(y_i < y_i'\) holds. The image of the combined MCO problem \(\mathcal {P}\) is denoted by

$$\begin{aligned} \mathcal {Y}:= (\varvec{d}\circ \varvec{f}) (M). \end{aligned}$$
(3)

For the the underlying continuous problem \(\mathcal {P}^{c}\) the image is denoted by

$$\begin{aligned} \mathcal {Y}^c:= \varvec{f}(M). \end{aligned}$$
(4)

A point \(\varvec{x} \in M\) is called an efficient solution for \(\mathcal {P}\) (for \(\mathcal {P}^{c}\)) if there is no \(\varvec{y} \in \mathcal {Y} \quad (\varvec{y} \in \mathcal {Y}^c)\) such that \(\varvec{y}\) dominates \((\varvec{d}\circ \varvec{f})(\varvec{x})\) (dominates \(\varvec{f}(\varvec{x})\)). The corresponding image point is called non-dominated. A point \(\varvec{x} \in M\) is called a weakly efficient solution for \(\mathcal {P}\) (for \(\mathcal {P}^{c}\)) if there is no \(\varvec{y} \in \mathcal {Y} \quad (\varvec{y}\in \mathcal {Y}^c)\) such that \(\varvec{y}\) strictly dominates \((\varvec{d}\circ \varvec{f})(\varvec{x})\) (strictly dominates \(\varvec{f}(\varvec{x})\)). The corresponding image point is called weakly non-dominated.
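The two dominance relations translate directly into code. The following plain-Python sketch, with image points as tuples, mirrors the definitions above:

```python
def dominates(y1, y2):
    """y1 dominates y2: <= in every component and < in at least one."""
    return all(a <= b for a, b in zip(y1, y2)) and any(a < b for a, b in zip(y1, y2))

def strictly_dominates(y1, y2):
    """y1 strictly dominates y2: < in every component."""
    return all(a < b for a, b in zip(y1, y2))
```

Note that dominance is irreflexive: a point never dominates itself.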

We can now begin to introduce statements about the relationship between particular solutions of \(\mathcal {P}\) and \(\mathcal {P}^{c}\). We first observe the connection between weakly non-dominated points of the two problems.

Lemma 2.1

If \(\varvec{y}^{c,*}\in \mathcal {Y}^c\) is a weakly non-dominated point of \(\mathcal {P}^{c}\) then \(\varvec{y}^*:=\varvec{d}(\varvec{y}^{c,*})\in \mathcal {Y}\) is weakly non-dominated for \(\mathcal {P}\).

Proof

For an arbitrary point \(\varvec{y}\) in the image \(\mathcal {Y}\) of \(\mathcal {P}\) there is at least one point \(\varvec{y}^c\in \mathcal {Y}^c\) such that \(\varvec{y}=\varvec{d}(\varvec{y}^c)\). Now consider the weakly non-dominated point \(\varvec{y}^{c,*}\in \mathcal {Y}^c\). There must be at least one index \(i\in \{1,\dots ,k\}\) such that

$$\begin{aligned} y^{c,*}_{i}\le y^c_{i}, \end{aligned}$$

as otherwise \(\varvec{y}^c\) would strictly dominate \(\varvec{y}^{c,*}\) and \(\varvec{y}^{c,*}\) could not be a weakly non-dominated point. For the same index i it follows by monotonicity of \(d_{i}\) that

$$\begin{aligned} y^*_{i} = d_{i}(y^{c,*}_{i}) \le d_{i}(y^c_{i}) = y_{i}. \end{aligned}$$

This shows that \(\varvec{y}\) does not strictly dominate \(\varvec{y}^*\). As \(\varvec{y}\) was chosen arbitrarily in \(\mathcal {Y}\), this shows that \(\varvec{y}^*\) is also a weakly non-dominated point. \(\square \)

Lemma 2.1 also implies that a non-dominated point is mapped to a weakly non-dominated point. Unfortunately this point is not necessarily non-dominated anymore, as the following example shows:

Example 2.2

Consider the following example (illustrated in Fig. 1):

  • \(M=\{\varvec{x}\in \mathbb {R}^{2}\mid \Vert \varvec{x}\Vert _2 \le 1\}\)

  • \(k=2\), \(f_1(\varvec{x})=x_{1}\), \(f_2(\varvec{x})=x_{2}\)

  • for \(t \in \mathbb {R}^{}\): \(d_1(t) = d_2(t)\) and

    $$\begin{aligned} d_1(t)={\left\{ \begin{array}{ll} 1 &{} \text {if}\quad t \le -0.5,\\ 2 &{} \text {if}\quad t > -0.5. \end{array}\right. } \end{aligned}$$

The point \((-1,0)\) is a non-dominated point for problem \(\mathcal {P}^{c}\). However, by \(\varvec{d}\) it is mapped to the point (1, 2). This point is dominated by (1, 1), which is the image of the feasible solution \((-0.5,-0.5)\) under \(\varvec{d}\circ \varvec{f}\).

Fig. 1
figure 1

Illustration of Example 2.2. The image set \(\mathcal {Y}^c\) of the inner problem \(\mathcal {P}^{c}\) is depicted in gray and its Pareto front in blue. The non-dominated point \((-1,0)\), depicted in red, is mapped by \(\varvec{d}\) to the weakly non-dominated point (1, 2). This point is dominated by (1, 1)

The example also shows that the reverse relationship does not hold: if \(\varvec{d}(\varvec{y}^c)\) is a (weakly) non-dominated point, the preimage \(\varvec{y}^c\) can nevertheless be strictly dominated. The point \((-0.5,-0.5)\) is dominated in \(\mathcal {P}^{c}\) by the point \((-\frac{\sqrt{2}}{2},-\frac{\sqrt{2}}{2})\). However, \(\varvec{d}((-0.5,-0.5))\) is a non-dominated point. This means that not all preimages of a non-dominated point are again (weakly) non-dominated. The following can be shown about the existence of non-dominated points in the preimage:
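The computations behind Example 2.2 can be replayed numerically. The following sketch encodes \(\varvec{d}\circ \varvec{f}\) for the unit disk, with \(f_1, f_2\) the coordinate projections as in the example:

```python
def d(t):
    # utility from Example 2.2: threshold at -0.5
    return 1 if t <= -0.5 else 2

def d_of_f(x):
    """(d o f)(x) for f = identity on the unit disk M."""
    assert x[0] ** 2 + x[1] ** 2 <= 1.0 + 1e-12, "x must lie in M"
    return (d(x[0]), d(x[1]))

print(d_of_f((-1.0, 0.0)))    # (1, 2): image of the non-dominated point (-1, 0)
print(d_of_f((-0.5, -0.5)))   # (1, 1), which dominates (1, 2)
```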

Lemma 2.3

Let \(\varvec{y}^*\in \mathcal {Y}\) be a non-dominated point of \(\mathcal {P}\). Then there exists a non-dominated point \(\varvec{y}^{c,*}\in \mathcal {Y}^c\) of \(\mathcal {P}^{c}\) such that

$$\begin{aligned} \varvec{y}^*=\varvec{d}(\varvec{y}^{c,*}). \end{aligned}$$

Proof

Let

$$\begin{aligned} Z=\varvec{d}^{-1}(\varvec{y}^*)\cap \mathcal {Y}^c \end{aligned}$$

denote the preimage of the non-dominated point \(\varvec{y}^*\in \mathcal {Y}\) given in the statement. By assumption this set is nonempty. We want to show that we can find a point in Z that is non-dominated for \(\mathcal {P}^{c}\). First consider any point \(\varvec{z} \in Z\) and a point \(\varvec{z}' \in \mathcal {Y}^c\) such that:

$$\begin{aligned} z'_i \le z_i \text { for all } i \in \{1,\dots ,k\}. \end{aligned}$$
(5)

Then by monotonicity of \(d_i\) we have:

$$\begin{aligned} d_i(z'_i) \le d_i(z_i) = y^*_{i} \text { for all } i \in \{1,\dots ,k\} \end{aligned}$$

As \(\varvec{y}^*\) is non-dominated, we must have:

$$\begin{aligned} \varvec{d}(\varvec{z}') = \varvec{y}^*. \end{aligned}$$

This means that every point \(\varvec{z}'\in \mathcal {Y}^c\) satisfying the inequalities in Eq. (5) is already contained in Z.

As we did not assume any continuity properties for \(\varvec{d}\), the set Z is not necessarily closed. This is why we consider the closure of this set, denoted by \(\overline{Z}\). As we assumed \(\varvec{f}\) to be continuous and M to be compact, \(\overline{Z}\) is compact as a closed subset of the compact set \(\mathcal {Y}^c = \varvec{f}(M)\).

Now fix an arbitrary point \(\varvec{z}^* \in Z\) and consider the following optimization problem:

$$\begin{aligned} \min _{\varvec{z} \in \overline{Z}} \quad&\sum _{i=1}^kz_i\\ \text {s.t.} \quad&z_i \le z^*_i \text { for all } i \in \{1,\dots , k\} \end{aligned}$$

The feasible set is compact and non-empty by construction. Moreover, every feasible point of this problem is, by the arguments given above, in Z. This means that an optimal solution \(\varvec{y}^{c,*}\in \mathcal {Y}^c\) to this problem is again in Z, which then means that

$$\begin{aligned} \varvec{d}(\varvec{y}^{c,*}) = \varvec{y}^*. \end{aligned}$$

It remains to argue that \(\varvec{y}^{c,*}\) is indeed a non-dominated point for \(\mathcal {P}^{c}\). Because of optimality, \(\varvec{y}^{c,*}\) cannot be dominated by any point in \(\overline{Z}\). Moreover, every point in \(\mathcal {Y}^c\) that would dominate \(\varvec{y}^{c,*}\) also satisfies (5) and is hence again in Z. Together this means that the constructed point cannot be dominated by any point in \(\mathcal {Y}^c\) and is indeed a non-dominated point for problem \(\mathcal {P}^{c}\). \(\square \)

This lemma is crucial for an algorithmic approach. It shows that we can find the Pareto front of the perhaps more complex problem \(\mathcal {P}\) by calculating points of the Pareto front of the continuous problem \(\mathcal {P}^{c}\). For completeness, we show in the next example that the statement of Lemma 2.3 does not hold true if one replaces non-dominated by weakly non-dominated.

Example 2.4

Consider the same feasible set and the same objectives as in Example 2.2 but consider the following utility function:

  • for \(t\in \mathbb {R}^{}: d_1(t) = d_2(t)\) and

    $$\begin{aligned} d_1(t)={\left\{ \begin{array}{ll} 1 &{} \text {if}\quad t \le 0.5,\\ 2 &{} \text {if}\quad t > 0.5. \end{array}\right. } \end{aligned}$$

For an illustration, see Fig. 2. In this example, the point (2, 1) is a weakly non-dominated point for \(\mathcal {P}\). However, all points in \(\mathcal {Y}^c\) that are mapped to this point are strictly dominated in \(\mathcal {P}^{c}\), e.g. by \((-1,0)\).

Fig. 2
figure 2

Illustration of Example 2.4. The image set \(\mathcal {Y}^c\) of the inner problem \(\mathcal {P}^{c}\) is depicted in gray and its Pareto front in blue. Every point in the dark area is strictly dominated in the inner problem \(\mathcal {P}^{c}\), but it is the preimage of a weakly non-dominated point (1,2) of \(\mathcal {P}\)

3 Algorithms

In this section, we discuss different algorithms tailored to calculating the Pareto front for a problem \(\mathcal {P}\) with discrete utility mappings. That is, for the algorithms presented in this section to be applicable, we from now on explicitly require that Assumption 1.1 is satisfied. This means that for the image of \(\mathcal {P}\) we have:

$$\begin{aligned} \mathcal {Y} \subseteq \prod _{i=1}^{k} \Lambda _i. \end{aligned}$$

The proposed algorithms utilize the special structure of \(\mathcal {P}\). On the one hand, all proposed methods employ a gradient-based solver to find either feasible or weakly efficient solutions \(\varvec{x}\) of \(\mathcal {P}\) by iteratively solving particular single-criteria optimization problems—so-called scalarizations—derived from the continuous inner problem \(\mathcal {P}^{c}\). On the other hand, the algorithms also take the discrete utility mappings into account, namely when choosing which scalarization to solve next and when evaluating their progress.

Fig. 3
figure 3

Illustration of exhaustively solving the multi-criteria problem \(\mathcal {P}\). The preimages of the points \(\varvec{y} \in \prod _{i=1}^{k} \Lambda _i\) divide the image space of the continuous problem \(\mathcal {P}^{c}\) into boxes. By finding specific solutions to the inner problem—which map to points on the continuous Pareto front depicted in blue—we can determine whether each box, and thus each \(\varvec{y} \in \prod _{i=1}^{k} \Lambda _i\), is non-dominated, dominated, or unattainable

The proposed algorithms aim to exhaustively solve the multi-criteria problem \(\mathcal {P}\) (see Fig. 3). This means that:

  • For all \(\varvec{y} \in \prod _{i=1}^{k} \Lambda _i\), the algorithm determines whether \(\varvec{y}\) is non-dominated, dominated or unattainable.

  • For any \(\varvec{y} \in \prod _{i=1}^{k} \Lambda _i\) that is non-dominated, the algorithm calculates a solution \(\varvec{x} \in M\) such that \(\varvec{y} = (\varvec{d} \circ \varvec{f})(\varvec{x})\).

In the applications we will introduce in Sect. 4, the cardinality of each individual \(\Lambda _i\) will not only be finite but also small (3–5). Additionally, we have only a moderate number of objectives in mind (2–4 in the examples). This means that it is possible to iterate over all elements in \(\prod _{i=1}^{k} \Lambda _i\) and to store information for each element. However, our goal is to keep the computational effort – in the sense of the number of optimization problems solved over all elements – small.

During application of the following algorithms a point \(\varvec{y} \in \prod _{i=1}^{k} \Lambda _i\) can have four different current states encoded by \(\sigma (\varvec{y})\):

  • \(\sigma (\varvec{y})\) = attainable: The algorithm found an \(\varvec{x}\) with

    $$\begin{aligned} (\varvec{d} \circ \varvec{f})(\varvec{x}) \leqq \varvec{y}. \end{aligned}$$

    In this case we save \(\varvec{x}\) in \(X^*(\varvec{y})\).

  • \(\sigma (\varvec{y})\) = dominated: The algorithm determined a point \(\varvec{\tilde{y}}\in \mathcal {Y}\) that is attainable and that dominates \(\varvec{y}\).

  • \(\sigma (\varvec{y})\) = unattainable: The algorithm checked that there is no point \(\varvec{x}\) that is mapped to \(\varvec{y}\) or any point that dominates \(\varvec{y}\).

  • \(\sigma (\varvec{y})\) = unknown: The algorithm did not make any statement about this point, yet.

Note that we do not need to explicitly encode all points that are attainable but non-dominated. If we have correctly found all attainable points, and among these all dominated ones, then the attainable points that are not dominated are exactly the non-dominated ones. We can also be sure to have calculated a preimage \(\varvec{x}\) for each non-dominated \(\varvec{y}\): for each attainable point \(\varvec{y}\) we found an \(\varvec{x}\) with

$$\begin{aligned} (\varvec{d} \circ \varvec{f})(\varvec{x}) \leqq \varvec{y}. \end{aligned}$$

For a non-dominated point none of these inequalities can be strict and we must have:

$$\begin{aligned} (\varvec{d} \circ \varvec{f})(\varvec{x}) = \varvec{y}. \end{aligned}$$

In the following, we call a point \(\varvec{y}\) a supported point of \(\mathcal {P}^{c}\) with weights \(\varvec{w}\) (see e.g. Ehrgott 2005) if it solves the following optimization problem:

$$\begin{aligned} \min _{\varvec{y} \in \mathcal {Y}^c} \sum _{i=1}^{k} w_i \cdot y_i. \end{aligned}$$
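For the disk setting used in the examples of Sect. 2, a supported point can even be written down in closed form: minimizing \(\varvec{w}^{T}\varvec{x}\) over the unit disk is attained at \(-\varvec{w}/\Vert \varvec{w}\Vert _2\). A small sketch under these assumptions (M the unit disk, \(\varvec{f}\) the identity, \(\varvec{w} \ne 0\)):

```python
import math

def supported_point_disk(w):
    """Supported point of P^c with weights w, for M the unit disk and f the
    identity: the argmin over the disk of w^T x is x = -w / ||w||_2."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return tuple(-wi / norm for wi in w)
```

For \(\varvec{w} = (1, 1)\) this recovers the point \((-\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2})\) appearing in Sect. 2.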

3.1 Propagation rules

An immediate approach to exhaustively solve the problem \(\mathcal {P}\) consists of solving, for each \(\varvec{y} \in \prod _{i=1}^{k} \Lambda _i\), the individual feasibility problem

$$\begin{aligned} \begin{aligned}&\text {Find } \varvec{x}\in M \\&\text {such that }(\varvec{d} \circ \varvec{f})(\varvec{x}) \leqq \varvec{y}. \end{aligned} \end{aligned}$$
(6)

However, solving all these problems without additional considerations would be very inefficient. Efficiency can be improved by propagating information on a found solution \(\varvec{x}\) of (6) and its image points \(\varvec{y}^c\) and \(\varvec{y}\), and thereby deducing the attainability or unattainability of other points \(\varvec{\tilde{y}} \in \prod _{i=1}^{k} \Lambda _i\). These propagation rules are the following:

  I. If \(\varvec{y}\) is unattainable, and \(\varvec{\tilde{y}} \leqq \varvec{y}\), then \(\varvec{\tilde{y}}\) is unattainable.

  II. If \(\varvec{y}\) is attainable, and \(\varvec{\tilde{y}} \geqq \varvec{y}\), \(\varvec{\tilde{y}} \ne \varvec{y}\), then \(\varvec{\tilde{y}}\) is dominated.

  III. Let \(\varvec{y}^{c}\) be a weakly non-dominated point of \(\mathcal {P}^{c}\). If \(\varvec{\tilde{y}} < \varvec{d}(\varvec{y}^{c})\), then \(\varvec{\tilde{y}}\) is unattainable.

  IV. Let \(\varvec{y}^{c}\) be a supported point of \(\mathcal {P}^{c}\) with weights \(\varvec{w}\). If \((\sup (\varvec{d}^{-1}(\varvec{\tilde{y}})) - \varvec{y}^{c})^{T}\varvec{w} < 0\), then \(\varvec{\tilde{y}}\) is unattainable.

Note that the supremum over the set in the last rule is understood componentwise.
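Rules I and II only involve the partial order on \(\prod _{i=1}^{k} \Lambda _i\) and can be applied directly. A minimal sketch, representing the state map \(\sigma \) as a Python dict and grid points as tuples (the names are hypothetical):

```python
from itertools import product

def propagate(status, levels, y, attainable):
    """Update the state map after learning whether y is attainable.

    status:     dict mapping grid points to 'unknown' / 'attainable' /
                'dominated' / 'unattainable'
    levels:     the finite sets Lambda_1, ..., Lambda_k
    attainable: result of the feasibility check for y
    """
    for t in product(*levels):
        if t == y or status.get(t, 'unknown') != 'unknown':
            continue
        if attainable and all(a >= b for a, b in zip(t, y)):
            status[t] = 'dominated'        # rule II
        elif not attainable and all(a <= b for a, b in zip(t, y)):
            status[t] = 'unattainable'     # rule I
    status[y] = 'attainable' if attainable else 'unattainable'
    return status
```

Rules III and IV additionally need information from the continuous problem (a weakly non-dominated or supported point) and are omitted from this sketch.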

Lemma 3.1

The propagation rules I to IV, introduced above, are valid.

Proof

The first two rules are self-evident. For the third rule, note that as \(\varvec{y}^{c}\) is weakly non-dominated for \(\mathcal {P}^{c}\), \(\varvec{d}(\varvec{y}^{c})\) is weakly non-dominated for \(\mathcal {P}\) according to Lemma 2.1, and therefore there cannot be a point \(\varvec{\tilde{y}}\) that strictly dominates \(\varvec{d}(\varvec{y}^{c})\). For the last rule, as \(\varvec{y}^{c}\) is supported, by definition \(\varvec{w}^{T}\varvec{y}^{c}\le \varvec{w}^{T}\varvec{\tilde{y}^{c}}\) for all \(\varvec{\tilde{y}^{c}}\) from the image \(\mathcal {Y}^c\) of the feasible set. This can be rewritten as

$$\begin{aligned} (\varvec{\tilde{y}^{c}} - \varvec{y}^{c})^{T}\varvec{w} \ge 0 \end{aligned}$$

for all \(\varvec{\tilde{y}^{c}} \in \mathcal {Y}^{c}\). Now assume for a point \(\varvec{\tilde{y}} \in \prod _{i=1}^{k} \Lambda _i\) we have

$$\begin{aligned} 0&> (\sup (\varvec{d}^{-1}(\varvec{\tilde{y}})) - \varvec{y}^{c})^{T}\varvec{w} \\&= \sup (\varvec{d}^{-1}(\varvec{\tilde{y}}))^{T}\varvec{w} -\varvec{w}^{T}\varvec{y}^{c}. \end{aligned}$$

Now observe that \(\sup (\varvec{d}^{-1}(\varvec{\tilde{y}}))^{T}\varvec{w} \ge \varvec{w}^{T}\varvec{\tilde{y}^{c}}\) for all \(\varvec{\tilde{y}^{c}} \in \varvec{d}^{-1}(\varvec{\tilde{y}})\), as \(\varvec{w} \geqq 0\) and the componentwise supremum satisfies \(\sup (\varvec{d}^{-1}(\varvec{\tilde{y}})) \geqq \varvec{\tilde{y}^{c}}\). Hence

$$\begin{aligned} 0&> \varvec{w}^{T}\varvec{\tilde{y}^{c}} -\varvec{w}^{T}\varvec{y}^{c} \\&= (\varvec{\tilde{y}^{c}} - \varvec{y}^{c})^{T}\varvec{w} \end{aligned}$$

for all \(\varvec{\tilde{y}^{c}} \in \varvec{d}^{-1}(\varvec{\tilde{y}})\), which proves that the preimage \(\varvec{d}^{-1}(\varvec{\tilde{y}})\) and the image \(\mathcal {Y}^{c}\) of the feasible set are disjoint, and hence \(\varvec{\tilde{y}}\) is unattainable. \(\square \)

3.2 Box checking algorithm

The first algorithm we discuss is called the box checking algorithm. It systematically checks the attainability of points \(\varvec{y}\in \prod _{i=1}^{k} \Lambda _i\), making effective use of the propagation rules I to III. Attainability is checked by solving specific scalarizations of the underlying continuous problem \(\mathcal {P}^{c}\).

Definition 3.2

We call a single-criteria optimization problem a \(\varvec{y}\)-constraint scalarization of \(\mathcal {P}^{c}\), and denote it with \(\mathcal {S}(\mathcal {P}^{c}, \varvec{y})\), if the following hold:

  i.) The optimization problem has an optimal solution if and only if \(\varvec{y}\) is attainable.

  ii.) For any optimal solution \(\varvec{x}\) of the problem, the relation \((\varvec{d}\circ \varvec{f})(\varvec{x}) \leqq \varvec{y}\) holds.

Such a \(\varvec{y}\)-constraint scalarization solves the feasibility problem (6). Under the assumption that all utility functions \(d_i\) are left-continuous, a version of a \(\varvec{y}\)-constraint scalarization is the following Pascoletti-Serafini scalarization (Pascoletti and Serafini 1984), where \(\varvec{p}:= \sup (\varvec{d}^{-1}(\varvec{y}))\) and \(\varvec{q}:= \varvec{p} - \inf (\varvec{d}^{-1}(\varvec{y}))\):

$$\begin{aligned} \begin{aligned}&\min _{\varvec{x} \in M, \alpha \le 0} \alpha \\&\varvec{p} + \alpha \varvec{q} \geqq \varvec{f}(\varvec{x}). \end{aligned} \end{aligned}$$
(7)
Fig. 4
figure 4

Illustration of the Pascoletti-Serafini scalarization, which can be used in the box checking algorithm. The scalarization finds the intersection point of the box diagonal and the Pareto front

The applicable propagation rules depend on the chosen scalarization. If the feasibility problem (6) is solved, only the basic propagation rules I and II can be applied. If the Pascoletti–Serafini problem (7) is used as scalarization, propagation rule III can additionally be applied, as any optimal solution to (7) is weakly Pareto efficient. These observations result in Algorithm 1.
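As a sketch of how (7) might be set up in practice, the following uses SciPy's SLSQP solver for the Example 2.2 setting (M the unit disk, \(\varvec{f}\) the identity); the box parameters p and q are passed in directly. This is an illustration under these assumptions, not the implementation used in the paper:

```python
import numpy as np
from scipy.optimize import minimize

def pascoletti_serafini_disk(p, q):
    """Solve scalarization (7) for M = unit disk, f = identity.

    Variables z = (x1, x2, alpha): minimize alpha subject to
    p + alpha * q >= f(x), alpha <= 0, and x in the disk.
    Returns (x, alpha), or None if the solver finds no solution.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    cons = [
        {'type': 'ineq', 'fun': lambda z: 1.0 - z[0] ** 2 - z[1] ** 2},  # x in M
        {'type': 'ineq', 'fun': lambda z: p + z[2] * q - z[:2]},  # p + a*q >= f(x)
        {'type': 'ineq', 'fun': lambda z: -z[2]},                 # alpha <= 0
    ]
    res = minimize(lambda z: z[2], np.zeros(3), constraints=cons, method='SLSQP')
    return (res.x[:2], res.x[2]) if res.success else None
```

For instance, with \(\varvec{p} = (-0.25, -0.25)\) and \(\varvec{q} = (0.25, 0.25)\) the reference line hits the Pareto front of the disk near \((-\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2})\), with optimal \(\alpha = (-\frac{\sqrt{2}}{2} + 0.25)/0.25 \approx -1.83\).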

Algorithm 1
figure a

Box checking algorithm

At the beginning of Algorithm 1, the state map \(\sigma \) and the solution map \(X^{*}\) are initialized (ll 1–2). Then, in each iteration a point \(\varvec{y}\) is picked whose state is unknown, and a \(\varvec{y}\)-constraint scalarization is set up (ll 4–5). If the scalarization has an optimal solution, \(\sigma (\varvec{y})\) is set to attainable and the solution is stored in the solution map (ll 6–9); otherwise, \(\sigma (\varvec{y})\) is set to unattainable (l 11). The state information is additionally updated according to the applicable propagation rules (l 13). When no point with state unknown remains, the algorithm returns the status map \(\sigma \) and the solution map \(X^{*}\) (l 16).

Note that the algorithm is guaranteed to terminate, as in each iteration at least one point \(\varvec{y}\) is removed from the set of points with \(\sigma (\varvec{y})\) unknown. Also, for all points \(\varvec{y} \in \prod _{i=1}^{k} \Lambda _i\) that have been found to be non-dominated, the algorithm will have found a solution \(\varvec{x} \in M\) with \(\varvec{y} = (\varvec{d} \circ \varvec{f})(\varvec{x})\). That is, on termination of the algorithm, the problem \(\mathcal {P}\) is solved exhaustively.

One crucial aspect of the box checking algorithm is the strategy employed to pick the next \(\varvec{y}\) with \(\sigma (\varvec{y})\) unknown at the beginning of each iteration. In our implementation, the strategy is as follows. For a given unknown \(\varvec{y}\), denote by \(N^{1}(\varvec{y})\) the number of unknown \(\varvec{\tilde{y}}\) that would be found unattainable if \(\varvec{y}\) was unattainable (according to propagation rule I). Conversely, denote by \(N^{2}(\varvec{y})\) the number of unknown \(\varvec{\tilde{y}}\) that would be found dominated if \(\varvec{y}\) was attainable (according to propagation rule II). Pick the unknown \(\varvec{y}\) for which \(\min \left\{ N^{1}(\varvec{y}), N^{2}(\varvec{y})\right\} \) is largest. If this leaves more than one candidate, pick among those one with the largest \(N^{1}(\varvec{y}) + N^{2}(\varvec{y})\).
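This selection strategy only needs the partial order on the grid. A direct sketch, counting over the currently unknown points (the function names are hypothetical):

```python
from itertools import product

def pick_next(status, levels):
    """Pick the unknown y maximizing min(N1, N2); break ties by N1 + N2."""
    unknown = [y for y in product(*levels)
               if status.get(y, 'unknown') == 'unknown']
    if not unknown:
        return None

    def score(y):
        # N1: unknown points resolved by rule I if y turned out unattainable
        n1 = sum(1 for t in unknown
                 if t != y and all(a <= b for a, b in zip(t, y)))
        # N2: unknown points resolved by rule II if y turned out attainable
        n2 = sum(1 for t in unknown
                 if t != y and all(a >= b for a, b in zip(t, y)))
        return (min(n1, n2), n1 + n2)

    return max(unknown, key=score)
```

On a 3x3 grid with no information yet, the "middle" point is picked first, since either outcome of its feasibility check resolves three further points.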

3.3 Algorithm using supported solutions

This algorithm (given in Algorithm 2) utilizes weighted sum scalarizations of the inner continuous problem \(\mathcal {P}^{c}\), as in the definition of supported points above. As the points obtained through a weighted sum scalarization are, by definition, not only non-dominated but even supported, all of the propagation rules I–IV can be applied.

Algorithm 2
figure b

Supported solutions algorithm

After initialization (ll 1–2), Algorithm 2 iteratively calculates a new point \(\varvec{y^*}\) as the image of a solution \(\varvec{x^*}\) of a weighted sum scalarization (ll 5–6). In each iteration, the solution map is updated, and the state information is refreshed according to the propagation rules (ll 7–9). The algorithm returns the status map \(\sigma \) and the solution map \(X^{*}\) when no point \(\varvec{y}\) with state unknown remains (l 16).

The algorithm can be used effectively as a first-phase algorithm for both convex and non-convex problems. As all four propagation rules I–IV can be applied, a lot of progress can be achieved early on. However, a switch to the box checking Algorithm 1 should be triggered the moment no meaningful progress is achieved any more.

Even for convex problems, the algorithm on its own is not the most effective choice in later iterations. Unlike the box checking algorithm, its termination is not guaranteed. The reason is that the calculated points are determined only indirectly through the weights. If only a few particular \(\varvec{y} \in \prod _{i=1}^{k} \Lambda _i\) remain unknown, they cannot be targeted directly. Figure 5 shows a worst-case scenario, where the algorithm does not make progress over many iterations.

The strategy for picking the next weight vector \(\varvec{w}\) at the beginning of each iteration (line 4) is as follows. For the first \(k\) iterations, we choose the extreme compromises, where one of the entries of \(\varvec{w}\) is set to one and the rest are set to zero. For later iterations, we calculate the convex hull of the previously found solutions. We determine the face that intersects with the greatest number of preimage sets \(\varvec{d}^{-1}(\varvec{y})\) and pick its normal as the new \(\varvec{w}\).

Fig. 5
figure 5

An example of bad performance of the supported solutions algorithm. While the first three iterations reduced the number of \(\varvec{y} \in \prod _{i=1}^{k} \Lambda _i\) with state unknown (white area) to one, in the next four iterations no further progress was made

3.4 Algorithm for convex problems

If \(\mathcal {P}^{c}\) is a convex problem, Algorithm 2 can be modified to, in each iteration, calculate and employ the convex hull of the images \(\varvec{f}(X^{*})\), where \(X^{*}\) denotes the set of all solutions found so far. The following Lemma will be needed.

Lemma 3.3

Let \(\{ \varvec{x}^1,\dots , \varvec{x}^n \} \subseteq M\) be a set of efficient solutions to the convex problem \(\mathcal {P}^{c}\), and let

$$\begin{aligned} \varvec{d}^{-1}(\varvec{y}) \cap \text {conv} (\varvec{f}(\varvec{x}^{1}),\dots ,\varvec{f}(\varvec{x}^{n})) \ne \emptyset \end{aligned}$$

for a point \(\varvec{y} \in \prod _{i=1}^{k} \Lambda _i\). Then there is \(\varvec{x}^{\text {int}} \in M\) such that \((\varvec{d} \circ \varvec{f})(\varvec{x}^{\text {int}}) \leqq \varvec{y}\).

Proof

Let \(\varvec{y}^{c} \in \varvec{d}^{-1}(\varvec{y}) \cap \text {conv} (\varvec{f}(\varvec{x}^{1}),\dots ,\varvec{f}(\varvec{x}^{n}))\). Then by definition

$$\begin{aligned} \varvec{d}(\varvec{y}^{c}) = \varvec{y}, \end{aligned}$$

and there are coefficients \(0 \le c_i \le 1\) \((i = 1,\dots ,n)\) with \(\sum _{i=1}^{n} c_i = 1\) such that

$$\begin{aligned} \varvec{y}^{c} = \sum _{i=1}^{n} c_i \varvec{f}(\varvec{x}^{i}). \end{aligned}$$

Now let \(\varvec{x}^{\text {int}}:= \sum _{i=1}^{n} c_i \varvec{x}^{i}\), which lies in M by convexity of M. Because of the componentwise convexity of \(\varvec{f}\), we have

$$\begin{aligned} \varvec{f}(\varvec{x}^{\text {int}}) \leqq \sum _{i=1}^{n} c_i \varvec{f}(\varvec{x}^{i}) = \varvec{y}^{c}. \end{aligned}$$

By the componentwise monotonicity of \(\varvec{d}\), we obtain \((\varvec{d} \circ \varvec{f})(\varvec{x}^{\text {int}}) \leqq \varvec{y}\), which shows the claim. \(\square \)

Algorithm 3
figure c

Algorithm for convex problems

Along the lines of Algorithm 2, Algorithm 3 iteratively calculates a new point \(\varvec{y^*}\) as the image of the solution \(\varvec{x^*}\) of a weighted sum scalarization (ll 5–6). After updating the solution set and the state information according to the propagation rules (ll 7–9), Lemma 3.3 is applied to potentially identify additional points \(\varvec{y}\) as attainable. If that is the case, the interpolated solution \(\varvec{x}^{\text {int}}\) is stored in the solution map (ll 10–14).

In accordance with Lemma 3.3, the interpolated solution \(\varvec{x}^{\text {int}}\) in Algorithm 3 is defined as follows: if \(\varvec{d}^{-1}(\varvec{y}) \cap \text {conv} (\varvec{f}(X^{*})) \ne \emptyset \) is established by finding coefficients \(c_i \ge 0\) \((i = 1,\dots ,n)\) with \(\sum _{i=1}^{n} c_i = 1\) such that \(\varvec{d}(\varvec{y}^{c}) = \varvec{y}\) for \(\varvec{y}^{c}:= \sum _{i=1}^{n} c_i \varvec{f}(\varvec{x}^{i})\), then \(\varvec{x}^{\text {int}}:= \sum _{i=1}^{n} c_i \varvec{x}^{i}\) (see the proof of Lemma 3.3 for the rationale).

For picking the weight vector \(\varvec{w}\) at the beginning of each iteration (line 4), we choose the same strategy as the one described above for the algorithm using supported solutions.

4 Examples

In this section, we investigate the performance of the described algorithms over both a set of test problems and a real-world example from radiotherapy plan optimization. The three algorithms we compare are the following:

  1. (A)

    Box checking Algorithm 1 using the Pascoletti–Serafini scalarization (7).

  2. (B)

Combined algorithm where, in a first phase, the supported solutions algorithm (Algorithm 2) is employed until no further progress is made (i.e. an iteration occurs in which the number of boxes with status unknown is not reduced). From then on, the box checking algorithm with Pascoletti–Serafini scalarization is used until the problem is solved exhaustively.

  3. (C)

    Combined algorithm where, in the first phase, the algorithm for convex problems (Algorithm 3) is used instead.
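The phase switch used by the combined algorithms (B) and (C) can be sketched as a simple control loop. In the following illustrative Python skeleton, `first_phase_step` and `box_checking_step` are hypothetical callbacks, each performing one iteration of the respective algorithm and returning the number of boxes that still have status unknown:

```python
def combined_solve(first_phase_step, box_checking_step, n_unknown):
    """Run the first-phase algorithm until an iteration no longer reduces
    the number of unknown boxes, then finish with box checking."""
    while n_unknown > 0:
        remaining = first_phase_step()
        if remaining >= n_unknown:  # no progress: switch to second phase
            break
        n_unknown = remaining
    while n_unknown > 0:
        n_unknown = box_checking_step()  # solves the remainder exhaustively
    return n_unknown
```

The switch criterion matches the description of (B) above: the first phase is abandoned as soon as an iteration fails to reduce the number of unknown boxes.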

4.1 Randomized problems

We consider the following problem

$$\begin{aligned} \begin{aligned} (d_1^K(x_1), \dots , d_k^K(x_k))&\rightarrow \min \\ \text {s.t.} \quad&\sum _{i=1}^{k} \left( \frac{x_i - c_i}{r_i} \right) ^{2} \le 1 \\&0 \le x_i \le 1 \quad \forall i = 1,\dots ,k \end{aligned} \end{aligned}$$
(8)

where the center coordinates \(c_i\) are randomly chosen from \([0.6, 1.0]\) and the half lengths \(r_i\) are randomly chosen from \([0.2, 0.5]\). We investigate different utility functions \(\varvec{d}^{K}\) that differ in the cardinality K of their image sets \(\Lambda _i\). Namely,

$$\begin{aligned} d_i^3(x_i)&= {\left\{ \begin{array}{ll} 0.4 &{} \text {if}\quad 0 \le x_i \le 0.4,\\ 0.7 &{} \text {if}\quad 0.4 < x_i \le 0.7,\\ 1.0 &{} \text {if}\quad x_i > 0.7 \end{array}\right. } \quad \forall i = 1,\dots ,k \\ d_i^4(x_i)&= {\left\{ \begin{array}{ll} 0.4 &{} \text {if}\quad 0 \le x_i \le 0.4,\\ 0.6 &{} \text {if}\quad 0.4 < x_i \le 0.6,\\ 0.8 &{} \text {if}\quad 0.6 < x_i \le 0.8,\\ 1.0 &{} \text {if}\quad x_i > 0.8 \end{array}\right. } \quad \forall i = 1,\dots ,k \\ d_i^5(x_i)&= {\left\{ \begin{array}{ll} 0.4 &{} \text {if}\quad 0 \le x_i \le 0.4,\\ 0.5 &{} \text {if}\quad 0.4 < x_i \le 0.5,\\ 0.6 &{} \text {if}\quad 0.5 < x_i \le 0.6,\\ 0.8 &{} \text {if}\quad 0.6 < x_i \le 0.8,\\ 1.0 &{} \text {if}\quad x_i > 0.8 \end{array}\right. } \quad \forall i = 1,\dots ,k. \end{aligned}$$
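These breakpoint families are easy to implement generically; a sketch using Python's `bisect` (the helper name `make_step_utility` is ours, and the breakpoint lists are those of \(d^3\), \(d^4\) and \(d^5\)):

```python
from bisect import bisect_left

def make_step_utility(levels):
    """Map v to the smallest breakpoint t in `levels` with v <= t;
    values above the largest breakpoint map to that breakpoint."""
    def d(v):
        # bisect_left returns the index of the first breakpoint >= v
        return levels[min(bisect_left(levels, v), len(levels) - 1)]
    return d

d3 = make_step_utility([0.4, 0.7, 1.0])             # K = 3
d4 = make_step_utility([0.4, 0.6, 0.8, 1.0])        # K = 4
d5 = make_step_utility([0.4, 0.5, 0.6, 0.8, 1.0])   # K = 5
```

Note that `bisect_left` reproduces the half-open intervals above exactly: a value equal to a breakpoint belongs to that breakpoint's category.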

The underlying continuous problem \(\mathcal {P}^{c}\) of the randomized problems (8) is convex, allowing the application of all algorithms (A), (B) and (C). For a specific problem instance with \(k=2\) and \(K = 4\), Fig. 6 illustrates the course of the box checking algorithm (A), and Fig. 7 depicts the course when using the combined algorithm for convex problems (C) instead. We see that for this instance, the combined algorithm for convex problems outperforms the box checking algorithm, finishing in 3 rather than 6 iterations.
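To make the weighted sum subproblems of Algorithms 2 and 3 concrete for an instance of (8) with \(k = 2\), one can approximate them by a coarse grid search. This is a sketch for illustration only: the instance data `c`, `r` below is made up (within the stated sampling ranges), and a real implementation would use a proper convex solver instead of enumeration.

```python
def weighted_sum_min(w, c, r, steps=200):
    """Approximate argmin of w^T x over the feasible set of (8):
    the ellipse sum(((x_i - c_i)/r_i)^2) <= 1 intersected with [0, 1]^2."""
    best_x, best_val = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1):
            x = (i / steps, j / steps)
            if sum(((xi - ci) / ri) ** 2
                   for xi, ci, ri in zip(x, c, r)) > 1.0:
                continue  # infeasible grid point
            val = w[0] * x[0] + w[1] * x[1]
            if val < best_val:
                best_x, best_val = x, val
    return best_x

# a hypothetical instance: center in [0.6, 1.0]^2, half lengths in [0.2, 0.5]
x_star = weighted_sum_min((1.0, 1.0), c=(0.7, 0.8), r=(0.3, 0.4))
```

For equal weights the minimizer lies on the lower-left boundary of the ellipse; its discretized image \((d^K_1(x^*_1), d^K_2(x^*_2))\) is then one attainable point for the propagation rules.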

Fig. 6
figure 6

The box checking algorithm, using the Pascoletti–Serafini scalarization, for a problem instance of Eq. (8) with \(k= 2\) and \(K = 4\). In total, six iterations are necessary to exhaustively solve the problem

Fig. 7
figure 7

The convex algorithm applied to a problem instance of Eq. 8 with \(k= 2\) and \(K = 4\). Only three iterations are necessary to determine all non-dominated points. Note that the solution for one of the non-dominated outcomes is obtained by convex combination of previously calculated points according to Lemma 3.3 (orange)

We can compare the performance of algorithms (A)–(C) by measuring the progress p over the iterations, with p being defined as:

$$\begin{aligned} p = 1 - \frac{\vert \{\varvec{y} \in \prod _{i=1}^{k} \Lambda _i \mid \sigma (\varvec{y}) = \textit{unknown} \} \vert }{\vert \prod _{i=1}^{k} \Lambda _i \vert }. \end{aligned}$$
(9)
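In code, the progress measure (9) is simply the fraction of category combinations whose status \(\sigma \) is no longer unknown. A sketch, where the status map and its label strings are hypothetical stand-ins for the algorithms' state information:

```python
from itertools import product

def progress(status, lambdas):
    """Progress p of (9): share of points y in the product of the
    image sets Lambda_1 x ... x Lambda_k with a resolved status."""
    grid = list(product(*lambdas))
    unknown = sum(1 for y in grid
                  if status.get(y, "unknown") == "unknown")
    return 1 - unknown / len(grid)
```

By construction \(p = 1\) exactly when every combination has been resolved, which is the termination criterion of all three algorithms.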

Figure 8 shows the averaged progress for the artificial problem (8) over varying numbers of objectives \(k \in \left\{ 2,3,4\right\} \) and image set cardinalities \(K \in \left\{ 3,4,5 \right\} \). To create each of the averaged plots, 10 randomly created problem instances were solved and the mean of the obtained progress values p after each iteration was calculated.

The combined convex algorithm (C) performs best by exploiting the convexity of the problem. The next best choice is the combined algorithm that uses the supported solutions algorithm as a first phase. The box checking algorithm on its own does not perform as well. With higher dimension k and higher image set cardinality K, the differences in algorithm performance become more pronounced.

Fig. 8
figure 8

Averaged progress for algorithms (A)–(C): box checking algorithm (dark blue), combined algorithm with supported solutions algorithm as first phase (purple dotted) and combined algorithm with convex algorithm as first phase (red). The averages were taken over 10 randomized problem instances of Eq. (8). The number of objectives k was varied from 2 to 4 and the image set cardinality K was varied from 3 to 5

4.2 Radiotherapy planning example

In radiotherapy planning, the aim is to deliver the prescribed dose to the target volumes while sparing nearby organs at risk. In intensity-modulated radiation therapy (IMRT), the irradiation—the so-called fluence—is delivered from different angles around the patient and can additionally be modulated over the cross section of each beam by moving collimator leaves in and out of the beam field.

The fluence is represented by a vector \(\varvec{x}\geqq 0\) and constitutes the optimization variable of the multi-criteria radiotherapy planning problem (Küfer et al. 2002; Küfer et al. 2003; Craft et al. 2012). The dose influence matrix D depends on the patient anatomy as established by a CT scan and encodes the mapping from fluence to dose, such that the entries of \(D\varvec{x}\) represent the dose for each voxel in the voxelized patient anatomy.

The optimization objectives then evaluate the dose vector entries over the target volumes and organs at risk. These evaluation functions are continuous and often convex. Possible choices are minimum, maximum, mean, p-norms or one-sided p-norms. Thus, the (continuous) multi-criteria radiotherapy planning problem has the properties required for the inner problem \(\mathcal {P}^{c}\) in our setting.
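As an illustration, the fluence-to-dose mapping and two such evaluation functions can be written as follows. This is a plain-Python sketch with made-up helper names; real planning systems use optimized sparse-matrix code for \(D\varvec{x}\):

```python
def dose_from_fluence(D, x):
    """dose = D x: each voxel dose is the influence-weighted sum of fluences."""
    return [sum(dvb * xb for dvb, xb in zip(row, x)) for row in D]

def mean_dose(dose):
    """Mean objective over the voxels of one structure."""
    return sum(dose) / len(dose)

def one_sided_pnorm(dose, ref, p=2):
    """One-sided p-norm: penalizes only dose above `ref`; a common
    convex choice for organ-at-risk objectives."""
    return (sum(max(v - ref, 0.0) ** p for v in dose) / len(dose)) ** (1 / p)
```

Both functions are convex in the dose, and the dose is linear in the fluence, so the composed objectives are convex in \(\varvec{x}\) as required for \(\mathcal {P}^{c}\).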

In clinical practice, certain threshold values for the objectives play a crucial role. In the case of organs at risk, violating a threshold is linked to specific side effects. For the targets, failing to meet a specific value may increase the probability of the tumor recurring. Often, these clinical goals are further categorized into those which represent the absolute minimum requirement, and others that correspond to an average or an ideal dose distribution in the volume. Taking these discretizations (“minimum”, “average”, “ideal”) into account, we obtain the structure of the continuous problem with discretized utility \(\mathcal {P}\).

As an example, we consider the TG119 case. The anatomy of the case has a U-shaped target volume, an organ at risk (OAR) in the center of the target, and the surrounding normal tissue region. We define the beam geometry as seven equidistantly spaced beams, see Fig. 9.

Fig. 9
figure 9

Geometry of the TG119 case. It has a U-shaped target volume (white), an organ at risk (magenta) in the center of the target, and surrounding normal tissue. Seven beams are spaced equidistantly (white lines)

To set up the optimization problem, we formulate one objective for each structure as detailed in Table 1. Additionally, for each objective we define the discretization mapping into the categories 3 (“minimum”), 2 (“average”), and 1 (“ideal”) by certain upper bounds, also given in Table 1: the discretized utility function \(d_i\) \((i = 1,2,3)\) maps to the smallest category for which \(f_i(D\varvec{x})\) falls below the category’s upper bound.

Table 1 Optimization model for the TG119 case: optimization objectives for target, OAR and tissue, and threshold values for the discretization into categories 1–3

Again, we compared the algorithms (A)–(C). Figure 10 shows the result obtained with all three algorithms. There are two Pareto-optimal solutions to the discretized problem: one where the target and the tissue dose quality are ideal while the OAR dose quality is average, and one where the OAR and tissue dose quality are ideal and the target dose quality is average. This is reflected in the dose-volume histograms for the two solutions, where the first solution shows a better slope at the prescription dose level of 60 Gy, while the other solution exhibits a lower OAR curve.

Fig. 10
figure 10

The result for the TG119 example. On the left side, the OAR mean objective and its discretization into categories is plotted against the tissue mean objective and its discretization for each of the 3 categories of the target underdose objective. There are two Pareto optimal outcomes in the discretized space. These are achieved by two efficient solutions \(\varvec{x}\) mapping to the marked points under the continuous objectives defined in Table 1 (the target objective value being shown in purple.) The dose distributions for the two solutions—displayed as dose-volume histograms at the right—represent two distinct compromises, with the upper solution achieving a better target coverage and the lower solution a better sparing of the OAR

Figure 11 shows the progress—as defined in (9)—for the different algorithms over the iterations. For this particular example, the algorithms do not differ much in their efficiency. The combined algorithm that uses the convex algorithm as a first phase performs the best, followed by the box checking algorithm and the other combined algorithm.

Fig. 11
figure 11

Progress for algorithms (A)–(C) for the TG119 radiotherapy planning example: box checking algorithm (dark blue), combined algorithm with supported solutions algorithm as first phase (purple), and combined algorithm with convex algorithm as first phase (red)

5 Conclusion

In this paper we considered multi-criteria optimization problems which are based on a continuous problem. The objective values are mapped by discrete utility functions, making the overall multi-criteria optimization problem a discrete one. Such categories can, for example, be used to quickly find candidate solutions for a decision maker, especially when the Pareto front is no longer easily visualized. In a first step, a decision maker might want to get a rough overview of the different alternatives instead of navigating locally.

In this paper, we first studied the connection between solutions of the underlying continuous problem and the combined multi-criteria problem. The results were used to introduce several algorithms that combine the discrete and continuous structure. In numerical examples, we were able to show that our approaches can save a lot of computation compared to a naive approach. For larger problems with more objectives and more categories, the experiments showed that it is beneficial to use as much information as possible and also to exploit convexity.

Several aspects were beyond the scope of this paper, but may be of interest for future investigations. We focused on fully discrete utilities. In Sect. 2 we already pointed out that this is not strictly required from a theoretical point of view: all results hold for monotone functions in general. It would be interesting to look at combined continuous and discrete utilities. We also focused on problems with a moderate number of categories and objectives, so that iterating over all possible category combinations was unproblematic. It would be interesting to study how the developed strategies transfer to larger settings with many more categories and objectives.