A Multi-Objective Interpretation of Optimal Transport

This paper connects discrete optimal transport to a certain class of multi-objective optimization problems. In both settings, the decision variables can be organized into a matrix. In the multi-objective problem, the notion of Pareto efficiency is defined in terms of the objectives together with nonnegativity constraints and with equality constraints that are specified in terms of column sums. A second set of equality constraints, defined in terms of row sums, is used to single out particular points in the Pareto-efficient set which are referred to as “balanced solutions.” Examples from several fields are shown in which this solution concept appears naturally. Balanced solutions are shown to be in one-to-one correspondence with solutions of optimal transport problems. As an example of the use of alternative interpretations, the computation of solutions via regularization is discussed.


Introduction
The theory of optimal transport is an extremely rich research area, with applications ranging from probability theory to mathematical physics, from image processing to partial differential equations, and from city planning to differential geometry. The two volumes by Rachev and Rüschendorf [1,2] and the celebrated books by Fields medalist Villani [3,4] provide ample evidence of the spectacular growth of the field since the original work of Monge [5], Kantorovich [6], and Fréchet [7]. Specialized treatments for applied mathematicians and for economists, respectively, are provided in the recent books by Santambrogio [8] and Galichon [9].
The purpose of the present paper is to highlight a connection between discrete optimal transport and a certain class of multi-objective problems. This connection appears to have gone unnoticed so far, even though the proof is not difficult, and the corresponding result for strictly concave problems has been known for several decades [10][11][12]. By means of this connection, the techniques of optimal transport can be applied to certain multi-objective problems, while at the same time an alternative perspective on optimal transport is obtained. As an illustration of the latter point, it will be shown below how the popular entropic regularization of optimal transport problems [13,14] has a natural interpretation as an isoelastic regularization within the multi-objective framework. The transfer from a single objective to multiple objectives also entails a change from an additive to a multiplicative environment and in this way leads naturally to applications of nonlinear Perron-Frobenius theory. This theory has been used in [15] (cf. also [16]) to analyze a matrix scaling problem that can be related to single-objective concave optimization, and it comes up perhaps even more naturally in the context of multi-objective problems [17]. The application of nonlinear Perron-Frobenius theory will be illustrated below.
In this paper, the term "multi-objective problem" is used to cover multi-agent problems (several agents, each with their own objective), as well as multi-criteria or vector optimization problems (one agent with several objectives). The solution concept that will be employed can be used in both contexts, as illustrated in Sect. 2.2. The examples in this section also show the relevance of the concept in different fields: operations research, statistics, and actuarial science.
From the multi-agent point of view, the multi-objective problems considered in this paper may be placed within the broad class of fair division problems [18,19]. Typical features of problems considered in this active area of research include multiple agents, each equipped with their own utility functionals, and resources in limited supply. Solution concepts are based on notions of efficiency and fairness. While many different notions of fairness have been proposed, the one that appears to be used most often is envy-freeness: no agent would prefer anybody else's share to her own [20]. In the present paper, a different concept of fairness is used, which is less standard in the literature on fair division, and which is based on bringing in a notion of size that is independent of any particular agent. This fairness concept can be viewed as a version of "exogenous rights," mentioned as one of four principles of distributive justice by Moulin [19,Ch. 2], or of "entitlements" as discussed by Brams [18]. In contrast to the use of entitlements in [18], the size constraint is taken in this paper as an equality constraint, rather than as an inequality constraint. It will be assumed below that allotments to agents can be represented as vectors and that size is expressed as a single number determined by a linear functional.
Size constraints can be used as a way to single out unique solutions among the set of all efficient solutions. In that sense, their role is comparable to, for instance, the Nash collective utility function [21]. The use of size constraints presupposes the availability of a measurement functional by which the magnitudes of allotments can be determined. In this sense, size constraints call for a richer problem environment than the Nash bargaining solution does. In addition to the availability of a measurement functional, the problem data should provide, for each of the agents/objectives, the value of the right-hand side of the equation that expresses the corresponding size constraint. Below, these numbers appear as row sums that are prefixed.
In the literature on multi-criteria optimization (see for instance [22]), several methods have been proposed to select particular solutions among the set of all Paretoefficient solution, such as scalarization and lexicographic ordering. Scalarization is a way to relate a multi-objective problem to a single-objective problem. While this paper also establishes a connection between single-objective and multi-objective problems, and weighted-sum scalarization will be used as part of the proof technique, it should be emphasized that the connection established below cannot be viewed as a scalarization. The objective function in the associated single-objective problem cannot be obtained through aggregation of the objective functions in the multi-criteria problem by means of an achievement function, as in [22,Section 4.6]. Multi-objective optimal transport has been studied for instance by Isermann [23], but again this is different from the problem considered in this paper, which is to connect optimal solutions of singleobjective optimal transport problems to balanced solutions (as introduced below) of a class of multi-objective problems.
In the context of equilibrium economics, efficient solutions that satisfy budgetary size constraints have been termed BCPE solutions (budget-constrained Pareto-efficient solutions) by Balasko [24]. Gale and Sobel [11] do not introduce a separate term for the combination of Pareto efficiency and satisfaction of budget constraints; they speak of optimal and fair solutions (optimal and proportional in [25]). Bühlmann and Jewell [12] use the term fair Pareto optimal risk exchanges (FAIRPOREX). Neither of these terminologies appear to be entirely appropriate to cover the range of different situations exemplified below. For generality and simplicity, the term balanced solutions will be used in this paper. 1 Individual rationality and behavioral effects do not play a role in the definition of balanced solutions. In some (but not all) of the examples shown below, it would in fact be quite unnatural to introduce these notions. All in all, it may perhaps be said that the concept of a balanced solution is closer in spirit to the multi-criteria setting than to the multi-agent setting. In the case where multiple agents are involved, the concept represents a social planner's point of view.
The organization of the paper is as follows. Following this introduction, a formal definition of the notion of balanced solutions is given in Sect. 2, along with three examples of situations in which this notion may be used. The same section also provides a characterization of balanced solutions. This characterization is used in Sect. 3 to establish the connection to optimal transport. Regularization and the connection to Perron-Frobenius theory are discussed in Sect. 4. A discussion of numerical issues follows in Sects. 5, and 6 concludes.

Definitions
In the optimal transport problem, the decision variables are organized into a matrix, and there are constraints related to row sums, as well as constraints related to column sums. Rows and columns typically correspond to objects that are different in nature, for instance producers and consumers. The problem is formulated with a single objective. As will be illustrated below, situations arise in practice which similarly call for a table to be filled, taking into account row sum constraints, as well as column sum constraints, but which are different in the sense that there are multiple objectives, which relate (by convention) to the rows of the matrix that is to be formed. Additionally, as shown in the examples below, there is often a certain status difference between the row constraints and the column constraints. The column constraints express "feasibility," whereas the row constraints express a notion of "fairness." While the column constraints can be thought of being imposed by Nature, the row constraints are manmade, and the right-hand sides used in these constraints are to some extent a matter of choice. Such a hierarchy leads to the notion of what we will call "balanced" solutions of the multi-objective problem. The idea is that the objective functions together with the feasibility constraints are used to define a Pareto-efficient surface and that the row constraints are then used to select specific solutions on the efficient surface. Under suitable circumstances, it may happen that there is a unique balanced solution. A precise definition is given below. In the interest of generality, the terms "primary" and "secondary" are used to refer to the two types of constraints, rather than "feasibility" and "fairness." Definition 2.1 A multi-objective matrix allocation (MOMA) problem is specified by (i) The decision variables x ij , which together form a matrix X of size n × m; (ii) Objective functions given by (iii) Primary constraints given by where c 1 , . . . , c m are given positive constants; (iv) Secondary constraints given by where r 1 , . . . , r n are given positive constants.
A specification of the problem above can be given in terms of the triple (g, c, r ), where g = (g ij (·)) i=1,...,n; j=1,...,m is the matrix of reward functions, c ∈ R m ++ is the vector of given column sums, and r ∈ R n ++ is the vector of given row sums. 2 The functions g ij : R + → R ∪ {−∞} will be assumed to be twice continuously differentiable (for convenience), strictly increasing, and concave. An obvious necessary condition for a MOMA problem to be feasible is that the sum of the row sums equals the sum of the column sums: This condition will be referred to as the global feasibility condition. We take it as a standing assumption in all MOMA problems to be discussed below.

Definition 2.2
An n × m matrix X is said to provide a balanced solution of a given MOMA problem iff it satisfies the following two conditions.
(i) The decision matrix X is Pareto efficient with respect to the objective functions i and the primary constraints of the MOMA problem. In other words, there does not exist a matrixX such thatX satisfies the primary constraints, i (X i· ) ≥ i (X i· ) for all i = 1, . . . , n, and i (X i· ) > i (X i· ) for at least one i ∈ {1, . . . , n}. (ii) The matrix X satisfies the secondary constraints of the MOMA problem.

Remark 2.1
The formulation as given above refers to the point of view of maximization. An analogous formulation can be given for minimization problems. In the case of minimization, the decision variables represent quantities that are preferred to be small, at least from the perspective of the i-th objective, such as "payment to be made by agent i if outcome j occurs" or "part of chore j to be carried out by agent i." For such problems, natural assumptions on the functions g ij (cost functions in this case, rather than reward functions) are that they are increasing and convex.

Remark 2.2
In the above, we work with unweighted sums. A similar problem with weighted row and column sums n i=1 p i x ij = c j , m j=1 q j x ij = r i ( p i > 0, q j > 0 for all i and j) can be transformed to a problem with unweighted sums by the substitutions In this sense, the use of unweighted sums entails no loss of generality. On the other hand, it might be argued that such usage may not offer the best preparation for the continuous case. For simplicity of notation, however, in this paper unweighted sums will be used.
where s i > 0 and t ij are constants (i = 1, . . . , n; j = 1, . . . , m), then any balanced solution of the original problem is also a balanced solution of the modified problem, and vice versa. This is immediate from the definition of balanced solutions and from the additive separability of the objective functions.

Remark 2.4
If X is a balanced solution of the MOMA problem specified by (g, c, r ), and s is any positive number, thenX := X/s is a balanced solution of the problem specified by (g, r/s, c/s) withg ij (x ij ) := g ij (sx ij ), and vice versa. Under the global feasibility constraint, it is therefore no restriction of generality to assume that i r i = j c j = 1.
A linear MOMA problem is one in which the reward functions are linear, i.e., g ij (x ij ) = b ij x ij with constant coefficients b ij > 0.

Examples
Three examples are given of situations in which MOMA problems, and in particular linear MOMA problems, arise naturally. These examples refer to task allocation, statistical classification, and risk sharing, respectively. Example 2.1 Mathematical formalization of the problem of chore division appears to have been first undertaken by Martin Gardner [28]. The problem is naturally formulated as a minimization problem, and the exogenous rights that were mentioned in Sect. 1 are in this case exogenous obligations. Consider a situation in which n agents must carry out m different tasks. For an example that may be familiar to many readers, one can think of m courses, which need to be taught by n available lecturers. The size of the contribution to be made by each of the agents is supposed to be prefixed; in the example of course allocation, this corresponds to a teaching load which is given in advance for each lecturer. The contribution size is computed by means of a linear formula, in which each of the tasks has its own nominal weight. The decision variable x ij represents the fraction of task j that is assigned to agent i. The objective functions (cost in this case) are the disutilities of the agents that result from their package of assigned tasks.
The primary constraints express that the task fractions must be nonnegative and that each task must be fulfilled completely. The secondary constraints correspond to the predetermined magnitudes of the contributions of the agents. The column sums are all 1 (i.e., 100%), and the row sums are given by the teaching loads. In this way, a MOMA problem is defined. If the agents' disutilities depend linearly on the task fractions, then we have a linear MOMA problem.

Example 2.2
Statistical classification is often considered as part of the field of machine learning [29]. The central problem in this area is to place objects into mutually exclusive categories, on the basis of observed characteristics. Examples of such situations are numerous and include, for instance, various types of diagnosis. To stay within the discrete context, it will be assumed here that only a finite number m of different characteristics can be observed (in other words, the "feature space" is finite) and that there is a finite number n of categories. The probability that a given object has characteristic j and belongs to category i will be denoted by p ij . It is assumed that these probabilities are known. The situation in which n = 2 corresponds to the problem of testing a simple null hypothesis against a simple alternative. Already in this case, the multiple-objective nature of the problem is apparent; the probability of a type-I error must be weighed against the probability of a type-II error. For a general number of categories, we can define with each category an objective function which is given by the probability of correct classification for an object in that category. Decisions may be randomized; we denote by x ij the probability by which the classifier will place an object with characteristic j into category i. A problem of MOMA type arises when the percentage of objects to be placed in each of the categories is determined in advance. Such a prescription may be reasonable in circumstances in which classification is carried out for the purpose of treatment (for instance, medical treatment, or participation in an educational program), and there are capacity constraints for each form of treatment. The decision variables are given by the numbers x ij . The probability that an object will be classified into category i, given that it indeed belongs to that category, is given by m j=1 p ij x ij / m j=1 p ij (i = 1, . . . , n). These probabilities may be taken as objective functions; in the case of two categories, they are known as specificity and sensitivity, and are in a one-to-one relationship with the probabilities of type-I errors and type-II errors. The primary constraints are given by the nonnegativity constraints on the decision variables x ij , and by the condition that all objects with a given feature j should be classified into one of the categories ( n i=1 x ij = 1 for all j). The secondary constraints correspond to the prescribed percentages of objects to be placed in each category. Since the objective functions are linear in this case, we have a linear MOMA problem.

Example 2.3
Risk sharing is a classical topic in actuarial science (see for instance [30]) and remains a very active field of research. While part of the literature on risk sharing is focused on information asymmetries and behavioral effects which we do not consider here, the problem has been formulated also in the MOMA framework [12]. For a simple illustration, consider n individuals who form a group that will carry out a project with an uncertain payoff (say, a fishing expedition). For the purpose of discreteness, assume that there are m possible outcomes. The variable x ij denotes the share of revenue that agent i will receive in case outcome j materializes. The right-hand side of the column constraint n i=1 x ij = c j refers to the total amount that is available in outcome j. It would also be reasonable to impose the nonnegativity constraint x ij ≥ 0. The decision variables x ij are to be determined at the beginning of the project, before the "veil of uncertainty" has been lifted. To express a notion of fairness, the group might agree on state prices q j for each of the possible outcomes, and fix a value for the allotment to each agent, i.e., m j=1 q j x ij = v i , where the numbers v i are chosen in relation to the contributions (in terms of effort and money) made by participants. If the participants are risk neutral, i.e., their objectives are given by their subjective expected payoffs (in other words, i (X i· ) = m j=1 p ij x ij where the coefficients p ij refer to subjective probabilities), then we obtain a linear MOMA problem. In a more actuarial context, one might consider a group of companies which agree to a mutual reinsurance agreement with respect to claims that will come in during a given period. Fairness may then for instance be expressed on the basis of the expected claim sizes of the portfolios that the companies bring into the pool. In this case, the problem is more naturally formulated as a minimization problem, since the agents prefer smaller realized claims to larger ones.

Characterization of Balanced Solutions
Consider a linear MOMA problem given by It will be assumed throughout that the coefficients b ij and the prescribed row and column sums r i and c j are positive and that the global feasibility condition (1) is satisfied. Under these conditions, the feasible set in decision space corresponding to the primary constraints is nonempty. The expressions above can refer to maximization as well as to minimization. In both cases, the coefficients b ij are supposed to be positive. Of course, the two cases differ by a reversal of the direction of the inequality used in the definition of Pareto efficiency. It follows from a theorem of Isermann [31] (see also [22,Thm. 6.6]) that a matrix X is a Pareto-efficient solution of the problem consisting of the objectives and the primary constraints, taken as a maximization problem, if and only if it is a solution of a weighted-sum optimization problem of the form where α 1 , . . . , α n are positive parameters. As a result, we obtain the following characterization of balanced solutions of linear MOMA problems.
is a balanced solution of the linear MOMA problem specified by (2), taken as a maximization problem, if and only if there exist positive numbers α 1 , . . . , α n (weights) and positive numbers β 1 , . . . , β m (dual variables relating to equality constraints), such that the following conditions are satisfied: Proof The weighted-sum optimization problem (3) is a linear program with equality and inequality constraints. By linear programming duality (see for instance [32]), a matrix X is optimal for (3) if and only if there exist β j ∈ R such that the complementary slackness conditions (4a) and the column constraints (4b) are satisfied. The set of all Pareto-efficient solutions of the multi-objective linear program specified by the objective functions and the primary constraints of (2) is therefore given by matrices X ∈ R n×m + for which there exist α i > 0 and β j ∈ R such that (4a) and (4b) hold. By the standing assumption that b ij > 0, this can only happen for dual variables which are positive.

Remark 2.5
An analogous statement to Lemma 2.1 can be proven in the case of minimization; the only difference lies in the direction of the inequality in the first term of the disjunction in (4a). Because of the constraints x ij ≥ 0 and n i=1 x ij = c j > 0, for every index j there must be a least one index i such that x ij > 0; for this index i, the equality α i b ij = β j holds. It follows that, also in the MOMA minimization case, the dual variables β j are positive.
balanced solutions of such problems and optimal solutions of related single-objective matrix allocation (SOMA) problems of the form Specifically, under regularity conditions, it was shown by [11] that a matrix X is a balanced solution of the MOMA problem of Definition 2.1 if and only if X is an optimal solution of the SOMA problem (5), where the reward functions f ij of the SOMA problem relate to the utility functions g ij of the MOMA problem by the prescription In other words, if one starts with the utility functions of the SOMA problem, then the utility functions of the corresponding MOMA problem are found from these by executing the following operations in succession: differentiation, exponentiation, integration. If one applies this procedure to a linear function, then the result is again linear, since the exponential of a constant is a constant. This suggests that linear MOMA problems should correspond to SOMA problems of the form (2.1) with linear reward functions. Such linear SOMA problems are known as optimal transport problems. In this section, we confirm the conjecture that there is a one-to-one relationship between solutions of the discrete optimal transport problem on the one hand, and solutions of related linear MOMA problems on the other hand.

Remark 3.1
For MOMA minimization problems with increasing and strictly convex disutility functions, a result that is analogous to the one in [11] can be proven. The relation between the cost functions g ij (x) in the MOMA minimization problem and the cost functions f ij (x) in the SOMA minimization problem is given by the same formula (6) as in the maximization case.

Remark 3.2
In the single-objective case, maximization and minimization problems can be simply converted into each other by the usual device of changing the sign of the objective function. The same is not true for MOMA problems in which we are looking for balanced solutions, since here we require that objective functions are increasing, both in the case of maximization and in the case of minimization. Instead, one can make use of the connection via the equivalent SOMA problem as specified in (6). It is seen that a matrix X is a balanced solution of the MOMA maximization problem with reward functions g ij if and only if X is a balanced solution of the MOMA minimization problem with cost functionsĝ ij (and the same row and column constraints), when the functions g ij andĝ ij are related via The conjugation defined by (7) relates for instance log x (increasing, concave) to 1 2 x 2 (increasing, convex). This means that a logarithmic MOMA maximization problem is equivalent to a quadratic MOMA minimization problem. For another example, the increasing and concave power functions x 1−γ /(1 − γ ) with γ > 0, popular in utility theory, are linked by (7) to the increasing and convex power functions x 1+γ /(1 + γ ). A linear MOMA maximization problem with coefficients b ij is equivalent to a linear MOMA minimization problem with coefficients b −1 ij .
The optimal transport problems that will be considered below, in the maximization case, are of the discrete form The given row sums r i and column sums c j are all supposed to be positive, and the global feasibility condition i r i = j c j is supposed to be satisfied. Under these conditions, the feasible set is not empty; for instance, we can take x ij = r i c j /s where s := i r i = j c j . The problem (8) is known as the Monge-Kantorovich problem. It is a linear programming problem; its dual is given by (2), taken as a maximization problem, if and only if X is an optimal solution of the optimal transport problem (8) with the same row and column constraints, and

Theorem 3.1 A matrix X is a balanced solution of the linear MOMA problem
Proof By linear programming duality, which in this case is Monge-Kantorovich duality (see for instance [9, Thm. 2.2]), a matrix X ∈ R n×m + is a solution of the optimal transport problem (8) if and only if there are λ i ∈ R (i = 1, . . . , n) and μ j ∈ R ( j = 1, . . . , m) such that the following conditions hold: x ij = 0 and a ij ≤ λ i + μ j or x ij ≥ 0 and a ij = λ i + μ j (i = 1, . . . , n; j = 1, . . . , m), Under (10), these conditions are identical to the conditions (4), since we can take α i = exp(−λ i ) and β j = exp μ j . The statement of the theorem follows.
The value of the function i j (log b ij )x ij for general x ij cannot be determined on the basis of information concerning only the values of the functions j b ij x ij . This shows that, even in the linear case, the relation between MOMA problems and SOMA problems as reflected in the theorem above cannot be viewed as a scalarization procedure. The term "social welfare function" used by Gale and Sobel [11] must be interpreted carefully; it should be noted that in the single-objective problem, the welfare is served of agents who are different from those that appear in the corresponding multi-objective problem. (10) can be used to translate properties from linear SOMA problems to linear MOMA problems and vice versa. For instance, it is well known that the optimal transport problem can be solved very easily by a greedy algorithm when the weight matrix satisfies the Monge property, which requires that a i 1 j 1 + a i 2 j 2 ≥ a i 1 j 2 + a i 2 j 1 for all 1 ≤ i 1 < i 2 ≤ n and 1 ≤ j 1 < j 2 ≤ m; see [33] for an extensive discussion. 3 The corresponding condition for the related multi-objective problem is that the determinants of all 2-by-2 submatrices of the coefficient matrix (b ij ) should be nonnegative.

Direct Iteration
For strictly concave single-objective matrix allocation problems of the form (5), there is a natural iterative solution method, which may be viewed as an application of the alternating projections method due to Bregman [34]. The idea of solving matrix allocation problems by an iterative process that switches between row and column constraints already occurs in the early contributions of Kruithof [35] and Deming and Stephan [36]. The algorithm proposed in [35] and [36] is known as the iterative proportional fitting procedure (IPFP), or also as the RAS method. The iterative solution method for SOMA maximization problems that we describe below is a direct generalization of IPFP. For simplicity, it is assumed that the reward functions f ij (x) in (5) satisfy lim x↓0 f ij (x) = ∞ and lim x→∞ f ij (x) = −∞; the corresponding conditions for the MOMA reward functions related via (6) are usually referred to as Inada conditions. These conditions imply, in particular, that the nonnegativity constraints are redundant (optimal solutions must be strictly positive).
The standard Lagrange optimization method for the problem (5) gives rise to the equations f ij (x ij ) = λ i + μ j (i = 1, . . . , n; j = 1, . . . , m) (12) where λ i (i = 1, . . . , n) and μ j ( j = 1, . . . , m) are Lagrange multipliers corresponding to row and column constraints respectively. Under the strict concavity assumptions on the functions f ij , the derivatives f ij are strictly decreasing. Consequently, these functions have inverses, which will be denoted by F ij . Under the Inada conditions, the inverse functions F ij are defined on all of the real line. We can then rewrite the row and column constraints in terms of the multipliers λ i and μ j : Starting with an initial guess for the values of λ i , one can use Eq. (13a) to compute corresponding multipliers μ j . Each multiplier μ j appears in exactly one of the equations. Depending on the nature of the functions F ij , it may be possible to write down the answer in analytic form, but even if this is not the case, the problem of finding μ j only calls for determining the root of a scalar monotone function. When values for μ j have been found in this way, one can use the Eq. (13b) to find updated values of the multipliers λ i . Again the equation system is diagonal in the sense that each λ i occurs in exactly one of the equations. After solving for λ i (i = 1, . . . , n), one can determine updates for μ j ( j = 1, . . . , m), and so on. Convergence of the iterative proportional fitting procedure and of its generalization to infinite dimensions has been studied extensively; see for instance [37][38][39][40][41][42][43]. Motivated by a MOMA problem in risk sharing, Pazdera et al. [17] proved convergence of the iterative procedure for a broad class of strictly concave utilities, on the basis of a nonlinear version of the Perron-Frobenius theorem due to [44]. One may write down an analogous iterative procedure for optimal transport problems or equivalently for their multi-objective counterparts. While in this case the objective functions are linear rather than strictly concave, the iterative procedure can still be followed; however, as will be seen below, the procedure is not effective as a solution method. Nevertheless, the iteration in the linear case is worked out below, because it sheds light on the behavior of regularized versions.
We follow the MOMA viewpoint, which leads naturally to a multiplicative formulation. Iteration takes place between the utility weights α i and the multipliers β j . Starting with positive initial values for the weights α i (for example, one might choose α i = 1 for all i), the multipliers β j are to be found from the conditions for optimality of the weighted-sum problem (4a) and the column constraints. It is easily verified that, given a vector of weights α, the vector of multipliers β is such that (4a) as well as the column constraints are satisfied for some x ij if and only if β j = max i α i b ij for j = 1, . . . , m. Likewise, given a vector of multipliers β, the vector of weights α is such that (4a) as well as the row constraints are satisfied for some x ij if and only if α i = min j β j /b ij for i = 1, . . . , n. This leads to the iteration α (k+1) = ϕ(α (k) ), where the mapping ϕ from R n + into itself is defined by ϕ(α) =α with The behavior of this iterative procedure may be understood as follows. The update α

In other words, the new vector of weights
i b ij for all j, with equality for at least one value of j. The quantity max i α (k) i b ij may be called the column maximum for column j in the row-scaled matrix with elements α (k) i b ij . If row i in this matrix does not contain an element whose value is equal to the column maximum, then the row will receive a renewed scaling by multiplication by α (k+1) i , so that the column maximum will be achieved. The renewed scaling is the minimal one which has this effect, and therefore, it does not affect the value of any of the column maxima. After one iteration, each row in the row-scaled matrix contains a column maximal element, and the iteration arrives at a fixed point.
While convergence therefore takes place after at most one step, there are many fixed points. Any row scaling which is such that every row contains a column maximal element corresponds to a fixed point of the iteration. Moreover, the decision matrix cannot be written as a function of the weights α i and the multipliers β j , in contrast to the strictly concave case.

Regularization
A possible solution to remedy the unsatisfactory behavior of the direct iterative algorithm for the optimal transport problem as discussed above is to apply regularization. Here, we consider an additive regularization of the optimal transport problem, and we aim at finding the corresponding regularization in the multi-objective framework. In order to regularize, one can replace the linear reward functions a ij x ij by concavified versions where the regularization parameter (or "temperature") η is a small positive number, and the regularization function f 1 (x) is strictly concave. The optimization problem defined by these regularized reward functions is strictly concave, and the relation (6) can be used to determine the corresponding multi-objective problem. Defining the function g 1 (x) (up to an irrelevant constant) from the regularization function f 1 (x) by log g 1 (x) = f 1 (x) as in (6), one finds that the MOMA regularization corresponding to (16) is given by where b ij = exp(a ij ) and the function g η (x) satisfies The relation (17) has a straightforward interpretation in terms of the Arrow-Pratt coefficient of risk aversion. The Arrow-Pratt risk aversion function r η (x) associated with the utility function g η (x) is given by r η (x) = −g η (x)/g η (x). With this definition, the relation (17) is equivalent (up to an irrelevant constant) to This shows a rather natural relation between the risk aversion functions corresponding to g η and g 1 .
A regularization function that is frequently chosen for the optimal transport problem is the entropic function given by [9,13,14,45]. Up to a positive factor, which we can ignore on the basis of Remark 2.3, the corresponding regularization of the MOMA problem is g 1 (x) = 1/x. From (17), it follows that the MOMA regularization in this case can be written as The function that appears here is known as power utility, or also as isoelastic utility or CRRA utility (constant relative risk aversion). It is seen, therefore, that entropic regularization of the optimal transport problem corresponds to isoelastic regularization of the corresponding multi-objective matrix allocation problem.
In the case of entropic regularization of optimal transport problems, or, equivalently, isoelastic regularizations of linear MOMA problems, the iterative algorithm based on (13a-13b) specializes as follows. Working from the multiplicative form, the k-th iterate x (k) ij is determined in terms of the corresponding iterates of the multiplicative parameters by 4 α (k) In the special case (20), this produces Applying the column and row constraints successively leads to the iteration mapping ϕ η from R n + \ {0} into R n ++ defined by ϕ η (α) =α, where the vectorα is obtained from Here, · p (with p = 1/η) refers to the usual vector p-norm x p = n i=1 |x i | p 1/ p . The notations α · b · j and b i· /β · are used to indicate the n-vector with entries α i b ij (i = 1, . . . , n) and the m-vector with entries b ij /β j ( j = 1, . . . , m), respectively. In the limit, as the regularization parameter η tends to 0, the update mappings that are defined here converge to the ones in (14). (21) can equivalently be written as follows:

Remark 4.1 The update equations in
Instead of computing the sequences α (k) i and β (k) j , one can therefore also process the iteration with the sequencesα  [14]. It coincides with the IPFP algorithm, applied to the matrix with entries b 1/η ij = exp(a ij /η). The advantages and disadvantages of various possible formulations are discussed below in Sect. 5.

Remark 4.2
As noted above, the IPFP algorithm corresponds to a SOMA minimization problem in which the cost function is given by relative entropy. It is readily verified, via the differentiation-exponentiation-integration route (6), that the related MOMA minimization problems refers to agents whose utility functions are quadratic. There is some historical interest to this observation. The IPFP method was presented in 1940 by Deming and Stephan [36] as an algorithm that achieves a best approximation in the quadratic sense to a given matrix, subject to the row and column constraints. The authors soon discovered, however, that their claim was mistaken [46]. Decades later, it was established by Ireland and Kullback [39] that the approximation computed via IPFP is optimal in the sense of Kullback-Leibler divergence (relative entropy), rather than in the quadratic sense. The error in [36] may be softened a bit by the observation that, in the MOMA interpretation, IPFP does solve a quadratic minimization problem.

Perron-Frobenius Analysis
The mapping (21), when extended to take the value 0 at 0, is a homogeneous mapping of the nonnegative cone into itself. The convergence behavior of iterations defined by such mappings has been studied extensively, both in finite-dimensional and in infinitedimensional spaces; see for instance [16,[47][48][49][50]. A key tool in this literature is the Hilbert metric, which is defined for positive vectors of equal length by The Hilbert metric is in fact only a pseudo-metric on the positive cone, since points of the same ray have distance zero, but on the open unit simplex it is a true metric. The use of this metric for the study of eigenvectors of positive linear mappings was initiated in [51], while its application to nonlinear mappings starts with [52]. The purpose of this section is to show how a result of Oshime [44] in nonlinear Perron-Frobenius theory can be used to prove with little effort that the iteration (21) converges to a unique (up to a positive multiplicative factor) fixed point. This follows the development in [17], but the proof can be much shorter in the present case. The approach facilitates comparison with the ineffective iteration (14). The notion of nonsectionality is needed, which may be thought of as a nonlinear version of the well-known irreducibility condition of linear Perron-Frobenius theory. It is defined as follows. In the definition below, the notation x R where x is an n-vector and R is a subset of the index set {1, . . . , n} denotes the subvector with entries x i , i ∈ R. Definition 4.1 [44] A mapping ϕ of R n + into itself is nonsectional iff, for every decomposition of the index set {1, . . . , n} into two complementary nonempty subsets R and S, there exists s ∈ S such that (i) (ϕ(x)) s > (ϕ(y)) s for all x, y ∈ R n + such that x R > y R and x S = y S > 0; (ii) (ϕ(x k )) s → ∞ for all sequences (x k ) k=1,2,... in R n + such that x k R → ∞, while x k S is fixed and positive. The following theorem, as cited in [17], can be used to prove uniqueness of the fixed points of the iteration defined by (21).

Theorem 4.1 [44, Thm. 8, Remark 2]
If a mapping ϕ from R n + into itself is continuous, monotone, homogeneous, and nonsectional, then the mapping ϕ has a positive eigenvector, which is unique up to scalar multiplication, with a positive associated eigenvalue. In other words, there exist a constant θ * > 0 and a vector x * ∈ R n ++ such that ϕ(x * ) = θ * x * , and if θ > 0 and x ∈ R n ++ are such that ϕ(x) = θ x, then x is a scalar multiple of x * .
The existence of a unique positive eigenvector to the mapping defined by (21) does not immediately imply the existence of a fixed point; it needs to be shown that the corresponding eigenvalue is 1. This follows from the global feasibility constraint (cf. similar arguments in [53, p. 246] and [17]). (21), we haveα = θα for α ∈ R n \ {0}, then θ = 1.

Lemma 4.1 If, in the iteration mapping
Proof The equations that defineα from β and β from α, written in implicit form, are The global feasibility constraint i r i = j c j therefore implies that Consequently, ifα = θα, then θ must be equal to 1.
We can verify the conditions in Theorem 4.1 for the mapping defined in (21) and compare them to the properties of the update mapping defined by (14). A mapping ϕ from (a subset of) R n 1 to R n 2 is said to be strongly monotone iff the conditions x i ≥ y i for i = 1, . . . , n 1 and x i > y i for some i ∈ {1, . . . , n 1 } imply (ϕ(x)) i > (ϕ(y)) i for all i = 1, . . . , n 2 .

Proposition 4.1
The mapping from the nonnegative cone R + to itself defined by (14) is continuous, monotone, and homogeneous. The mapping (21) from R + into itself is continuous, strongly monotone, homogeneous, and nonsectional.
Proof Continuity, monotonicity, and homogeneity are immediate from the definitions, both for the nonregularized mapping and for the regularized version. To show strong monotonicity in the case of (21), note that, due to the assumed positiveness of all coefficients b ij , an increase of any of the components of the weight vector α implies an increase of all of the components of the vector β, which in its turn implies an increase of all components of the image vectorα. Strong monotonicity implies that condition (i) in the definition of nonsectionality is satisfied. Moreover, when one of the components of α tends to infinity, the same is true for all of the components of β, and consequently for all of the components ofα. This shows that condition (ii) is satisfied by the mapping (21). Proof The first claim follows from Proposition 4.1 and Theorem 4.1. By a basic result in Perron-Frobenius theory (see for instance [16,Ch. 2] and [17,Lemma 4.4]), a homogeneous and strongly monotone mapping from the positive cone into itself is contractive in the sense of the Hilbert metric. Given the uniqueness of the fixed point, convergence of the iteration then follows from a result of Nadler [54] (see also [17,Cor. 5.4]).
The nonregularized mapping (14) does not satisfy the property of nonsectionality. Indeed, otherwise uniqueness of a fixed point would follow as in the corollary above, whereas it has already been argued that the mapping (14) has many fixed points, which are not related to each other by multiplication by a positive constant. To see the violation of nonsectionality more directly, consider a point α ∈ R n ++ in which the first component α 1 is large relative to the other components, so that, for all j, α 1 b 1 j > α i b ij for i = 2, . . . , n. A sufficiently small increase of the components α 2 , . . . , α n will then have no effect on any of the parameters β j defined in (14), and consequently there is no effect on the updateα either. This means that condition (i) in the definition of nonsectionality is not satisfied with S = {1} and R = {2, . . . , n}. It also follows that the mapping defined by (14) is not strongly monotone.

Numerics
Alternative implementations of the regularization method for optimal transport problems can be derived from (21) and from (22). These formulas are mathematically equivalent, but the corresponding implementations may differ in their numerical properties, especially when the regularization parameter is close to 0. As noted in [14], when the parameter η is very small, the computation of a number of the form b 1/η may lead to numerical overflow or underflow. Such problems are less likely in the implementation (21), which uses the operations of raising a number to the power η and computing the p-norm of a vector with p = 1/η; both of these operations can be implemented without the creation of intermediate results that are very large or very small. 5 There are different forms of the IPFP algorithm. It can be written in a vector iteration form, starting from a given positive initial matrix (x (0) ij ) and a positive initial vector (24) Alternatively, a matrix iteration can be used: It can be verified immediately that, if the initial vector u (0) is chosen such that u (0) i = 1 for all i, the two recursions produce the same sequence of matrices x (k) ij . One might say that (24) presents a cumulative form of the computation, whereas (25) is the corre-sponding incremental form. Corresponding to the vector recursion (21), a matrix iteration can analogously be defined as follows. Starting from a given matrix (z (26) where the incremental multipliers s (k) ∈ R m and t (k) ∈ R n are given by The matrices (z ij ) as defined in (21) via ij . Starting from the initialization z ij = b ij , one can therefore iterate the matrices (z ij ) and extract the approximate solution of the optimal transport problem (x ij ) at the end.
Extracting the approximate solution only at the end raises the question how to monitor the quality of the solution in intermediate stages, in order to obtain a stopping criterion for the iteration. By construction of the algorithm, after each full step, the row sums of the matrix (x (k) ij ) are equal to the prescribed row sums r i , up to machine precision. Therefore, it is natural to measure the accuracy of the solution by the deviation of the column sums from the prescribed values c j . From the equality m j=1 x ij = r i for all i, it follows that m j=1 n i=1 x ij = m j=1 c j . Consequently, the Hilbert metric can be used as a valid distance measure between the vector of column sums of the approximate solution and the vector c. We have where s (k) j is defined in (27), and 1 is the all-ones vector. Computation of the Hilbert metric is therefore an easy by-product of the updating rules (26)(27), and one can monitor the progress of the convergence without actually computing the intermediate approximate solutions (x (k) ij ). As noted above, the nonregularized vector iteration (14) can have at most one nontrivial step. In this step, it produces a row balancing ensuring that every row of the weight matrix (b ij ) contains at least one element that is maximal among all elements in the column that it belongs to. From a numerical perspective, this type of row equilibration is possibly useful.
An iterative algorithm to solve the regularized version of the optimal transport problem may now be written as follows. Inputs to the computation are: the weight matrix (a ij ); the column sums c j ; the row sums r i ; the chosen regularization parameter η; and, finally, the tolerance level that is chosen as a stopping criterion for the iteration.  Numerical results for a test problem are shown in the figures below. The weight matrix is defined by a discretization, on a grid of size 256 × 256, of the function a(x, y) = sin 4π (x − 0.5) 2 + (y − 0.5) 2 defined on the unit square. Marginals are given by scaled discretizations of the functions c(x) = |x − 0.5| and r (y) = |y − 0.5|. Figure 1 gives a visual description of the problem data (darker cells correspond to higher values of a ij ). The results of computations are shown in Fig. 2 for η = 10 −2 , η = 10 −3 , and η = 10 −4 . The tolerance level is set at 0.01 in each case. CPU time per iteration step is mainly determined by the grid size and is not influenced much by the choice of the regularization parameter η. However, the number of iteration steps required to reach a given level of convergence appears to be approximately inversely proportional to η. The computation time is therefore also inversely proportional to η.

Remark 5.1
The computation time can be reduced by carrying out the iteration in stages, starting with a relatively high value of the regularization parameter and reducing it stepwise. Such stagewise calculation is standard in the interior point method of linear programming (see for instance [55]). A simple reduction scheme was applied to the problem data given above, consisting of twelve stages in which the regularization parameter was reduced by the factor 1.5 at each step to reach the final level 10 −4 , while the tolerance level was kept at 0.01 in all stages. The stagewise computation required less than 1000 iterations to converge. Compared to the single-stage calculation that starts immediately with the final value of the regularization parameter, this implies a reduction of computation time by approximately a factor 25. It is seen from the graphs of the criterion value (28) that the convergence is somewhat irregular. This behavior becomes even more pronounced in the case of small examples. Of course, in such examples there is not really a reason to apply regularization, since small instances can be easily solved by any linear programming code. However, one may be able to get more insight into the behavior of the iterative algorithm from simple examples. Consider a problem with the following data: The convergence behavior of the iterative algorithm is shown in Fig. 3. As is evident from the figure, the convergence toward the true solution is not smooth at all. There are long periods in which little progress is made, and the algorithm appears almost trapped. On the other hand, once the algorithm is close to the true solution, progress is quick; reducing the tolerance level from 10 −2 , as used here, to levels close to machine accuracy would increase the total number of iterations just by a few percent. Upon inspection of the intermediate results, it appears that the phases of stagnation are linked to approximate solutions that, when used as a starting point for the IPFP algorithm, would lead to cycling, due to the presence of too many zeros. 6 The algorithm passes closely by the following row-balanced matrices in succession 7 : 6 Precise conditions for convergence of the IPFP algorithm for nonnegative weight matrices are given in [56]. 7 More precisely, the matrices given are of the form (x  Each of the first three matrices above has the property that it would cause the IPFP algorithm to cycle; the final matrix corresponds to the true solution. When the regularization parameter is reduced, the algorithm follows essentially the same path, but takes smaller steps, and hence requires more iterations.

Conclusions
The purpose of this paper has been to establish a relationship between the optimal transport problem on the one hand and a certain class of multi-objective equilibrium problems with linear reward functions on the other hand. This connection forms a natural complement to a similar relationship which exists in the strictly concave/strictly convex case. By the availability of this connection, it is possible to apply different perspectives to a given problem. This variety of viewpoints has been illustrated in the context of regularization, where it was shown that entropic regularization in the single-objective problem corresponds to isoelastic regularization in the multi-objective framework. The ability to switch between different interpretations may be useful in the design of algorithms. The curious behavior of the iterative algorithm for small values of the regularization parameter seems worthy of further investigation.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.