Two Polynomial Time Graph Labeling Algorithms Optimizing Max-Norm-Based Objective Functions

Many problems in applied computer science can be expressed in a graph setting and solved by finding an appropriate vertex labeling of the associated graph. It is also common to identify the term “appropriate labeling” with a labeling that optimizes some application-motivated objective function. The goal of this work is to present two algorithms that find such optimal labelings for objective functions in a general format motivated by image processing tasks. Specifically, we consider the problem of finding an optimal binary labeling for an objective function defined as the max-norm over a set of local costs of a form that naturally appears in image processing. It is well known that for a limited subclass of such problems, globally optimal solutions can be found via watershed cuts, that is, by the cuts associated with the optimal spanning forests of a graph. Here, we propose two new algorithms for optimizing a broader class of such problems. The first algorithm, which works for all considered objective functions, returns a globally optimal labeling in quadratic time with respect to the size of the graph (i.e., the number of its vertices and edges) or, for an image-associated graph, the size of the image. The second algorithm is more efficient, with quasi-linear time complexity, and returns a globally optimal labeling provided that the objective function satisfies certain given conditions. These conditions are analogous to the submodularity conditions encountered in max-flow/min-cut optimization, where the objective function is defined as the sum of all local costs. We also consider a refinement of the max-norm measure, defined in terms of the lexicographical order, and examine algorithms that can find minimal labelings with respect to this refined measure.


Introduction
Many fundamental problems in image processing and computer vision, such as image filtering, segmentation, registration, and stereo vision, can naturally be formulated as optimization problems. Often, these optimization problems can be described as labeling problems in which we wish to assign to each image element (a pixel, or a vertex of an associated graph) $v \in V$ a label $\ell(v)$ from some finite $K$-element set of labels, usually $\{0, \ldots, K-1\}$. The interpretation of these labels depends on the optimization problem at hand. In image segmentation, the labels might indicate object categories. In registration and stereo disparity problems, the labels represent correspondences between images, and in image reconstruction and filtering the labels represent intensities in the filtered image.
In what follows, an undirected graph $G$ is identified with a pair $\langle V, E \rangle$, where $V$ is its set of vertices and $E$ is the set of its edges. Each edge connecting vertices $s$ and $t$ is identified with a pair $\{s, t\}$. We assume that the vertices in $V$ are linearly ordered, and let $\hat{E} := \{\langle s, t \rangle \in V^2 : \{s, t\} \in E \text{ and } s < t\}$.
Our new algorithms place no restrictions on the structure of the graph to which they are applied. However, in what follows we will often treat $G$ as associated with a digital image. In this case, $V$ is the set of all pixels of the image, while $E$ is the set of pairs $\{s, t\}$ of vertices/pixels that are adjacent according to some given adjacency relation.
In this paper, we seek vertex label assignments $\ell : V \to \{0, 1, \ldots, K-1\}$ of undirected graphs $G = \langle V, E \rangle$ that minimize a given objective (energy) function $E_\infty$ of the form
$$E_\infty(\ell) = \max\Big\{ \max_{s \in V} \varphi_s(\ell(s)),\ \max_{\langle s, t \rangle \in \hat{E}} \varphi_{st}(\ell(s), \ell(t)) \Big\}. \tag{1}$$
The functions $\varphi_s(\cdot)$ are referred to as unary terms. The value of $\varphi_s(j)$ depends explicitly only on the label $j \in \{0, 1, \ldots, K-1\}$, but typically is also based on some prior information. These terms are used to indicate a preference for a vertex/pixel $s$ to be assigned a particular label $j$.
The functions φ st (·, ·) are referred to as pairwise or binary terms. The value of φ st (·, ·) depends simultaneously on the labels assigned to the vertices/pixels s and t, and thus introduces a dependency between the labels of different pixels. Typically, this dependency between pixels is used to express an expectation that the desired solution should have some degree of smoothness or regularity.
Taken together, the unary and pairwise terms form the local costs (error measures) mentioned in the abstract, which make up the functional $\Phi$ defined in Sect. 3. The same local costs are used in the $L_1$-norm energy $E_1$, which we discuss briefly in the next section.
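To make the max-norm objective concrete, the following Python sketch evaluates $E_\infty$ for a given labeling as the largest of all unary terms $\varphi_s(\ell(s))$ and pairwise terms $\varphi_{st}(\ell(s), \ell(t))$. The dictionary-based representation of the local costs and the function name `e_inf` are assumptions made for illustration only; they are not part of the method described in this paper.

```python
from itertools import chain


def e_inf(labeling, unary, pairwise):
    """Max-norm energy: the largest unary or pairwise local cost of `labeling`.

    labeling: dict vertex -> label
    unary:    dict vertex -> list of costs, unary[s][i] = phi_s(i)
    pairwise: dict (s, t) -> nested list, pairwise[(s, t)][i][j] = phi_st(i, j);
              one key per edge, with s < t
    """
    unary_costs = (unary[s][labeling[s]] for s in unary)
    pair_costs = (pairwise[(s, t)][labeling[s]][labeling[t]] for (s, t) in pairwise)
    return max(chain(unary_costs, pair_costs))


# A path 0 - 1 - 2 with two labels and a Potts-like pairwise term.
unary = {0: [0.0, 5.0], 1: [1.0, 1.0], 2: [5.0, 0.0]}
potts = [[0.0, 2.0], [2.0, 0.0]]
pairwise = {(0, 1): potts, (1, 2): potts}
print(e_inf({0: 0, 1: 0, 2: 1}, unary, pairwise))  # 2.0: the label change on edge (1, 2)
```

Only the single largest term matters here: lowering any cost below 2.0 in this example would not change the energy.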
Finding a labeling that globally minimizes an objective function of the form $E_\infty$ is generally a challenging computational task; in Sect. 7, we show that for $K > 2$ this problem is in fact NP-hard in the general case. As we will see, however, there exist restricted classes of local cost functionals for which efficient algorithms can be formulated.
In the conference version of this paper [16], we introduced an algorithm for finding a binary labeling (i.e., with $K = 2$) and showed that the labeling it returns is always $E_\infty$-optimal as long as all pairwise local cost terms $\varphi_{st}$ are $\infty$-submodular, that is, they satisfy the condition
$$\max\{\varphi_{st}(0,0), \varphi_{st}(1,1)\} \le \max\{\varphi_{st}(0,1), \varphi_{st}(1,0)\}. \tag{2}$$
This algorithm, presented in Sect. 6, is very efficient, with quasi-linear time complexity. An important question left open in our previous work [16] was whether it is possible to optimize the objective function $E_\infty$ in polynomial time without any additional assumptions on the local cost functional, such as the $\infty$-submodularity needed for the algorithm from [16]. Here, we answer this question affirmatively by presenting, in Sect. 5, an algorithm that produces, in $O((|V| + |E|)^2)$ time, a binary labeling that is globally $E_\infty$-optimal for any local cost functional.

L p Norm Objective Functions and Minimal Graph Cuts
While the main focus of this paper is to find efficient algorithms for the direct optimization of objective functions of the form E ∞ , we will start by discussing the more general problems of optimizing L p norm objective functions for p ∈ [1, ∞].
In their seminal work, Kolmogorov and Zabih [13] considered binary labeling problems for the $L_1$-norm-based objective function of the form
$$E_1(\ell) = \sum_{s \in V} \varphi_s(\ell(s)) + \sum_{\langle s, t \rangle \in \hat{E}} \varphi_{st}(\ell(s), \ell(t))$$
and showed that a globally optimal binary labeling can be found by solving a max-flow/min-cut problem on a suitably constructed graph under the condition that all pairwise terms $\varphi_{st}$ are submodular, that is, that they satisfy the inequality
$$\varphi_{st}(0,0) + \varphi_{st}(1,1) \le \varphi_{st}(0,1) + \varphi_{st}(1,0).$$
Looking at the objective functions $E_1$ and $E_\infty$, we can view them both as consisting of two parts:
- The local error measures, in our case expressed by the unary and pairwise terms.
- A global error measure, aggregating the local errors into a final score.
In the case of $E_1$, the global error measure is obtained by summing all local error measures; in the case of $E_\infty$, the global error measure is taken to be the maximum of all local error measures. If we assume for a moment that all local error measures are nonnegative, then $E_1$ can be seen as measuring the $L_1$-norm of a vector containing all local costs/errors. Similarly, $E_\infty$ can be interpreted as the $L_\infty$- (or max-) norm of the same vector. The $L_1$ and $L_\infty$ norms are both special cases of $L_p$ norms, with $p \in [1, \infty]$, which for finite $p$ are defined as
$$E_p(\ell) = \Big( \sum_{s \in V} \varphi^p_s(\ell(s)) + \sum_{\langle s, t \rangle \in \hat{E}} \varphi^p_{st}(\ell(s), \ell(t)) \Big)^{1/p},$$
where $\varphi^p_s(\cdot) = (\varphi_s(\cdot))^p$ and $\varphi^p_{st}(\cdot, \cdot) = (\varphi_{st}(\cdot, \cdot))^p$. The value $p \in [1, \infty]$ can be seen as a parameter controlling the balance between minimizing the overall cost and minimizing the magnitude of the individual terms. For $p = 1$, the optimal labeling may contain arbitrarily large individual terms as long as the sum of the terms is small. As $p$ increases, a larger penalty is assigned to solutions containing large individual terms. In the limit as $p$ approaches infinity, $E_p$ approaches $E_\infty$, and the penalty assigned to a solution is determined by the largest individual term only. The limit behavior of $L_p$ norm optimizers as $p$ approaches $\infty$ has also been studied in, e.g., [8,18,20]. Abbas and Swoboda [1] considered mixed optimization problems, in which the objective function contains both $L_1$ and $L_\infty$ terms.
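The role of $p$ can be seen numerically. The sketch below computes the $L_p$ norm of one fixed vector of nonnegative local costs (for instance, all unary and pairwise costs of a single labeling) for increasing $p$; the values decrease toward the max-norm. The helper name `lp_energy` is hypothetical.

```python
import math


def lp_energy(costs, p):
    """L_p norm of a vector of nonnegative local costs; p = math.inf gives the max-norm."""
    if math.isinf(p):
        return max(costs)
    return sum(c ** p for c in costs) ** (1.0 / p)


costs = [1.0, 2.0, 2.0, 0.5]  # e.g. all unary and pairwise costs of one labeling
for p in (1, 2, 8, 32, math.inf):
    # E_p is non-increasing in p and tends to max(costs) = 2.0
    print(p, round(lp_energy(costs, p), 4))
```

For $p = 1$ the value is the plain sum 5.5, while for large $p$ only the two terms of cost 2.0 still contribute noticeably.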
Labeling problems with objective functions of the form $E_p$, for $p \in [1, \infty)$, can be solved using minimal graph cuts, provided that all pairwise terms $\varphi_{st}$ are $p$-submodular [17]. A binary term $\varphi$ is said to be $p$-submodular if the corresponding term $\varphi^p$ is submodular, which is equivalent to the condition
$$\varphi^p(0,0) + \varphi^p(1,1) \le \varphi^p(0,1) + \varphi^p(1,0). \tag{6}$$
In the limit, as $p$ goes to infinity (after taking $p$-th roots of both sides), this inequality becomes
$$\max\{\varphi(0,0), \varphi(1,1)\} \le \max\{\varphi(0,1), \varphi(1,0)\},$$
that is, the $\infty$-submodularity condition (2). As observed by Malmberg and Strand [17], 1-submodularity does not necessarily imply $p$-submodularity. The following theorem was shown by Malmberg and Strand [17]:
Theorem 1 If a binary term $\varphi$ is 1-submodular and $\infty$-submodular, then it is also $p$-submodular for any real $p \ge 1$.
We note here that Theorem 1 implies also the following seemingly stronger result.

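Corollary 1
Let $\varphi$ be a binary term. Then for every $\rho \in [1, \infty)$ the following conditions are equivalent.
(i) $\varphi$ is $\rho$-submodular and $\infty$-submodular.
(ii) $\varphi$ is $p$-submodular for every $p \in [\rho, \infty)$.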
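Proof To see that (ii) implies (i), note first that $\rho$-submodularity is just (ii) with $p = \rho$. Moreover, the $p$-submodularity inequality (6) can be written as
$$\big\| (\varphi(0,0), \varphi(1,1)) \big\|_p \le \big\| (\varphi(0,1), \varphi(1,0)) \big\|_p.$$
Since the $L_p$ norm converges to the $L_\infty$ norm as $p$ goes to infinity, in the limit this inequality becomes
$$\max\{\varphi(0,0), \varphi(1,1)\} \le \max\{\varphi(0,1), \varphi(1,0)\},$$
that is, the $\infty$-submodularity condition (2).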
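To see that (i) implies (ii), assume that $\varphi$ satisfies (i). Then $\varphi^\rho$ is both 1-submodular (this is inequality (6) with $p = \rho$) and $\infty$-submodular (raise both sides of inequality (2) to the power $\rho$, using that the map $x \mapsto x^\rho$ is increasing on $[0, \infty)$). In particular, $\varphi^\rho$ satisfies the assumptions of Theorem 1. Therefore, for every $p \in [\rho, \infty)$ it is $\frac{p}{\rho}$-submodular (note that $p/\rho \ge 1$), that is,
$$(\varphi^\rho(0,0))^{p/\rho} + (\varphi^\rho(1,1))^{p/\rho} \le (\varphi^\rho(0,1))^{p/\rho} + (\varphi^\rho(1,0))^{p/\rho}.$$
But this is precisely the $p$-submodularity of $\varphi$, since $(\varphi^\rho)^{p/\rho} = \varphi^p$.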
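These conditions are easy to test numerically for a single binary term. In the sketch below (the function name is hypothetical), a binary term is a 2×2 nested list with `phi[i][j]` $= \varphi(i, j) \ge 0$. The first example is 1-submodular but neither 2-submodular nor $\infty$-submodular, in line with the remark of Malmberg and Strand that 1-submodularity alone does not imply $p$-submodularity; the second satisfies the hypotheses of Theorem 1, and the check passes for every sampled $p$.

```python
import math


def is_p_submodular(phi, p):
    """phi[i][j] = phi(i, j) >= 0. For finite p, test submodularity of phi**p;
    for p = math.inf, test the infinity-submodularity (max) condition."""
    if math.isinf(p):
        return max(phi[0][0], phi[1][1]) <= max(phi[0][1], phi[1][0])
    return phi[0][0] ** p + phi[1][1] ** p <= phi[0][1] ** p + phi[1][0] ** p


a = [[0, 2], [2, 3]]  # 0 + 3 <= 2 + 2, but 9 > 8 for p = 2
print([is_p_submodular(a, p) for p in (1, 2, math.inf)])  # [True, False, False]

b = [[1, 3], [2, 1]]  # 1-submodular and infinity-submodular
print(all(is_p_submodular(b, p) for p in (1, 1.5, 2, 5, 10, 50)))  # True
```

Sampling finitely many values of $p$ is only a sanity check, of course; Theorem 1 is what guarantees the property for all real $p \ge 1$.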

Optimization of E ∞ by Classical Algorithms
In Sect. 4, we will show that if the binary terms $\varphi$ satisfy (i) of Corollary 1, then an optimal labeling for the associated energy $E_\infty$ can be found by solving an appropriate max-flow/min-cut problem. Moreover, it turns out that in some problem instances a labeling that is globally optimal with respect to $E_\infty$ can be found using very efficient greedy algorithms. Specifically, if the local costs satisfy property (D), then an optimal labeling for the associated energy $E_\infty$ can be found by computing the partition induced by an optimum spanning forest on a suitably constructed graph using, e.g., Prim's algorithm [7,19]. See more on this in Sect. 4. This property of optimum spanning forests has been observed by several authors [2,6,8]. The result is of high practical value, since the computation time for constructing an optimum spanning forest is substantially lower than the computation time for solving a max-flow/min-cut problem, both asymptotically and in practice [8].
Wolf et al. [21][22][23] recently proposed various extensions of this greedy approach and reported state-of-the-art results on several image segmentation benchmarks. We also note that the notion of partitioning an image-induced graph by computing an optimum spanning forest is tightly connected to the classic watershed image segmentation method [9,10].
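For readers unfamiliar with the greedy approach, the following sketch grows a minimum spanning forest from labeled seed vertices using a Prim-style priority queue and labels every vertex by the seed of the tree it joins. It assumes edge weights that measure dissimilarity and seeds given as hard constraints. It illustrates the classical optimum-spanning-forest (watershed-cut) partition mentioned above, not either of the new algorithms, and it does not encode property (D) itself; the function name is hypothetical.

```python
import heapq


def seeded_msf_labels(n, edges, seeds):
    """Label vertices 0..n-1 by the seed tree they join in a minimum spanning forest.

    edges: list of (u, v, w), with w a nonnegative dissimilarity
    seeds: dict vertex -> label; vertices not reachable from a seed keep None
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    label = [None] * n
    heap = []
    for s, lab in seeds.items():
        label[s] = lab
        for w, v in adj[s]:
            heapq.heappush(heap, (w, v, lab))
    while heap:
        w, v, lab = heapq.heappop(heap)  # cheapest edge leaving the current forest
        if label[v] is not None:
            continue  # v was already reached through a cheaper (or equal) edge
        label[v] = lab
        for w2, x in adj[v]:
            if label[x] is None:
                heapq.heappush(heap, (w2, x, lab))
    return label


# Path 0-1-2-3-4; the forest is cut at the most dissimilar edge (1, 2).
path = [(0, 1, 1), (1, 2, 5), (2, 3, 2), (3, 4, 1)]
print(seeded_msf_labels(5, path, {0: 0, 4: 1}))  # [0, 0, 1, 1, 1]
```

With $O(|E|)$ heap operations, this runs in $O(|E| \log |E|)$ time, which reflects the speed advantage over max-flow/min-cut noted above.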
An interesting question is therefore whether similar greedy techniques can be used to optimize the objective function $E_\infty$ beyond the special case in which the local costs satisfy property (D). The results presented in this paper answer this question affirmatively and show that the class of $E_\infty$ optimization problems solvable by efficient greedy algorithms is larger than previously known.

Algorithms for Direct Optimization of E ∞ : Preliminaries
In Sects. 5 and 6, we will introduce two novel algorithms, each finding a binary labeling minimizing E ∞ . The exposition of these algorithms relies on the notion of unary and binary solution atoms, which we introduce in this section. Informally, a unary atom represents one possible label configuration for a single vertex, and a binary atom represents a possible label configuration for a pair of adjacent vertices. Thus, for a binary labeling problem, there are two atoms associated with every vertex and four atoms for every edge. The total number of atoms for a binary labeling problem is thus O(|V | + |E|).
Formally, we let $\mathcal{V} = \{\{v\} : v \in V\}$, put $\mathcal{D} = \mathcal{V} \cup E$, and let $\mathcal{A}$ be the family of all maps from some $D \in \mathcal{D}$ into $\{0, 1\}$. An atom, in this notation, is an element of $\mathcal{A}$. If we identify, as is common, maps with their graphs, then each unary atom associated with a vertex $s \in V$ has the form $\{\langle s, i \rangle\}$, with $i \in \{0, 1\}$. Similarly, each binary atom associated with an edge $\{s, t\} \in E$ has the form $\{\langle s, i \rangle, \langle t, j \rangle\}$, with $i, j \in \{0, 1\}$.
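Notice that the maps $\varphi_s$ and $\varphi_{st}$ used for the unary and binary terms in (1) can be combined to form a single function $\Phi : \mathcal{A} \to \mathbb{R}$, given by $\Phi(\{\langle s, i \rangle\}) = \varphi_s(i)$ and $\Phi(\{\langle s, i \rangle, \langle t, j \rangle\}) = \varphi_{st}(i, j)$ for $\langle s, t \rangle \in \hat{E}$. With this notation, we may write the objective function $E_\infty$ as
$$E_\infty(\ell) = \max_{D \in \mathcal{D}} \Phi(\ell|_D),$$
where $\ell|_D$ is the restriction of $\ell$ to $D$. Similarly, $E_p(\ell) = \big( \sum_{D \in \mathcal{D}} \Phi^p(\ell|_D) \big)^{1/p}$ for any $p \in [1, \infty)$.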

Consistency
Conceptually, both proposed algorithms work as follows: starting from the set of all possible unary and binary atoms, the algorithm iteratively removes one atom at a time until the remaining atoms define a unique labeling. A key issue in this process is to ensure that, at all steps of the algorithm, at least one labeling can be constructed from the set of remaining atoms. Let $\ell$ be a binary labeling. We define $A(\ell)$, the atoms for $\ell$, as the family
$$A(\ell) := \{\ell|_D : D \in \mathcal{D}\}.$$
Notice that $\ell$ can easily be recovered from $A(\ell)$ as its union: $\ell = \bigcup A(\ell)$.
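One possible concrete encoding of atoms, used here only for illustration, represents a unary atom as a frozenset containing one (vertex, label) pair and a binary atom as a frozenset of two such pairs. This mirrors the identification of maps with their graphs. The hypothetical helpers below build $A(\ell)$ and recover $\ell$ as the union of its atoms.

```python
def atoms_of(labeling, edges):
    """A(l): the restrictions of l to every vertex {s} and every edge {s, t}."""
    unary = {frozenset({(s, i)}) for s, i in labeling.items()}
    binary = {frozenset({(s, labeling[s]), (t, labeling[t])}) for s, t in edges}
    return unary | binary


def labeling_from_atoms(atoms):
    """Recover l as the union of its atoms, i.e. a set of (vertex, label) pairs."""
    return dict(set().union(*atoms))


edges = [(0, 1), (1, 2)]
lab = {0: 1, 1: 0, 2: 1}
atoms = atoms_of(lab, edges)
print(len(atoms))  # 5: one atom per vertex and one per edge
print(labeling_from_atoms(atoms) == lab)  # True
```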

Definition 1 Let $A' \subseteq \mathcal{A}$ be a set of atoms. We say that $A'$ is consistent if there exists at least one labeling $\ell$ such that $A(\ell) \subseteq A'$.
We will now derive one of our main results, namely that the problem of determining whether a given set of atoms is consistent can be formulated as a 2-satisfiability problem. The 2-satisfiability problem is a well-studied problem in computer science, and several efficient algorithms exist for its solution. This result leads quite directly to Algorithm 1, presented in Sect. 5, for finding a labeling minimizing $E_\infty$.
For a set $A' \subseteq \mathcal{A}$ of atoms, denote by $\bar{A}'$ the complement of $A'$ relative to $\mathcal{A}$, that is, $\bar{A}' := \mathcal{A} \setminus A'$. Then $A'$ is consistent if, and only if, there exists a labeling $\ell$ such that $A(\ell) \cap \bar{A}' = \emptyset$. We will show that the existence of such a labeling can be determined by solving a 2-satisfiability problem.
For this, let us treat every vertex $v \in V$ of our graph as a variable of propositional calculus, that is, a variable that can take two possible values: TRUE, which will be identified with the number 1, and FALSE, which will be identified with 0. Upon such identification, any labeling $\ell : V \to \{0, 1\}$ can be treated as a truth assignment. Now, with any unary atom $A = \{\langle s, i \rangle\}$, with $i \in \{0, 1\}$, we associate a propositional calculus formula in a very simple format known as a literal (i.e., a variable or its negation):
$$\psi_A := \begin{cases} s & \text{if } i = 0, \\ \neg s & \text{if } i = 1. \end{cases}$$
Less formally, but more concisely, $\psi_A :=$ “$s \ne i$.” Notice that $\ell : V \to \{0, 1\}$ disagrees with $A$ (i.e., $A \notin A(\ell)$) if, and only if, $\psi_A$ is satisfied by $\ell$ treated as a truth assignment.

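Similarly, for every binary atom $A = \{\langle s, i \rangle, \langle t, j \rangle\}$ we put $\psi_A := \psi_{\{\langle s, i \rangle\}} \lor \psi_{\{\langle t, j \rangle\}}$, that is, $\psi_A$ is “$s \ne i$ or $t \ne j$”; again, $\ell$ disagrees with $A$ if, and only if, $\psi_A$ is satisfied by $\ell$. For a set $\bar{A}'$ of atoms, let $\psi_{\bar{A}'} := \bigwedge_{A \in \bar{A}'} \psi_A$. Then $\ell : V \to \{0, 1\}$ disagrees with every $A \in \bar{A}'$ if, and only if, $\psi_{\bar{A}'}$ is satisfied by $\ell$. Notice also that the formula $\psi_{\bar{A}'}$ is in the so-called 2-conjunctive normal form, that is, it is a conjunction of formulas $\psi_A$, each of which is a disjunction of at most two literals. The above discussion leads to the following result.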

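Theorem 2 A set $A' \subseteq \mathcal{A}$ of atoms is consistent if, and only if, the 2-satisfiability problem for the formula $\psi_{\bar{A}'}$ has a positive solution.
Proof This follows from the equivalence of the following conditions, each consecutive pair of which was argued above.
- $A'$ is consistent.
- There exists a labeling $\ell$ such that $A(\ell) \cap \bar{A}' = \emptyset$.
- There exists a labeling $\ell$ such that $\psi_{\bar{A}'}$ is satisfied by $\ell$.
- The 2-satisfiability problem for the formula $\psi_{\bar{A}'}$ has a positive solution.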
Recall that the 2-satisfiability problem for a formula in 2-conjunctive normal form that is a conjunction of $n$ 2-disjunctions can be solved in $O(n)$ time using, e.g., the algorithm by Aspvall et al. [3]. Thus, for any set $A' \subseteq \mathcal{A}$ of atoms, the question of whether $A'$ is consistent can be answered in linear time with respect to the number $n := |\bar{A}'|$ of elements in $\bar{A}' = \mathcal{A} \setminus A'$, by deciding the satisfiability of $\psi_{\bar{A}'}$.
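A minimal sketch of this consistency test, assuming atoms are encoded as frozensets of (vertex, label) pairs and vertices are numbered $0, \ldots, n-1$ (all names are hypothetical). Each atom outside the remaining set contributes its clause, “$s \ne i$” or “$s \ne i$ or $t \ne j$”. Satisfiability is decided in the spirit of Aspvall et al. [3], by computing strongly connected components of the implication graph, here with Kosaraju's method. When a satisfying assignment exists, it is returned as a labeling $\ell$ with $A(\ell)$ contained in the remaining set.

```python
def all_atoms(n, edges):
    """All 2n unary and 4|E| binary atoms for vertices 0..n-1."""
    atoms = {frozenset({(s, i)}) for s in range(n) for i in (0, 1)}
    atoms |= {frozenset({(s, i), (t, j)}) for s, t in edges for i in (0, 1) for j in (0, 1)}
    return atoms


def consistent_labeling(n, remaining, atoms):
    """Return a labeling l with A(l) within `remaining`, or None if none exists.

    Each removed atom yields the clause psi_A ("s != i" or "s != i or t != j").
    Node 2*v + b is the literal "x_v = b"; node ^ 1 is its negation.
    """
    adj = [[] for _ in range(2 * n)]
    radj = [[] for _ in range(2 * n)]
    for atom in atoms - remaining:
        lits = [2 * v + (1 - i) for v, i in atom]  # "x_v != i" is "x_v = 1 - i"
        a, b = lits[0], lits[-1]  # a unary atom gives the unit clause (a or a)
        for u, w in ((a ^ 1, b), (b ^ 1, a)):  # clause (a or b): not a -> b, not b -> a
            adj[u].append(w)
            radj[w].append(u)
    # Kosaraju, pass 1: record the nodes of the implication graph by finishing time.
    order, seen = [], [False] * (2 * n)
    for root in range(2 * n):
        if seen[root]:
            continue
        seen[root] = True
        stack = [(root, iter(adj[root]))]
        while stack:
            u, it = stack[-1]
            w = next(it, None)
            if w is None:
                stack.pop()
                order.append(u)
            elif not seen[w]:
                seen[w] = True
                stack.append((w, iter(adj[w])))
    # Pass 2, on the reversed graph: components are numbered in topological order.
    comp, count = [-1] * (2 * n), 0
    for root in reversed(order):
        if comp[root] != -1:
            continue
        comp[root] = count
        stack = [root]
        while stack:
            u = stack.pop()
            for w in radj[u]:
                if comp[w] == -1:
                    comp[w] = count
                    stack.append(w)
        count += 1
    if any(comp[2 * v] == comp[2 * v + 1] for v in range(n)):
        return None  # some x_v is equivalent to its own negation: unsatisfiable
    # A literal is true when its component comes later in topological order.
    return {v: int(comp[2 * v + 1] > comp[2 * v]) for v in range(n)}


atoms = all_atoms(2, [(0, 1)])
# Remove the two binary atoms with equal labels: the endpoints must now differ.
remaining = atoms - {frozenset({(0, 0), (1, 0)}), frozenset({(0, 1), (1, 1)})}
print(consistent_labeling(2, remaining, atoms) is not None)  # True
remaining -= {frozenset({(0, 0)}), frozenset({(1, 0)})}  # forbid label 0 at both vertices
print(consistent_labeling(2, remaining, atoms))  # None
```

The implication graph has $2|V|$ nodes and two arcs per removed atom, so each test runs in time linear in the size of the instance.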

Strict Optimality
In this section, we will introduce a refinement of the L ∞ norm measure. This will help us in the discussion of the two proposed algorithms, which will be introduced in the next two sections.
A potential drawback of the $L_\infty$-norm is that it does not distinguish between solutions whose errors below the maximum error are high or low. To resolve this problem, Levi and Zorin introduced, in a 2014 paper [15], the concept of strict minimizers. In this framework, two solutions are compared by ordering all their elements (in our case, binary and unary terms) non-increasingly by their local error value and then comparing the resulting sequences lexicographically. Formally, using the notation from Sect. 3, let $\ell_1$ and $\ell_2$ be two labelings. Furthermore, let $A_1, A_2, \ldots, A_k$ and $B_1, B_2, \ldots, B_k$ be the sequences of all atoms in $A(\ell_1)$ and $A(\ell_2)$, respectively, each ordered by decreasing cost, that is, $\Phi(A_1) \ge \Phi(A_2) \ge \cdots \ge \Phi(A_k)$ and $\Phi(B_1) \ge \Phi(B_2) \ge \cdots \ge \Phi(B_k)$. We say that $\ell_1$ precedes $\ell_2$ lexicographically, and denote this as $\ell_1 \prec \ell_2$, provided there exists an $i \in \{1, \ldots, k\}$ such that $\Phi(A_i) < \Phi(B_i)$ and $\Phi(A_j) = \Phi(B_j)$ for all $j < i$. We write $\ell_1 \preceq \ell_2$ if either $\ell_1 \prec \ell_2$ or $\Phi(A_j) = \Phi(B_j)$ for all $j$.
Definition 2 A labeling $\ell$ is said to be strictly minimal provided $\ell \preceq \ell'$ for any other labeling $\ell'$.
From this definition, it is clear that any strict minimizer is also an $L_\infty$-optimal solution. Thus, the set of all strict minimizers is a subset of the set of all $L_\infty$-norm optimal solutions. In fact, the limit, as $p \to \infty$, of the $L_p$-norm minimizers discussed above is not only an $L_\infty$-minimizer but also a strict minimizer [15]. (For local cost functions satisfying property (D), this was proved earlier, in a 2012 paper [6] of Ciesielski et al. Specifically, [6, Theorem 5.3] states that for $q > 0$ large enough we have $P_q(S, T) = P_{\max}(S, T)$, where the parameters $S$ and $T$ indicate that the unary local cost maps ensure that for any optimal labeling $\ell$ we have $S \subset \ell^{-1}(1)$ and $T \subset \ell^{-1}(0)$ (i.e., $\varphi_s(i) = \infty$ if, and only if, either $i = 0$ and $s \in S$, or else $i = 1$ and $s \in T$), $P_q(S, T)$ is the set of all labelings minimizing $E_q$, while $P_{\max}(S, T)$ is the set of all strictly optimal labelings. See also the 2010 paper by Ciesielski and Udupa [5], where strict optimization was considered earlier in a similar setting.) The above discussion indicates that it would be desirable to have an efficient algorithm that finds not only $L_\infty$-minimizers, but also strict minimizers. Unfortunately, in the general setting that we examine here, the problem of finding strict minimizers is NP-hard, as we show at the end of this section. Nevertheless, there are two special situations in which efficient algorithms for finding strict minimizers do exist. The first case is described in the next subsection. The second one, discussed in Sect. 5.1 and solved by the algorithm presented there, is when all local terms have distinct weights.
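Because Python compares lists lexicographically, the relation $\prec$ can be sketched directly: sort the costs of all atoms of a labeling in decreasing order and compare the resulting lists. The helper names and the dictionary representation of the local costs are illustrative assumptions. The exhaustive search is exponential and is only meant for tiny examples. In the example, both labelings have the same max-norm, so only the lexicographic comparison distinguishes them.

```python
from itertools import product


def cost_profile(labeling, unary, pairwise):
    """Costs of all atoms in A(labeling), sorted in decreasing order."""
    costs = [unary[s][labeling[s]] for s in unary]
    costs += [pairwise[(s, t)][labeling[s]][labeling[t]] for (s, t) in pairwise]
    return sorted(costs, reverse=True)


def precedes(l1, l2, unary, pairwise):
    """True iff l1 strictly precedes l2 in the lexicographic order on cost profiles."""
    return cost_profile(l1, unary, pairwise) < cost_profile(l2, unary, pairwise)


def strict_minimizer_brute_force(vertices, unary, pairwise):
    """Exhaustive search over all binary labelings (exponential; tiny graphs only)."""
    labelings = (dict(zip(vertices, bits)) for bits in product((0, 1), repeat=len(vertices)))
    return min(labelings, key=lambda l: cost_profile(l, unary, pairwise))


unary = {0: [3, 3], 1: [0, 2]}
pairwise = {(0, 1): [[1, 1], [1, 1]]}
l1, l2 = {0: 0, 1: 0}, {0: 0, 1: 1}
# Both labelings have E_inf = 3; their profiles are [3, 1, 0] and [3, 2, 1].
print(precedes(l1, l2, unary, pairwise), precedes(l2, l1, unary, pairwise))  # True False
best = strict_minimizer_brute_force([0, 1], unary, pairwise)
print(cost_profile(best, unary, pairwise))  # [3, 1, 0]
```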

When all $\varphi_{st}$ are $p$-Submodular for Large Enough $p$
We will use the following result, which identifies strict optimality with optimality with respect to $E_p$ for $p$ large enough. For local cost maps satisfying (D), this was first proved in […] Indeed, using the notation as in the definition of $\prec$, let $i$ be the smallest index such that […], completing the argument for (8).
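To prove the proposition, choose p ≥ δ k Z and labelings $\ell_1$ and $\ell_2$. If $\ell_1$ is strictly minimal, then either $\ell_1 \prec \ell_2$, in which case (8) gives $E_p(\ell_1) < E_p(\ell_2)$, or the decreasing cost sequences of $\ell_1$ and $\ell_2$ coincide, in which case $E_p(\ell_1) = E_p(\ell_2)$. Thus, strict minimality of $\ell_1$ indeed implies its minimality with respect to $E_p$.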
Conversely, if $\ell_1$ is minimal with respect to $E_p$, then we must have $\ell_1 \preceq \ell_2$, since otherwise we would have $\ell_2 \prec \ell_1$ and, by (8), $E_p(\ell_2) < E_p(\ell_1)$, contradicting the minimality of $\ell_1$.
A number $p$ for which the proposition holds is referred to by Wolf et al. [21] as a dominant power. Its existence is proved in that paper; however, no estimate similar to $\delta_Z$ is provided there. The estimate $\delta_Z$ can be found, in a similar setting, in [6, Theorem 5.3]; however, that result does not explicitly relate this number to the lexicographical order.
The proposition immediately implies the next theorem. We observe that, in practice, the dominant power $p$ may be large. This may give rise to numerical issues when solving the max-flow/min-cut problem, as each local cost is raised to the power $p$. The novel algorithms proposed in Sects. 5 and 6 do not suffer from this potential issue.

NP-Hardness of Finding Strict Optimizers
We will now show that, in the general case, the problem of finding strict optimizers is indeed NP-hard. The argument is based on an example from Kolmogorov and Zabih [13, Appendix A], which shows that finding $L_1$-optimal labelings for non-submodular energies is NP-hard.
Recall that a set $U$ of vertices of a graph $G = \langle V, E \rangle$ is independent when it contains no two vertices connected by an edge. It is known that the problem of finding a maximum (i.e., largest) independent set of vertices of an arbitrary graph is NP-hard [7, chapter 34].
In the example, we associate the following local costs:
- for every vertex $v$ with label $i$, the cost is $1 - i$;
- for every edge with both vertices of label 1, the cost is $N := |V| + 1$;
- with any other edge, we associate the cost 0.
Notice that the max-cost of any labeling $\ell$ is $< N$ if, and only if, the set $U := \ell^{-1}(1)$ is independent. For every labeling associated with an independent $U$, the max-cost is at most 1. Since $U = \emptyset$ is independent, every strict minimizer is therefore associated with an independent $U$. Moreover, such a labeling $\ell$ is a strict minimizer when the number of atoms of cost 1, which is $|V| - |U|$, is minimal, that is, when the size of $U$ is maximum.
In other words, if for a graph $G$ we use the local cost assignments as above, then $\ell$ is a strict minimizer if, and only if, $U := \ell^{-1}(1)$ is a maximum independent set of vertices. Hence our problem is NP-hard, as is the problem of finding a maximum independent set of vertices.
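The reduction can be checked by brute force on a small graph. The sketch below builds the local costs described above for a star graph (center 0, leaves 1 to 4), whose unique maximum independent set is the set of leaves. It then confirms that the strict minimizer assigns label 1 to exactly those vertices. The helper names are hypothetical, and the exhaustive search is for illustration only.

```python
from itertools import product


def reduction_costs(n, edges):
    """Local costs of the reduction for vertices 0..n-1 and edges (s, t), s < t."""
    big = n + 1  # N = |V| + 1
    unary = {v: [1, 0] for v in range(n)}  # label i costs 1 - i
    pair = [[0, 0], [0, big]]  # cost N only when both endpoints have label 1
    return unary, {e: pair for e in edges}


def cost_profile(labeling, unary, pairwise):
    """Decreasing sequence of all atom costs, compared lexicographically."""
    costs = [unary[v][labeling[v]] for v in unary]
    costs += [pairwise[(s, t)][labeling[s]][labeling[t]] for (s, t) in pairwise]
    return sorted(costs, reverse=True)


def strict_minimizer(n, unary, pairwise):
    labelings = (dict(enumerate(bits)) for bits in product((0, 1), repeat=n))
    return min(labelings, key=lambda l: cost_profile(l, unary, pairwise))


star = [(0, 1), (0, 2), (0, 3), (0, 4)]
best = strict_minimizer(5, *reduction_costs(5, star))
print(sorted(v for v, i in best.items() if i == 1))  # [1, 2, 3, 4]
```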

A Quadratic Time Algorithm for Direct Optimization of $E_\infty$
With these preliminaries in place, we are now ready to introduce a general method for finding a binary labeling that globally optimizes $E_\infty$. Pseudocode for this method is given in Algorithm 1. For every $k \in \{0, 1, \ldots, n\}$, let $H_k$ and $L_k$ be the states of $H$ and $L$, respectively, directly after the $k$th execution of the loop in lines 2-4. First notice that, for every $k \in \{0, 1, \ldots, n\}$, […] The above shows that $H_n \cup L_n = L_n$ is consistent, that is, there exists a labeling $\ell : V \to \{0, 1\}$ such that $A(\ell) \subseteq L_n$. To finish the proof that $\ell = \bigcup L_n$ is a labeling, we need to show that $A(\ell) = L_n$.
To see this, first notice that $H_{k+1} \cup L_{k+1} \subseteq H_k \cup L_k$ for every $k < n$. So, $A(\ell) \subseteq L_n \subseteq H_k \cup L_k$ for every $k$. To see that $L_n \subseteq A(\ell)$, assume, by way of contradiction, that there is an $A \in L_n \setminus A(\ell)$. Then $A$ is removed from $H$ during some, say the $k$th, execution of line 2. So, $A \notin H_{k+1}$. Also, since $A \notin A(\ell)$, the set $H_{k+1} \cup L_k$ is consistent, as it contains $A(\ell)$. Therefore, $L_{k+1} = L_k$ and $A \notin H_{k+1} \cup L_{k+1} \supseteq L_n$, a contradiction. This means that $A(\ell) = L_n$.
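Finally, assume, by way of contradiction, that $\ell = \bigcup L_n$ does not minimize $E_\infty$, that is, that there is a labeling $\ell'$ with $c := E_\infty(\ell') < E_\infty(\ell)$. Then there is an $A \in A(\ell)$ of cost $> c$. Let $k \le n$ be such that $A$ is removed from $H$ during the $k$th execution of line 2. Then $A \notin H_{k+1}$. Also, every atom in $A(\ell')$ has cost at most $c < \Phi(A)$, so, by the ordering of $H$, none of these atoms has been removed from $H$ yet; that is, $A(\ell') \subseteq H_{k+1}$. Hence $H_{k+1} \cup L_k$ is consistent and $L_{k+1} = L_k$. In particular, $A \notin H_{k+1} \cup L_{k+1} \supseteq L_n = A(\ell)$, contradicting the fact that $A \in A(\ell)$.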
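The following Python sketch shows one way the procedure analyzed above could be implemented. The pseudocode of Algorithm 1 is not reproduced in this text, so the sketch is reconstructed from the proof only, and all names are hypothetical. Atoms are taken in decreasing order of $\Phi$ and each one is tentatively removed. It is kept (that is, moved to $L$) exactly when its removal would make $H \cup L$ inconsistent. Consistency is decided by 2-SAT on the implication graph (strongly connected components, Kosaraju's method). With $\eta$ atoms, this should give $O(\eta)$ work per test and $O(\eta^2)$ overall. Atoms are assumed to be frozensets of (vertex, label) pairs. The example uses a non-submodular term on edge (0, 1) and compares the result with an exhaustive search.

```python
from itertools import product


def consistent(n, remaining, atoms):
    """2-SAT test: is there a labeling whose atoms all lie in `remaining`?"""
    adj = [[] for _ in range(2 * n)]  # node 2*v + b is the literal "x_v = b"
    radj = [[] for _ in range(2 * n)]
    for atom in atoms - remaining:  # each removed atom must be avoided
        lits = [2 * v + 1 - i for v, i in atom]  # literal "x_v != i"
        a, b = lits[0], lits[-1]
        for u, w in ((a ^ 1, b), (b ^ 1, a)):
            adj[u].append(w)
            radj[w].append(u)
    order, seen = [], [False] * (2 * n)
    for root in range(2 * n):  # Kosaraju, pass 1: finishing order
        if seen[root]:
            continue
        seen[root] = True
        stack = [(root, iter(adj[root]))]
        while stack:
            u, it = stack[-1]
            w = next(it, None)
            if w is None:
                stack.pop()
                order.append(u)
            elif not seen[w]:
                seen[w] = True
                stack.append((w, iter(adj[w])))
    comp = [-1] * (2 * n)
    for root in reversed(order):  # pass 2 on the reversed graph
        if comp[root] == -1:
            comp[root] = root
            stack = [root]
            while stack:
                u = stack.pop()
                for w in radj[u]:
                    if comp[w] == -1:
                        comp[w] = root
                        stack.append(w)
    return all(comp[2 * v] != comp[2 * v + 1] for v in range(n))


def algorithm1_sketch(n, edges, cost):
    """Remove atoms in decreasing cost order, keeping those needed for consistency."""
    atoms = {frozenset({(s, i)}) for s in range(n) for i in (0, 1)}
    atoms |= {frozenset({(s, i), (t, j)}) for s, t in edges for i in (0, 1) for j in (0, 1)}
    remaining = set(atoms)  # plays the role of H ∪ L
    for atom in sorted(atoms, key=cost, reverse=True):
        remaining.discard(atom)  # take the costliest atom out of H ...
        if not consistent(n, remaining, atoms):
            remaining.add(atom)  # ... and put it into L if it cannot be dropped
    return dict(sorted(set().union(*remaining)))  # the labeling is the union of L_n


def make_cost(unary, pairwise):
    """Phi on atoms, from unary[s][i] and pairwise[(s, t)][i][j] with s < t."""
    def cost(atom):
        pairs = sorted(atom)
        if len(pairs) == 1:
            (s, i), = pairs
            return unary[s][i]
        (s, i), (t, j) = pairs
        return pairwise[(s, t)][i][j]
    return cost


unary = {0: [0, 4], 1: [2, 1], 2: [3, 0]}
pairwise = {(0, 1): [[5, 1], [1, 5]], (1, 2): [[0, 6], [2, 0]], (0, 2): [[1, 7], [0, 2]]}
lab = algorithm1_sketch(3, list(pairwise), make_cost(unary, pairwise))


def e_inf(l):
    return max([unary[s][l[s]] for s in unary] + [p[l[s]][l[t]] for (s, t), p in pairwise.items()])


best = min(e_inf(dict(enumerate(bits))) for bits in product((0, 1), repeat=3))
print(lab, e_inf(lab), best)  # {0: 0, 1: 1, 2: 0} 3 3
```

Since the term on edge (0, 1) violates both ordinary submodularity and $\infty$-submodularity, this instance lies outside the reach of both graph cuts and Algorithm 2. The greedy-with-consistency scheme still finds the unique optimum here.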

Atoms with Unique Weights
We say that the atoms (in $\mathcal{A}$) have unique weights provided the map $\Phi : \mathcal{A} \to \mathbb{R}$ is injective. Our main result here is the following.
Theorem 5 If the atoms in $\mathcal{A}$ have unique weights, then the labeling returned by Algorithm 1 is the unique strict optimizer.
First we prove the uniqueness part of the theorem, in form of the following lemma.

Lemma 1 If the atoms in A have unique weights, then the strictly optimal labeling is unique.
Proof Let $\ell_1$ and $\ell_2$ be strictly optimal labelings. We will show that $\ell_1 = \ell_2$.
To see this, consider the sequences of the atoms in $A(\ell_1)$ and $A(\ell_2)$, respectively, each ordered by decreasing cost. Since both labelings are strictly optimal, the decreasing sequences of the costs of the atoms in $A(\ell_1)$ and $A(\ell_2)$ must be identical. However, since every atom has a unique weight, this means that the sets of atoms in $A(\ell_1)$ and in $A(\ell_2)$ must themselves be identical. In particular, $A(\ell_1) = A(\ell_2)$ and therefore $\ell_1 = \bigcup A(\ell_1) = \bigcup A(\ell_2) = \ell_2$, as needed.

Proof of Theorem 5
We will use the same notation as in the proof of Theorem 4. Let $\ell$ and $\ell'$ be distinct labelings such that $\ell$ is strictly optimal and, by way of contradiction, assume that Algorithm 1 returns $\ell'$ rather than $\ell$. Fix the sequences $A_1, A_2, \ldots, A_m$ and $B_1, B_2, \ldots, B_m$ of all atoms in $A(\ell)$ and $A(\ell')$, respectively, each ordered by decreasing cost. By Lemma 1, $\ell'$ is not strictly optimal, so we have $\ell \prec \ell'$. Therefore, there exists an $i \in \{1, 2, \ldots, m\}$ such that $\Phi(A_i) < \Phi(B_i)$ and $\Phi(A_j) = \Phi(B_j)$ for all $j < i$. Let $k \le n$ be such that $B_i$ is removed from $H$ during the $k$th execution of line 2. Then, $\{B_1, B_2, \ldots$ […]
The requirement in Theorem 5 (and the forthcoming Theorem 7) that all atoms in $\mathcal{A}$ have unique weights may appear restrictive, and for real-world problems this condition may or may not hold. We will therefore now discuss how these theorems may be interpreted when the atom weights are not unique. First, we observe that in this case it is straightforward to define a new local cost function $\tilde{\Phi}$ with unique weights such that, for any atoms $A, B \in \mathcal{A}$, $\Phi(A) < \Phi(B)$ implies $\tilde{\Phi}(A) < \tilde{\Phi}(B)$. Such weights may, e.g., be defined by the following simple procedure:
- Fix, by some method (e.g., a sorting algorithm), an increasing order of the atoms in $\mathcal{A}$ by weight, i.e., find a bijection $O : \mathcal{A} \to \{1, \ldots, |\mathcal{A}|\}$ such that $O(A) < O(B)$ whenever $\Phi(A) < \Phi(B)$.
- Put $\tilde{\Phi}(A) := O(A)$ for every $A \in \mathcal{A}$.
By design, all atoms have unique weights with respect to the local costs $\tilde{\Phi}$, and thus running Algorithm 1 (or Algorithm 2 in the case of Theorem 7) with these weights will return a strict optimizer with respect to the local costs $\tilde{\Phi}$.
We observe that if the original atom weights are all unique, then the ordering $O$ is also unique, and running either of our new algorithms with the new local costs $\tilde{\Phi}$ induced by $O$ yields the same result as with the original weights. Furthermore, the procedure above is essentially what happens during the execution of the algorithms: by ordering the max-priority queue $H$, we establish a specific (implementation-dependent) ordering of the atoms that is increasing by weight, just like the ordering $O$ defined in the procedure above. Thus, even when the atom weights are not unique, the algorithms return labelings that are strictly optimal with respect to some increasing order of the atoms by weight. When the weights are not unique, however, this ordering is not unique either; it depends on the specific implementation of the max-priority queue $H$.
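The tie-breaking procedure can be sketched as follows, under our assumption that the new weights are simply the ranks $O(A)$ of the atoms in a fixed increasing order. Python's `sorted` is stable, so atoms of equal weight keep their input order. That input order plays the role of the implementation-dependent ordering discussed above. The function name and the string stand-ins for atoms are hypothetical.

```python
def unique_weights(atoms, cost):
    """Return a hypothetical map atom -> rank O(atom) in an increasing order by cost.

    sorted() is stable, so atoms with equal cost keep their input order; the
    ranks are distinct and respect every strict inequality between costs.
    """
    order = sorted(atoms, key=cost)
    return {atom: rank for rank, atom in enumerate(order, start=1)}


costs = {"a": 2.0, "b": 1.0, "c": 2.0, "d": 0.0}  # "a" and "c" are tied
ranks = unique_weights(["a", "b", "c", "d"], costs.get)
print(ranks)  # {'d': 1, 'b': 2, 'a': 3, 'c': 4}
```

Listing "c" before "a" in the input would swap their ranks. This shows how, with ties, the resulting strict optimizer can depend on the chosen ordering.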

A Quasi-Linear Time Algorithm for Direct Optimization of E ∞ When All Binary Terms are ∞-Submodular
We now present a more efficient algorithm, previously reported in the conference version of this manuscript [16], for the case when all binary terms are $\infty$-submodular. Superficially, this algorithm is slightly more complicated than Algorithm 1. We emphasize, however, that both algorithms have a very similar structure: starting from the set of all possible atoms, both algorithms iteratively remove one atom at a time until the remaining atoms define a unique labeling. The main difference between the algorithms lies in the steps taken to ensure the consistency of the set of remaining atoms.

Local Consistency, Incompatible Atoms
We introduce a property of local consistency, which will be used to establish the correctness of our second proposed algorithm. A set of atoms $A'$ is said to be locally consistent if, for every vertex $s \in V$ and every edge $\{s, t\} \in E$, there are $i, j \in \{0, 1\}$ such that the atoms $\{\langle s, i \rangle\}$ and $\{\langle s, i \rangle, \langle t, j \rangle\}$ both belong to $A'$ (i.e., $A'$ still allows $s$ to have some label). Clearly, any consistent set of atoms is also locally consistent. However, in general, local consistency does not imply consistency. Furthermore, we introduce the notion of an incompatible atom, which will be needed for the exposition of the proposed algorithm. For a given set $A'$ of atoms, we say that an atom […] and there exists some edge $\{v, w\}$ adjacent to $v$ such that […] Note that a locally consistent set of atoms may still contain incompatible atoms.
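The gap between the two notions shows up already on a triangle. The sketch below checks local consistency exactly as defined above and tests consistency by exhaustive search, which is fine for tiny graphs only. Helper names are hypothetical, and atoms are encoded as frozensets of (vertex, label) pairs. It builds a set of atoms on a triangle that is locally consistent but not consistent: every edge only admits different labels at its endpoints, and no binary labeling of a triangle can achieve that on all three edges.

```python
from itertools import product


def locally_consistent(edges, atoms):
    """For every edge {s, t} and each endpoint s, is there a label i of s with
    both {<s,i>} and some {<s,i>,<t,j>} among `atoms`?"""
    for s, t in edges:
        for a, b in ((s, t), (t, s)):
            if not any(frozenset({(a, i)}) in atoms and frozenset({(a, i), (b, j)}) in atoms
                       for i in (0, 1) for j in (0, 1)):
                return False
    return True


def consistent(vertices, edges, atoms):
    """Brute force: does some labeling l satisfy A(l) within `atoms`?"""
    for bits in product((0, 1), repeat=len(vertices)):
        l = dict(zip(vertices, bits))
        own = {frozenset({(s, l[s])}) for s in vertices}
        own |= {frozenset({(s, l[s]), (t, l[t])}) for s, t in edges}
        if own <= atoms:
            return True
    return False


# Triangle where every edge only allows different labels at its endpoints.
tri_v, tri_e = [0, 1, 2], [(0, 1), (1, 2), (0, 2)]
atoms = {frozenset({(s, i)}) for s in tri_v for i in (0, 1)}
atoms |= {frozenset({(s, i), (t, 1 - i)}) for s, t in tri_e for i in (0, 1)}
print(locally_consistent(tri_e, atoms), consistent(tri_v, tri_e, atoms))  # True False
```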

The Second Algorithm
We now introduce the proposed algorithm, with quasi-linear time complexity, for finding a binary label assignment $\ell : V \to \{0, 1\}$ that globally minimizes the objective function $E_\infty$ given by (1), under the condition that all pairwise terms in the objective function are $\infty$-submodular. If, additionally, all atoms have unique weights, then the labeling returned by the algorithm is also the strict minimizer. Informally, the general outline of the proposed algorithm is as follows:
- Start with a set $S$ consisting of all possible atoms and an initially empty set $I$ of atoms identified as incompatible. (Recall that the total number of atoms is $O(|V| + |E|)$.)
- For each atom $A$, in order of decreasing cost $\Phi(A)$:
  - If $A$ is still in $S$, and is not the only remaining atom for that vertex/edge, remove $A$ from $S$.
  - After the removal of $A$, $S$ may contain incompatible atoms. Iteratively remove all such incompatible atoms until $S$ contains no more incompatible atoms.
Before we formalize this algorithm, we introduce a specific preordering relation on the atoms A.
With these preliminaries in place, we are now ready to introduce the proposed algorithm, for which pseudocode is given in Algorithm 2. The algorithm maintains an array $A$ of buckets of atoms, indexed by $\mathcal{D} = \mathcal{V} \cup E$; a list $H$ of atoms; and a queue $K$ of vertices/edges such that every vertex in $K$ precedes any edge. In particular, a vertex/edge $C$ adjacent to a given $D$ is inserted into $K$ at its top when $C$ is a vertex, and at its bottom when $C$ is an edge.

Computational Complexity
We now analyze the asymptotic computational complexity of Algorithm 2. First, let $\eta := |\mathcal{A}| = 2|V| + 4|E|$. In image processing applications, the graph $G$ is commonly sparse, in the sense that $|E| = O(|V|)$. In this case, we have $\eta = O(|V|)$.
Creating the list $H$ requires us to sort all atoms in $\mathcal{A}$. The sorting can be performed in $O(\eta \log \eta)$ time. In some cases, e.g., if all unary and binary terms are integer-valued within a suitably bounded range, the sorting can be performed in $O(\eta)$ time using, e.g., radix or bucket sort.
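As an illustration of the linear-time case, assume the local costs are nonnegative integers bounded by some $C$. Then the atoms can be ordered by decreasing cost with a bucket (counting) sort in $O(\eta + C)$ time, which is $O(\eta)$ when $C = O(\eta)$. The bound `max_cost`, the function name, and the string stand-ins for atoms are assumptions of this sketch.

```python
def sort_by_decreasing_cost(atoms, cost, max_cost):
    """Bucket sort for integer costs in 0..max_cost; equal costs keep input order."""
    buckets = [[] for _ in range(max_cost + 1)]
    for atom in atoms:
        buckets[cost(atom)].append(atom)
    return [atom for c in range(max_cost, -1, -1) for atom in buckets[c]]


costs = {"a": 2, "b": 0, "c": 2, "d": 1}
print(sort_by_decreasing_cost(["a", "b", "c", "d"], costs.get, max_cost=2))  # ['a', 'c', 'd', 'b']
```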
We make the reasonable assumption that each of the following operations can be performed in $O(1)$ time:
- Remove an atom from $H$.
- Remove an atom from $A[D]$.
- Remove or insert elements in $K$.
- Given an atom, find its corresponding edge or vertex.
- Given a vertex, find all edges incident at that vertex.
- Given an edge, find the vertices spanned by the edge.
The combined number of executions of the main loop, lines 3-12, and of the internal loop, lines 7-12, equals $|\mathcal{A}|$, that is, $O(\eta)$. This is so since any insertion of an atom into $K$ requires its prior removal from the list $H$. If the assumptions above are satisfied, it is easily seen that only $O(1)$ operations are needed between consecutive removals of an atom from $H$. Therefore, the total cost of all executions of the main loop is $O(\eta)$.
Thus, the total computational cost of the algorithm is bounded by the time required to sort O(η) elements, i.e., at most O(η log η).

Let $1 = k_1 < \cdots < k_m$ be the list of all values of $k \in \{1, \ldots, n\}$ such that $A_k$ is a proper refinement of $A_{k-1}$ resulting from the execution of line 6. Note that the numbers $k_j$ and $k_{j+1}$ may be consecutive; this happens when the execution of the loop in lines 8-12, directly after the execution of line 5 used to create $A_{k_j}$, removes no atoms from $A_{k_j}$.
The proof of Theorem 6 is based on the following lemma, whose proof is given in the Appendix.
Lemma 2 During the execution of Algorithm 2, the following properties hold for every $k \le n$.
(P1) $A_k$ contains at least one atom for every $D \in \mathcal{D}$.
(P2) $A_k$ is locally consistent.
(P3) $A_k$ has no incompatible atoms directly before any execution of line 4.

Proof of Theorem 6
Besides Lemma 2, we still need to argue for two facts. First, notice that the algorithm does not stop until all buckets $A_n[D]$, $D \in \mathcal{D}$, have precisely one element. Thus, since $A_n$ is locally consistent, […] To finish the proof, we need to show that $\ell$ indeed minimizes the energy $E_\infty$. For this, first notice that at any time during the execution of the algorithm, any atom in $H$ is also in $\bigcup_{D \in \mathcal{D}} A[D]$. Indeed, these sets are equal immediately after the initialization, and we remove from $\bigcup_{D \in \mathcal{D}} A[D]$ only those atoms that have already been removed from $H$. Now, let $L : V \to \{0, 1\}$ be a labeling minimizing $E_\infty$. We claim that the following property holds at any time during the execution of the algorithm: […] Indeed, it certainly holds immediately after the initialization. This cannot be changed during the execution of line 6 when the assumption is satisfied, since then the atom $A$ considered there has just been removed from $H \supset \bigcup_{D \in \mathcal{D}} A[D]$ and […] once again ensuring optimality of $\ell$.

Theorem 7 If the atoms in A have unique weights, then the labeling returned by Algorithm 2 is the unique strict optimizer.
Proof The uniqueness part of the theorem is already shown in Lemma 1. The rest of the argument is essentially identical to that used in the proof of Theorem 5.

NP-Hardness of Multi-label E ∞ -optimization
We will now show that, for a number of labels K > 2, the problem of finding a labeling that minimizes E ∞ is NP-hard in the general case.
Recall that a $K$-coloring of a graph is a mapping $c\colon V \to \{1, 2, \ldots, K\}$ such that $c(s) \neq c(t)$ for every edge $\{s, t\} \in E$. The $K$-coloring problem consists of determining whether a given undirected graph admits a $K$-coloring. Recall also that the 3-coloring problem is already NP-complete [7, Chapter 34].
To see that optimization of $E_\infty$ is NP-hard for $K > 2$ labels, first consider $K = 3$ labels with the following local costs:
- for every vertex $v$, the cost of any label assignment is 0;
- for any edge whose vertices receive distinct labels, the cost is 0;
- for any edge whose vertices receive the same label, the cost is 1.
For these costs, the $E_\infty$-energy of a labeling is $\le 0$ if, and only if, the labeling is a 3-coloring, so minimizing $E_\infty$ decides whether the graph is 3-colorable. The same argument can be repeated for $K > 3$. Thus, the problem of $E_\infty$-optimization with $K > 2$ labels is indeed NP-hard.
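The reduction can be checked on small graphs by exhaustive search. The sketch below is our own illustration rather than part of the original argument: the helper names `coloring_energy` and `min_coloring_energy` are hypothetical, and the search enumerates all $K^{|V|}$ labelings, so it runs in exponential time and is only meant for tiny examples such as a triangle (3-colorable) and the complete graph on four vertices (not 3-colorable).

```python
from itertools import product


def coloring_energy(num_vertices, edges, labeling):
    """E_inf for the reduction's costs: vertices cost 0, monochromatic edges cost 1."""
    vertex_costs = [0] * num_vertices
    edge_costs = [1 if labeling[s] == labeling[t] else 0 for s, t in edges]
    return max(vertex_costs + edge_costs)


def min_coloring_energy(num_vertices, edges, num_labels=3):
    """Minimum of E_inf over all labelings, by brute force (exponential time)."""
    return min(coloring_energy(num_vertices, edges, labeling)
               for labeling in product(range(num_labels), repeat=num_vertices))


triangle = [(0, 1), (1, 2), (0, 2)]
complete4 = [(s, t) for s in range(4) for t in range(s + 1, 4)]
print(min_coloring_energy(3, triangle))   # 0: the triangle admits a 3-coloring
print(min_coloring_energy(4, complete4))  # 1: the complete graph on 4 vertices does not
```

A minimum of 0 certifies a 3-coloring, while a minimum of 1 shows that every labeling leaves at least one monochromatic edge.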

Conclusions
We have presented two algorithms for finding a binary vertex labeling of a graph that globally minimizes objective functions of the form $E_\infty$. It is well known that for a limited subclass of such problems, globally optimal solutions can be found by computing an optimal spanning forest on a suitably constructed graph. Such optimal spanning forests can, in turn, be computed using very efficient greedy algorithms. Although this optimum spanning forest approach is commonly used in many image processing applications, the potential and limitations of this method for more general optimization problems are, to the best of our knowledge, largely unexplored. The exact class of max-norm optimization problems that can be solved using efficient greedy algorithms, or even in polynomial time, has remained unknown. By introducing the two proposed algorithms, we show that the class of such problems that can be solved in (low-order) polynomial time is indeed larger than previously known. In Table 1, we provide a summary of the various subclasses of the general optimization problem considered in this paper, and of the algorithms for solving them.
An important observation here is the following. Binary labeling problems with objective functions of the form $E_1$ frequently occur in image processing and computer vision applications. The max-flow/min-cut approach proposed by Kolmogorov and Zabih [13] remains one of the primary methods for solving such problems when all pairwise terms are submodular. When the local cost functionals include non-submodular terms, however, the problem becomes NP-hard in general. As concluded in our discussion in Sect. 2.1, similar submodularity requirements also hold for the generalized objective functions $E_p$ for any finite $p$. Practitioners looking to solve such optimization problems must therefore first verify that their local cost functional satisfies the appropriate submodularity conditions. If this is not the case, they must resort to approximate optimization methods that may or may not produce satisfactory results for a given problem instance. Here we show, by the introduction of Algorithm 1, that in the limit as $p$ goes to infinity, the requirement for submodularity of the pairwise terms disappears. Indeed, Algorithm 1 returns, in low-order polynomial time, an $E_\infty$-minimal binary labeling for any local cost functional. Thus, even when the local costs are such that the problem of minimizing $E_p$ is NP-hard for some or all finite $p$, a labeling minimizing $E_\infty$ can be found in low-order polynomial time.
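To illustrate the limit $p \to \infty$ numerically, the sketch below assumes (our assumption for this example) that $E_p$ is the $p$-norm of the vector of all local costs, so that $E_\infty$ is their maximum. For $m$ nonnegative costs with maximum $c$, one has $c \le E_p \le m^{1/p} c$, and the upper bound tends to $c$ as $p$ grows. The function names and the cost vector are hypothetical.

```python
def e_p(costs, p):
    """p-norm of the vector of local costs (the assumed form of E_p)."""
    return sum(c ** p for c in costs) ** (1.0 / p)


def e_inf(costs):
    """Max-norm of the vector of local costs, i.e. E_inf."""
    return max(costs)


costs = [0.2, 0.9, 0.5, 0.9]  # hypothetical local costs
for p in (1, 2, 8, 64):
    # max(costs) <= E_p <= len(costs) ** (1 / p) * max(costs)
    print(p, round(e_p(costs, p), 4))
print("max-norm:", e_inf(costs))
```

As $p$ increases, the printed values decrease toward 0.9, the largest local cost, which is the sense in which $E_\infty$ is the limiting case of the $E_p$ family.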
The motivation for our work comes from image processing applications, and the local cost functionals we consider naturally occur in many image processing problems. The two proposed algorithms, however, are formulated for general graphs and may thus also have applications to other applied problems in computer science. Structurally, both proposed algorithms resemble Kruskal's algorithm [7,14], and in this sense they can be seen as generalizations of the optimum spanning forest approach to optimization. Algorithm 1 has quadratic time complexity and is thus less efficient than Algorithm 2. It appears likely, however, that the time complexity of Algorithm 1 could be reduced further. Specifically, Algorithm 1 operates by solving a series of $n$ 2-satisfiability problems. In the proposed algorithm, each such problem is solved in isolation, but we observe a high degree of similarity between consecutive problems: each 2-satisfiability problem differs from the previous one only by the introduction of one additional disjunction of two literals. Exploring whether this redundancy can be exploited to formulate a more efficient version of Algorithm 1 is an interesting direction for future work.
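To make the connection between binary $E_\infty$-minimization and 2-satisfiability concrete, the sketch below decides whether some labeling achieves $E_\infty \le t$ by forbidding every vertex label and every edge label pair whose local cost exceeds $t$. The remaining constraints are clauses with at most two literals, solved here with the standard implication-graph and strongly-connected-component method, and an outer binary search runs over the distinct cost values. This is a simplified threshold variant written for illustration, not the paper's Algorithm 1 (which adds one clause at a time). The input format (`unary[v]` holds the costs of labels 0 and 1; `pairwise[(s, w)]` is a 2×2 table indexed by the labels of `s` and `w`) and all function names are hypothetical. Note that no submodularity assumption is used.

```python
from itertools import product


def lit(v, label):
    """Literal 'vertex v receives label': 2*v encodes label 1, 2*v + 1 encodes label 0."""
    return 2 * v if label == 1 else 2 * v + 1


def two_sat(n, clauses):
    """Solve 2-SAT; each clause (a, b) means 'a OR b'. Returns 0/1 labels or None."""
    size = 2 * n
    graph = [[] for _ in range(size)]
    rev = [[] for _ in range(size)]
    for a, b in clauses:
        # (a OR b) is equivalent to (NOT a -> b) and (NOT b -> a)
        for u, w in ((a ^ 1, b), (b ^ 1, a)):
            graph[u].append(w)
            rev[w].append(u)
    # Kosaraju pass 1: iterative DFS, recording vertices by finishing time
    order, seen = [], [False] * size
    for s in range(size):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            u, i = stack.pop()
            if i < len(graph[u]):
                stack.append((u, i + 1))
                w = graph[u][i]
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, 0))
            else:
                order.append(u)
    # Pass 2: SCCs of the reversed graph; ids come out in topological order
    comp, count = [-1] * size, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s], stack = count, [s]
        while stack:
            u = stack.pop()
            for w in rev[u]:
                if comp[w] == -1:
                    comp[w] = count
                    stack.append(w)
        count += 1
    labels = []
    for v in range(n):
        if comp[2 * v] == comp[2 * v + 1]:
            return None  # "v = 1" and "v = 0" imply each other: unsatisfiable
        labels.append(1 if comp[2 * v] > comp[2 * v + 1] else 0)
    return labels


def labeling_below(n, unary, pairwise, t):
    """A labeling with every local cost <= t (i.e. E_inf <= t), or None."""
    clauses = []
    for v, costs in enumerate(unary):
        for a in (0, 1):
            if costs[a] > t:  # unit clause: v must not receive label a
                clauses.append((lit(v, 1 - a), lit(v, 1 - a)))
    for (s, w), table in pairwise.items():
        for a, b in product((0, 1), repeat=2):
            if table[a][b] > t:  # forbid the pair (s = a, w = b)
                clauses.append((lit(s, 1 - a), lit(w, 1 - b)))
    return two_sat(n, clauses)


def min_e_inf_binary(n, unary, pairwise):
    """Binary search for the smallest cost value t such that E_inf <= t is feasible."""
    values = sorted({c for costs in unary for c in costs}
                    | {c for table in pairwise.values() for row in table for c in row})
    lo, hi = 0, len(values) - 1  # the largest value is always feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if labeling_below(n, unary, pairwise, values[mid]) is None:
            lo = mid + 1
        else:
            hi = mid
    return values[lo], labeling_below(n, unary, pairwise, values[lo])


# Non-submodular pairwise costs: equal labels cost 5, different labels cost 1.
anti = ((5, 1), (1, 5))
unary3 = [(0, 0)] * 3
path = {(0, 1): anti, (1, 2): anti}
triangle = {(0, 1): anti, (1, 2): anti, (0, 2): anti}
print(min_e_inf_binary(3, unary3, path))      # value 1, alternating labels
print(min_e_inf_binary(3, unary3, triangle))  # value 5
```

On the path, alternating labels give $E_\infty = 1$; on the triangle, every binary labeling leaves some edge with equal labels, so the minimum is 5. Each feasibility test is linear in the number of vertices and clauses, so the binary search needs only logarithmically many 2-SAT solves.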
Another natural extension of the work presented here is to consider optimization with more than two labels. In Sect. 7, we showed that for more than two labels, finding a labeling that is optimal according to $E_\infty$ is NP-hard in the general case. Nevertheless, as can be seen in Table 1, there are special cases of multi-label max-norm problems that can be solved using Prim's algorithm. Determining the class of multi-label problems that can be solved in low-order polynomial time remains an open question for future work.
At first glance, the restriction to binary labeling may appear very limiting. We note, however, that many successful methods for approximate multi-label optimization rely on iteratively minimizing binary labeling problems via move-making strategies [4]. Thus, the ability to find optimal solutions for problems with two labels potentially has a high relevance also for the multi-label case.
Acknowledgements Open access funding provided by Uppsala University. The authors would like to thank Robin Strand for valuable discussions on the ideas presented in this manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix: Proof of Lemma 2
In this appendix, we provide a proof of Lemma 2. It is enough to prove that if, for some $\kappa \le n$, the properties (P0)-(P3) hold for every $k < \kappa$, then they also hold for $\kappa$. Clearly, these properties hold immediately after the execution of line 2, that is, for $\kappa = 0$. So, we can assume that $\kappa > 0$. We need to show that (P0)-(P3) are preserved by each operation of the algorithm; more specifically, by each execution of line 6 or line 10, since the status of each of these properties can change only when an atom is removed from $A$ during such an execution.
Proof of (P0). Fix an edge $D = \{v, w\}$ and assume that (P0) holds for this $D$ and all $k < \kappa$. Now, if $A_{\kappa-1}[D]$ has fewer than 4 elements, then by the inductive assumption it must already be missing either $\{\langle v,1\rangle, \langle w,0\rangle\}$ or $\{\langle v,0\rangle, \langle w,1\rangle\}$, and so the same is true for $A_\kappa[D]$, as needed. So, assume that $A_{\kappa-1}[D]$ still has all 4 elements. This means that these 4 elements are present in $H$ and, by (2)

Proof of (P1)-(P3). This will be proved by simultaneous induction on $\kappa$.
Property (P1) is preserved by the execution of line 10 because, by the inductive assumption (P2), $A_{\kappa-1}$ is locally consistent. It also cannot be destroyed by the execution of line 6, since this is prevented by the condition in line 5. Thus, $A_\kappa[D]$ still has property (P1).
To see (P3), we can assume that $\kappa = k_j$ for some $j > 0$. Clearly, (P3) holds for $k = k_{j-1}$. Thus, we only need to show that the removal of an atom $A$ in line 6 and the subsequent execution of loop 7-12 preserve (P3). Indeed, a potential incompatibility can occur only in relation to the vertices associated with the atoms removed from $\bigcup_{D \in \mathcal{D}} A[D]$. However, each time such an atom is removed, all adjacent atoms are inserted into the queue $K$, and the execution of loop 7-12 does not end until all such potential incompatibilities have been taken care of.
The proof of the preservation of (P2) is more involved. Let $j$ be the largest index such that $k_j \le \kappa$. First, notice that if $\kappa = k_j$, then (P2) holds. Indeed, by the inductive assumptions (P2) and (P3), $A_{\kappa-1}$ is locally consistent and has no incompatible atoms. Since $A_\kappa \neq A_{\kappa-1}$, the bucket $A[D]$ must have contained two or more atoms prior to the removal of $A$ in line 6. Since $A_{\kappa-1}$ did not contain any incompatible atoms, $A_\kappa = A_{\kappa-1} \setminus \{A\}$ must remain locally consistent. So, we can assume that $\mu := \kappa - k_j$ is nonzero. We will examine the families $A_{k_j}, A_{k_j+1}, \ldots, A_{k_j+\mu} = A_\kappa$.
Let $A = A_0, \ldots, A_\mu$ be the atoms in the order in which they were removed from $K$ during this execution of loop 8-12. Also, let $x_0, \ldots, x_\mu$ be the vertices/edges associated with the atoms $A_0, \ldots, A_\mu$, respectively. We will show, by induction on $\nu \le \mu$, the following property $(I_\nu)$, which in particular implies that $A_{k_j+\nu}$ is locally consistent.
To state $(I_\nu)$, first notice that if a vertex $v$ is among $x_0, \ldots, x_{\nu-1}$, then $A_{k_j+\nu}$ must contain precisely one of the two atoms $\{\langle v,0\rangle\}$ and $\{\langle v,1\rangle\}$. (By (P1), it must contain at least one of these atoms. It cannot contain both, since this would mean that no $v$-atom has been removed so far, and hence $A_{k_j+\nu}$ could not have been removed from $A_{k_j+\nu-1}$.) In particular, this means that there is an $i_v \in \{0, 1\}$ for which $A_{k_j+\nu}$ already ensures that the final label of $v$ is $i_v$. This means that

We will prove, by induction on $\nu \le \mu$, that
$(I_\nu)$ $A_{k_j+\nu}$ is locally consistent and if vertices $v$ and $w$ are among

Of course, this will finish the proof of (P2). Clearly, $(I_0)$ holds, as we have already shown that $A_{k_j}$ is locally consistent, and the other condition is vacuously satisfied. So, fix $\nu \in \{1, \ldots, \mu\}$ such that $(I_\xi)$ holds for all $\xi < \nu$. We will show that $(I_\nu)$ holds as well.
For this, assume first that $x_\nu$ is an edge $\{v, w\}$. We only need to show that $A_{k_j+\nu}$ remains locally consistent, the other part of $(I_\nu)$ being ensured in this case by $(I_{\nu-1})$. Since $x_\nu = \{v, w\}$, there must exist a $j < \nu$ such that $x_j$ is a vertex and $x_j \in \{v, w\}$. For simplicity, we assume that $x_j = v$ and that $i_v = 0$; the other cases are similar.
We need to show that $A_{k_j+\nu}$, obtained from $A_{k_j+\nu-1}$ by removing from it the atoms $\{\langle v,1\rangle, \langle w,0\rangle\}$ and $\{\langle v,1\rangle, \langle w,1\rangle\}$, cannot be locally inconsistent.
Note that such a removal from the locally consistent set $A_{k_j+\nu-1}$ can affect the local consistency of $A_{k_j+\nu}$ only between $\{v, w\}$ and the vertices $v$ and $w$. However, since $A_{k_j+\nu-1}[\{v\}] = \{\{\langle v,0\rangle\}\}$, this is also equal to $A_{k_j+\nu}[\{v\}]$. Also, both $A_{k_j+\nu-1}$ and $A_{k_j+\nu}$ must contain either $\{\langle v,0\rangle, \langle w,0\rangle\}$ or $\{\langle v,0\rangle, \langle w,1\rangle\}$. So, $A_{k_j+\nu}$ cannot have a local inconsistency between $\{v, w\}$ and $v$. Therefore, we only need to show that $A_{k_j+\nu}$ contains no local inconsistency between $\{v, w\}$ and $w$.
To see this, first notice that there will be no such inconsistency when

Indeed, then $A_{k_j-1}[\{w\}] = \{\{\langle w,i\rangle\}\}$ for some $i \in \{0, 1\}$ and, by property (P3), $A_{k_j-1} \supseteq A_{k_j+\mu}$ cannot contain the atom $\{\langle v,0\rangle, \langle w,1-i\rangle\}$. Hence $A_{k_j+\mu}$ must contain $\{\langle v,0\rangle, \langle w,i\rangle\}$, and local consistency is preserved.
To finish the argument, consider the following three cases.

and $w$ cannot be among $x_0, \ldots, x_{\nu-1}$, since this would contradict the second part of $(I_{\nu-1})$. In particular, (9) holds, and so local consistency is preserved.
Before we proceed further, note that for every $\nu \le \mu$,

Indeed, by (P3), this clearly holds for $\nu = 0$. Also, if $x_\nu$ is an edge, then the ordering conditions imposed on the queue $K$ ensure that no atoms of any other edge can be added to $K$ and subsequently modified before every vertex adjacent to $x_\nu$ that can have atoms incompatible with those of $x_\nu$ has been added to $K$ and subsequently modified, so that the potential incompatibilities are removed. Finally, consider the case where $x_\nu$ is a vertex $v$. Then we must have had $A_{k_j+\nu-1}[D]$. Also, by $(J_\nu)$, such $p$ is unique. Therefore, $A_{k_j+\nu}$ must be locally consistent, since the only potential local inconsistency in $A_{k_j+\nu}$ could be between $v$ and $\{v, w\}$. But our choice of $A_{k_j+\nu}[\{v\}] \subset A_{k_j+\nu-1}[\{v\}] = \{\{\langle v,0\rangle\}, \{\langle v,1\rangle\}\}$ ensures that such an inconsistency cannot occur.
Notice also that the second part of $(I_\nu)$ holds as well. Indeed, this is vacuously satisfied when there is no vertex among $x_0, \ldots, x_{\nu-1}$. So, assume that such a vertex exists. Then $w$, the second vertex of the edge $x_p = D = \{v, w\}$ chosen above, must be among $x_0, \ldots, x_{\nu-1}$. Indeed, if $p = 0$, then we must have $\nu = 2$ and $x_1 = w$. Since $i_w = 0$, we must have $A_{k_j+\nu-1}[\{v\}]$ holds. We need to show that the equality $A_{k_j+p}[D] = \{\{\langle v,1\rangle, \langle w,0\rangle\}\}$ is impossible. Indeed, this would imply that $A_{k_j+q-1}[D] \subset \{\{\langle v,1\rangle, \langle w,0\rangle\}, \{\langle v,0\rangle, \langle w,1\rangle\}, \{\langle v,1\rangle, \langle w,1\rangle\}\}$ and, using property (P0), also that $A_{k_j+q-1}[D] \subset \{\{\langle v,1\rangle, \langle w,0\rangle\}, \{\langle v,1\rangle, \langle w,1\rangle\}\}$. However, this means that $A_{k_j+q-1}$ had already decided the value of $\lambda(v)$ as 1. Since the value of $\lambda(w)$ was decided earlier, the same reasoning as for $(J_\nu)$ shows that $v$ should already appear among $x_0, \ldots, x_q$, while $q < \nu$ contradicts this. This finishes the proof of (P1)-(P3).