Set Function Optimization

This article is an introduction to recent developments in the optimization theory of set functions, in particular nonsubmodular optimization. It presents two interesting results, the DS (difference of submodular) decomposition and the sandwich theorem, together with the iterated sandwich method and data-dependent approximation. Some potential research problems are also mentioned.


Introduction
With recent developments in computer technology, such as wireless networks [1,2], cloud computing [3,4], sentiment analysis [5-8], and machine learning [9], many nonlinear optimization problems with discrete structure have emerged. They form a large group of new problems belonging to a research area called nonlinear combinatorial optimization. Nonlinear combinatorial optimization has been studied for a long time, but has recently become very active. One of the important fields in this area is set function optimization. Its development can be roughly divided into three periods.
The first period is before 2000. The research works came mainly from researchers in operations research and were mainly on submodular function optimization, often with the monotone nondecreasing property. Here, a set function f : 2^X → R is submodular if f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B) for all A, B ⊆ X, and f is monotone nondecreasing if f(A) ≤ f(B) whenever A ⊆ B ⊆ X. In this period, major results include the following:
• Unconstrained submodular minimization can be solved in polynomial time [10-12].
• Constrained monotone nondecreasing submodular maximization admits a (1 − 1/e)-approximation under a size constraint [13] or a knapsack constraint [14,15].
• For linear optimization with nonlinear constraints, linear maximization with k matroid constraints has a (1/(k + 1))-approximation [16,17], and linear minimization with a submodular cover constraint, called the submodular cover problem, has a (1 + ln γ)-approximation, where γ is a number determined by the submodular function defining the constraint [18].
A small code sketch of the greedy algorithm behind the (1 − 1/e) bound is given below.
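This is a minimal sketch only; the coverage objective in the demo is a hypothetical example, not taken from the article.

```python
# Greedy for max f(S) s.t. |S| <= k, with f monotone nondecreasing
# submodular; this is the classic (1 - 1/e)-approximation of [13].

def greedy_max(f, ground_set, k):
    S = set()
    for _ in range(k):
        # pick the element with the largest marginal gain f(S + x) - f(S)
        gains = {x: f(S | {x}) - f(S) for x in ground_set - S}
        if not gains:
            break
        S.add(max(gains, key=gains.get))
    return S

# Demo with a (hypothetical) coverage function, which is monotone submodular.
cover = {1: {'a', 'b'}, 2: {'b', 'c'}, 3: {'c'}, 4: {'d'}}
f = lambda S: len(set().union(*(cover[x] for x in S))) if S else 0
print(greedy_max(f, set(cover), 2))   # e.g. {1, 4}, covering 3 items
```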
The second period is from 2007 to 2012, when research activity occurred mainly in theoretical computer science. The major results concern nonmonotone submodular optimization, including submodular maximization with knapsack constraints and matroid constraints [19-21] and submodular minimization with a size constraint [22]. Most of them were published in theoretical computer science conferences, such as the ACM Symposium on Theory of Computing, the IEEE Symposium on Foundations of Computer Science, and the ACM-SIAM Symposium on Discrete Algorithms, and in journals such as the SIAM Journal on Computing.
The third period starts from 2014. The research is application-driven, and the main focus is on nonsubmodular optimization. In the study of nonsubmodular optimization, we may find four clusters of research efforts.
In this article, we discuss research works on DS functions. In particular, we introduce two surprising results, the DS decomposition and the sandwich theorem, together with the iterated sandwich method.

DS Decomposition
The first surprising result is as follows.
Theorem 2.1 [26] Every set function f : 2^X → R can be expressed as the difference of two monotone nondecreasing submodular functions g and h, i.e., f = g − h, where X is a finite set.
To prove this theorem, we first show two lemmas.

Lemma 2.1 [25] Every set function f : 2^X → R can be expressed as the difference of two submodular functions g and h, i.e., f = g − h.
Lemma 2.2 [30] Every submodular function g can be expressed as g = p + m, where p is a polymatroid function (i.e., a monotone nondecreasing submodular function with p(∅) = 0) and m is a modular function (i.e., m(A) + m(B) = m(A ∪ B) + m(A ∩ B) for any two sets A and B).

Proof Define m(A) = g(∅) + Σ_{x∈A} min{0, g(X) − g(X \ {x})} and p = g − m. It is easy to verify that m is a modular function. Thus, p is a submodular function with p(∅) = 0. Moreover, for any A ⊆ X and x ∉ A, submodularity gives g(A ∪ {x}) − g(A) ≥ g(X) − g(X \ {x}) ≥ min{0, g(X) − g(X \ {x})}, so p(A ∪ {x}) − p(A) ≥ 0, i.e., p is monotone nondecreasing. Therefore, p is a polymatroid function. □
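Assuming the construction above, the following is a minimal Python sketch of the split g = p + m, with g given as a black-box oracle on subsets of X.

```python
# Sketch of Lemma 2.2: split a submodular g into a polymatroid part p
# and a modular part m, assuming the construction
# m(A) = g(0) + sum_{x in A} min{0, g(X) - g(X \ {x})}.

def lemma22_split(g, X):
    X = frozenset(X)
    w = {x: min(0.0, g(X) - g(X - {x})) for x in X}       # modular weights
    m = lambda A: g(frozenset()) + sum(w[x] for x in A)   # modular part
    p = lambda A: g(frozenset(A)) - m(A)                  # polymatroid part
    return p, m
```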
Now, we are ready to prove Theorem 2.1.

Proof of Theorem 2.1 By Lemma 2.1, f can be expressed as f = g − h, where g and h are submodular functions. By Lemma 2.2, g and h can be expressed as g = p_g + m_g and h = p_h + m_h, where p_g and p_h are polymatroid functions and m_g and m_h are modular functions. Therefore, f = (p_g + m_g) − (p_h + m_h) = (p_g + m) − p_h, where m = m_g − m_h is modular. Splitting the element weights of m into positive and negative parts gives m = m^+ − m^−, where m^+ and m^− are monotone nondecreasing modular functions, and hence f = (p_g + m^+) − (p_h + m^−), a difference of two monotone nondecreasing submodular functions. □

The following is an example of a DS function.

Example 2.1 (Profit Maximization [31]) Profit maximization is a problem in social computing. Consider a social network, which is a directed graph G = (V, E) with an information diffusion model m. Usually, an information diffusion process consists of discrete steps. Every node has two states, active and inactive. Initially, every node is inactive. The process starts by activating a subset of nodes, called seeds. After the seeds become active, they can activate their neighbors based on certain rules of the model m. The process ends when no node newly becomes active. Let S be the set of seeds and I(S) the set of active nodes at the end of the process. Then maximization of |I(S)| (or E[|I(S)|] when m is a probabilistic model), called the influence spread, is an important problem, called influence maximization, in many applications of social networks. However, in viral marketing, seeds are often free samples or coupons for a certain product, i.e., the distribution of seeds incurs cost. Therefore, the objective of maximization should be the difference of the influence spread and the seed cost, called the profit. When the seed cost is a submodular function of the seed set S, the profit is a DS function.
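As a toy illustration of Example 2.1, the following uses hypothetical data and a one-step coverage proxy in place of the diffusion model m.

```python
# Profit = influence spread - seed cost, a DS function.
# The followers/cost values are made-up illustration data.

followers = {'u': {'a', 'b'}, 'v': {'b', 'c'}, 'w': {'d'}}
cost = {'u': 1.5, 'v': 1.0, 'w': 0.5}

sigma = lambda S: len(set().union(*(followers[s] for s in S))) if S else 0
c = lambda S: sum(cost[s] for s in S)      # modular, hence submodular
profit = lambda S: sigma(S) - c(S)         # difference of submodular functions

print(profit({'u', 'v'}))                  # covers {a,b,c} at cost 2.5 -> 0.5
```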

Sandwich Theorem
The second surprising result is the following sandwich theorem.

Theorem 3.1 [26] For any submodular function f : 2^X → R and any set Y ⊆ X, there exist two modular functions m_u and m_l such that m_l ≤ f ≤ m_u and m_l(Y) = f(Y) = m_u(Y).

Why is this result surprising? To explain this, let us look at a property of modular functions.

Lemma 3.1 A set function m : 2^X → R is modular if and only if m(A) = m(∅) + Σ_{x∈A} (m({x}) − m(∅)) for every A ⊆ X.

This lemma indicates that a modular function is a linear set function. Theorem 3.1 thus gives two different modular functions that agree at the set Y, one of which is always smaller than or equal to the other. This phenomenon cannot occur for continuous linear functions. A continuous linear function with n variables can be expressed as an n-dimensional plane in the (n + 1)-dimensional space, and a pair of different n-dimensional planes with a point in common cannot satisfy that one is everywhere smaller than or equal to the other. Therefore, this theorem states a special property of set functions.
To prove the sandwich theorem, we show two lemmas.

Lemma 3.2 [26]
For any submodular function f : 2^X → R and any set Y ⊆ X, there exists a modular function m_u : 2^X → R such that f ≤ m_u and m_u(Y) = f(Y).

Proof Define

m_u(A) = f(Y) − Σ_{j∈Y\A} (f(Y) − f(Y \ {j})) + Σ_{j∈A\Y} (f({j}) − f(∅)).

Clearly, m_u is modular and m_u(Y) = f(Y). Next, we show that f ≤ m_u. Assume A \ Y = {j_1, ..., j_k} and write A_i = (A ∩ Y) ∪ {j_1, ..., j_i} with A_0 = A ∩ Y. Then

f(A) = f(A ∩ Y) + Σ_{i=1}^{k} (f(A_i) − f(A_{i−1})) ≤ f(A ∩ Y) + Σ_{j∈A\Y} (f({j}) − f(∅)),

by submodularity. Similarly, removing the elements of Y \ A from Y one at a time and applying submodularity gives

f(A ∩ Y) ≤ f(Y) − Σ_{j∈Y\A} (f(Y) − f(Y \ {j})).

Therefore, f(A) ≤ m_u(A). □

Lemma 3.3 [26]
For any submodular function f : 2^X → R and any set Y ⊆ X, there exists a modular function m_l : 2^X → R such that m_l ≤ f and m_l(Y) = f(Y).
Proof Put all elements of X into an ordering X = {x_1, x_2, ..., x_n} such that Y = {x_1, ..., x_{|Y|}}, and set S_i = {x_1, ..., x_i} with S_0 = ∅. Define

m_l(A) = f(∅) + Σ_{x_i∈A} (f(S_i) − f(S_{i−1})).

Clearly, m_l is modular, and by telescoping, m_l(Y) = f(S_{|Y|}) = f(Y).

Moreover, for any set A = {x_{i_1}, ..., x_{i_t}} with i_1 < ... < i_t, submodularity gives f(S_{i_j}) − f(S_{i_j − 1}) ≤ f({x_{i_1}, ..., x_{i_j}}) − f({x_{i_1}, ..., x_{i_{j−1}}}), since {x_{i_1}, ..., x_{i_{j−1}}} ⊆ S_{i_j − 1}. Summing over j yields m_l(A) ≤ f(A). □
Now, we are ready to prove Theorem 3.1.

Proof of Theorem 3.1 Given f and Y, take m_u from Lemma 3.2 and m_l from Lemma 3.3. Then m_l ≤ f ≤ m_u and m_l(Y) = f(Y) = m_u(Y). □
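The two bounds are easy to implement. Below is a minimal sketch, assuming the constructions in the proofs above, with f given as a submodular oracle on subsets of X.

```python
# Modular sandwich of a submodular f at a set Y (Lemmas 3.2 and 3.3).

def upper_bound(f, X, Y):
    """Modular m_u with f <= m_u and m_u(Y) = f(Y)."""
    X, Y = frozenset(X), frozenset(Y)
    def m_u(A):
        A = frozenset(A)
        return (f(Y)
                - sum(f(Y) - f(Y - {j}) for j in Y - A)
                + sum(f(frozenset({j})) - f(frozenset()) for j in A - Y))
    return m_u

def lower_bound(f, X, Y):
    """Modular m_l with m_l <= f and m_l(Y) = f(Y)."""
    order = sorted(Y) + sorted(set(X) - set(Y))   # elements of Y first
    S, w = frozenset(), {}
    for x in order:
        w[x] = f(S | {x}) - f(S)                  # w[x_i] = f(S_i) - f(S_{i-1})
        S = S | {x}
    return lambda A: f(frozenset()) + sum(w[x] for x in A)
```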

Iterated Sandwich Method
Based on the sandwich theorem, we can design the following algorithm for min_{A∈2^X} f(A), where f is given with a DS decomposition f = g − h.

Iterated Sandwich Method:
• Input a set function f : 2^X → R with f = g − h (g and h submodular) and an initial set A.
• Iteration: Apply the sandwich theorem to g and h at the current set A to obtain a modular upper bound f_u = m_u^g − m_l^h and a modular lower bound f_l = m_l^g − m_u^h of f, both agreeing with f at A. Compute a minimum solution A_u of f_u and a minimum solution A_l of f_l (modular minimization is trivial), and compute a solution A_o of min f by any heuristic. Let A^+ be the best of the three, i.e., f(A^+) = min(f(A_u), f(A_l), f(A_o)).
- If f(A^+) = f(A), then stop the iteration and go to the output; otherwise, set A ← A^+ and start a new iteration.

• Output A.
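The following Python sketch puts the pieces together. It assumes f = g − h with submodular oracles g and h, the bound constructions `upper_bound` and `lower_bound` sketched earlier, and a hypothetical `heuristic_min` placeholder for any heuristic on f.

```python
# Iterated sandwich method for min_{A in 2^X} f(A), with f = g - h.

def modular_minimizer(m, X):
    """A modular function is minimized by taking all negative-weight elements."""
    return frozenset(x for x in X if m({x}) - m(set()) < 0)

def iterated_sandwich(f, g, h, X, A, heuristic_min):
    A = frozenset(A)
    while True:
        # modular sandwich of f at A:  m_l^g - m_u^h  <=  f  <=  m_u^g - m_l^h
        f_u = lambda S, A=A: upper_bound(g, X, A)(S) - lower_bound(h, X, A)(S)
        f_l = lambda S, A=A: lower_bound(g, X, A)(S) - upper_bound(h, X, A)(S)
        A_u = modular_minimizer(f_u, X)
        A_l = modular_minimizer(f_l, X)
        A_o = heuristic_min(f, X, A)
        A_plus = min((A_u, A_l, A_o), key=lambda S: f(frozenset(S)))
        if f(frozenset(A_plus)) >= f(A):   # f(A+) = f(A): no improvement, stop
            return A
        A = frozenset(A_plus)
```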
A similar algorithm can be designed for max_{A∈2^X} f(A). What can we say about the solution obtained by this algorithm? Is it a locally optimal solution? Let us first explain what a locally optimal solution is in set function optimization.
For a submodular set function f : 2^X → R, the subgradient at a set A consists of all linear functions c : 2^X → R such that f(Y) ≥ f(A) + c(Y) − c(A) for every Y ⊆ X. Each linear function c can also be seen as a vector in R^X, i.e., a vector c with components labeled by the elements of X. The characteristic vector of a subset Y of X is the vector in {0, 1}^X whose component with label x ∈ X is equal to 1 if and only if x ∈ Y. For simplicity of notation, we may use the same notation Y to represent the set Y and its characteristic vector. Then the subgradient of f at the set A can be described as

∂f(A) = {c ∈ R^X : f(Y) ≥ f(A) + ⟨c, Y − A⟩ for all Y ⊆ X}.

If c, d ∈ ∂f(A), then for any 0 ≤ λ ≤ 1, λc + (1 − λ)d ∈ ∂f(A); that is, ∂f(A) is a convex set. The extreme points of this convex set can be characterized as follows.

Theorem 4.1 Every extreme point c of ∂f(A) is obtained from an ordering x_1, x_2, ..., x_n of X with A = {x_1, ..., x_{|A|}} by setting c(x_i) = f(S_i) − f(S_{i−1}), where S_i = {x_1, ..., x_i} and S_0 = ∅.
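For illustration, the following sketch of the greedy construction in Theorem 4.1 (mirroring the proof of Lemma 3.3) produces such an extreme point from an ordering that lists the elements of A first.

```python
# Extreme point of the subgradient of a submodular f at A (Theorem 4.1).

def extreme_subgradient(f, X, A):
    order = sorted(A) + sorted(set(X) - set(A))   # elements of A first
    S, c = frozenset(), {}
    for x in order:
        c[x] = f(S | {x}) - f(S)                  # c(x_i) = f(S_i) - f(S_{i-1})
        S = S | {x}
    return c                                      # a vector in R^X
```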

For a DS function f = g − h, consider the condition

∂h(A) ⊆ ∂g(A).    (4.1)

Actually, (4.1) is a necessary condition for a set A to be a minimum solution.

Theorem 4.2 Let f = g − h be a set function with g and h submodular functions on subsets of X. If a set A is a minimum solution (the first type) for min_{Y∈2^X} f(Y), then condition (4.1) holds.

Proof Since A is a minimum solution, g(Y) − g(A) ≥ h(Y) − h(A) for any Y ⊆ X. Therefore, for any c ∈ ∂h(A) we have, for every Y ⊆ X, g(Y) − g(A) ≥ h(Y) − h(A) ≥ ⟨c, Y − A⟩, so c ∈ ∂g(A). This means that (4.1) holds. □

Condition (4.1) is also sufficient for a certain type of minimality.

Theorem 4.3 Suppose A satisfies condition (4.1). Then f(Y) ≥ f(A) for any Y with Y ⊆ A or Y ⊇ A.

Proof Consider Y ⊇ A. Choose an ordering of X that lists the elements of A first and then the elements of Y \ A. By Theorem 4.1, this ordering gives an extreme point c of ∂h(A) with h(Y) − h(A) = ⟨c, Y − A⟩. By (4.1), c ∈ ∂g(A), so g(Y) − g(A) ≥ ⟨c, Y − A⟩ = h(Y) − h(A), and hence f(Y) ≥ f(A). The case Y ⊆ A is similar, using an ordering that lists the elements of Y first. □
Now, we come back to the iterated sandwich method. Does the method surely produce a solution satisfying condition (4.1)? This is a problem for further research. However, if we define a local minimum (the second type) to be a set for which adding or removing a single element does not decrease the objective function value, then a positive answer can be reached with the approach given in [25,26], with a little modification. The second-type test itself is straightforward; a sketch is given below.
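A minimal sketch of this test:

```python
# Second-type local minimum test: no single-element addition or removal
# decreases the objective value.

def is_second_type_local_min(f, X, A):
    A = frozenset(A)
    no_better_add = all(f(A | {x}) >= f(A) for x in set(X) - A)
    no_better_remove = all(f(A - {x}) >= f(A) for x in A)
    return no_better_add and no_better_remove
```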
What can we say about the approximation performance of the iterated sandwich method? At least, it may produce a solution comparable with the data-dependent approximation described in the next section.

Data-Dependent Approximation
The sandwich method has been used quite often for solving several nonsubmodular optimization problems in the literature [33-36]. It runs as follows.

Sandwich Method:
• Input a set function f : 2^X → R together with a submodular upper bound f_u ≥ f and a submodular lower bound f_l ≤ f.
• Compute a solution S_u for f_u and a solution S_l for f_l by an α-approximation algorithm for submodular optimization, and compute a solution S_o for f by any heuristic.
• Among S_u, S_l, and S_o, choose S to be the one with the best objective value f(S).
• Output S.
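A minimal sketch of the maximization version, assuming submodular bounds f_l ≤ f ≤ f_u are given, a hypothetical α-approximation subroutine `submodular_max` (e.g., greedy), and any heuristic for f:

```python
# Sandwich method for max f: solve on both bounds and on f, keep the best.

def sandwich_max(f, f_l, f_u, X, k, submodular_max, heuristic):
    S_u = submodular_max(f_u, X, k)      # alpha-approximation on upper bound
    S_l = submodular_max(f_l, X, k)      # alpha-approximation on lower bound
    S_o = heuristic(f, X, k)             # any heuristic on f itself
    return max((S_u, S_l, S_o), key=f)   # best of the three under f
```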
This method is called a data-dependent approximation algorithm, with the following guaranteed performance established in the literature cited above: for the maximization version,

f(S) ≥ max{ f(S_u)/f_u(S_u), f_l(S*)/f(S*) } · α · f(S*),

where S* is an optimal solution for f and α is the approximation ratio of the subroutine applied to the bounds. The guarantee depends on the input data through the ratios f(S_u)/f_u(S_u) and f_l(S*)/f(S*), whence the name data-dependent approximation.