1 Introduction

Consider the following problem of assigning requests to processing servers in a distributed computing system. We are given a set of \(M\) server machines, each with processing speed \(1/h_j,\,j=1,\ldots ,M\), and a set of \(N\) clients, each with a weight \(w_i,\,i = 1,\ldots ,N\). The parameter \(h_j\) is the time needed to process a single request on the \(j\)th server. A weight \(w_i\) represents the amount of the client’s demand for processing (the number of requests sent in a unit of time); the client itself can be interpreted either as a single application or as a whole local area network issuing a continuous stream of requests. The processing requirement after assigning the \(i\)th client to the \(j\)th machine is \(c_{ij} = w_ih_j\). However, when multiple clients share the same machine, the order of request processing is unknown, so each client is interested in the minimal worst-case processing time (e.g. due to quality of service requirements). Hence the waiting (completion) time of the \(i\)th client’s batch of requests is defined as \(C_i = \sum _{j=1}^M x_{ij} \sum _{k=1}^N x_{kj} c_{kj}\), where \(x_{ij}=1\) if the \(i\)th client is assigned to the \(j\)th server, and \(x_{ij}=0\) otherwise. It is assumed that the demand of each client is unsplittable.

The objective is to assign all clients so as to minimize the sum of completion times, \(\sum _{i=1}^N C_i\). The quadratic integer programming formulation is:

$$\begin{aligned} \text{ minimize } \sum _{i=1}^N \sum _{j=1}^M \left( h_j x_{ij} \sum _{k=1}^N w_k x_{kj} \right) = \sum _{i=1}^N \sum _{j=1}^M \sum _{k=1}^N c_{kj} x_{ij} x_{kj} \end{aligned}$$
(1)

subject to:

$$\begin{aligned}&\forall _i \sum _{j=1}^M x_{ij} = 1,\end{aligned}$$
(2)
$$\begin{aligned}&\forall _{i,j} x_{ij} \in \{ 0, 1 \}. \end{aligned}$$
(3)

This problem can be seen as a special case of the quadratic semi-assignment problem (QSAP) [10, 11], which is well known to be NP-hard [1], and for which optimal solutions are difficult to compute even for small instances. QSAP is obtained by relaxing the one-to-one constraint in the generalized quadratic assignment problem [7]. One specific class of polynomially solvable instances of QSAP has been characterized in [9], where its applications in flight scheduling are discussed. In this paper another class of QSAP instances of practical importance is identified, which can also be solved efficiently under a certain fixed-parameter assumption. In the presented formulation the quadratic terms reflect the increased delays caused by multiple clients competing for a single processor [8]. Similar load balancing problems have been of interest in the context of selfish resource allocation in the Internet, where each assigned task experiences the machine completion time [5]. Other applications of the presented problem include specialized variants of correlation clustering [2] and scheduling [6].

The instance size of this problem is characterized by two parameters, \(N\) and \(M\), and in this paper we show fixed-parameter tractability of (1)–(3). The motivation is based on the observation that in practical applications the number of servers \(M\) is usually much smaller than the number of clients \(N\) to be assigned. In general, when \(M\) is also considered part of the input, problem (1)–(3) is NP-hard [3].

2 Main results

To develop the analysis, we introduce the following equivalent formulation of (1)–(3). Given a multiset \(U = \{ w_1,\ldots ,w_N \}\) of positive integers (weights), find a partition of the weights into blocks \(S_1, S_2, \ldots , S_M\) that minimizes the function:

$$\begin{aligned} f(S_1, S_2, \ldots , S_M) = \sum _{j=1}^M \left( |S_j| h_j \sum _{w_i \in S_j} w_i \right) \end{aligned}$$
(4)

where \(\bigcup _j S_j = U\) and \(S_i \cap S_j = \emptyset \) for all \(i, j = 1,\ldots , M,\,i \ne j\). In this reformulation of semi-assignment, the partition block sizes \(|S_j|\) and the reciprocals of speeds \(h_j\) assume the role of coefficients. Each \(S_j\) represents a server. The presented technique is based on the following simple observation:
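For concreteness, the objective (4) can be evaluated directly from a partition. The following Python sketch is our own illustration (the function name and the representation of blocks as lists of weights are not part of the original formulation):

```python
def partition_cost(blocks, h):
    """Evaluate objective (4): sum over servers of |S_j| * h_j * (total weight of S_j).

    blocks -- list of M lists of client weights (the partition S_1, ..., S_M)
    h      -- list of M per-request processing times h_j
    """
    return sum(len(S) * hj * sum(S) for S, hj in zip(blocks, h))
```

For the five-server instance solved as an example later in the paper, `partition_cost([[3], [2, 2], [], [1], [5]], [2, 1, 5, 3, 1])` evaluates to \(22\).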

Proposition 1

For \(M=2\) the problem (1)–(3) can be solved in polynomial time.

Proof

For two machines the problem can be seen as follows. There is a multiset of weights \(U = \{ w_1, w_2, \ldots , w_N \}\). We need to partition the index set \(T = \{ 1, 2, \ldots , N \}\) into two disjoint subsets, \(S\) and \(T - S\), in such a way that the set function:

$$\begin{aligned} f(S) = h_1 |S| \sum _{i \in S} w_i + h_2 (N-|S|) \sum _{i \in T-S} w_i \end{aligned}$$

is minimized.

Denote \(C = \sum _{i=1}^N w_i\). Let us consider the following family of one-variable linear functions:

$$\begin{aligned} g_k(x) = h_1kx + h_2(N-k) (C-x) = ((h_1 + h_2)k -h_2N)x + h_2(N-k)C. \end{aligned}$$

Clearly, each \(g_k\) coincides with \(f(\cdot )\) on arguments \(x = \sum _{i \in S} w_i\), where \(S\) ranges over the subsets \(\{ S \subset T : |S| = k \}\) of \(f\)’s domain, for fixed \(k\). Thus \(f\) can be written as:

$$\begin{aligned} f(S) = \left\{ \begin{array}{l@{\quad }l} h_2NC, &{} S = \emptyset , \\ h_1NC, &{} S = T, \\ g_k(x), &{} x \in A_k, \quad \text{ for }\, k = |S| = 1, \ldots , N-1, \end{array}\right. \end{aligned}$$

where

$$\begin{aligned} A_k = \left\{ x = \sum _{i \in S} w_i : S \subset T,\quad k = |S|\right\} \!. \end{aligned}$$

If \(\frac{d}{dx} g_k(x) = (h_1 + h_2)k -h_2N \ge 0\), then \(g_k\) attains its minimum at the smallest \(x \in A_k\), and otherwise at the largest \(x \in A_k\). Such an \(x\) can be computed by sorting the weights and summing the \(k\) smallest ones [the \(k\) largest, in case \((h_1 + h_2)k -h_2N < 0\)]. Thus the minimum of \(f\) can be found by computing \(g_k(\min A_k)\) or \(g_k(\max A_k)\) for all \(k=1, \ldots , N-1\). Sorting requires \(O(N \log N)\) time, precomputing all partial sums \(w_1 + \cdots + w_k\) can be accomplished in \(O(N)\) time, and then evaluating each \(g_k\) at the required endpoint takes constant time. The overall complexity is \(O(N \log N)\). \(\square \)
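The proof is constructive. A minimal Python sketch of the resulting \(O(N \log N)\) procedure for two machines follows (function and variable names are ours):

```python
def two_machine_min(weights, h1, h2):
    """Minimal total completion time for two machines (Proposition 1).

    For each cardinality k of the block assigned to machine 1, evaluate
    g_k at the smallest or largest attainable x, depending on the sign
    of the slope (h1 + h2) * k - h2 * N.
    """
    N = len(weights)
    C = sum(weights)
    w = sorted(weights)
    # prefix[k] = sum of the k smallest weights
    prefix = [0]
    for wi in w:
        prefix.append(prefix[-1] + wi)
    best = min(h2 * N * C, h1 * N * C)  # the cases S = empty set and S = T
    for k in range(1, N):
        slope = (h1 + h2) * k - h2 * N
        # min A_k = sum of k smallest weights; max A_k = sum of k largest
        x = prefix[k] if slope >= 0 else C - prefix[N - k]
        best = min(best, slope * x + h2 * (N - k) * C)
    return best
```

For instance, `two_machine_min([5, 3, 1, 2, 2], 2, 1)` returns \(42\), attained e.g. by assigning the client of weight \(5\) alone to the first machine.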

The above argument shows that the set of feasible solutions, whose size is exponential in \(N\), can easily be reduced to polynomially many candidate solutions to check. It turns out that this technique can be generalized. For any fixed \(M \in \mathbb N \) the number of nonnegative integer solutions of the equation \(k_1 + k_2 + \cdots + k_M = N\) is bounded by \(O(N^{M-1})\) [their number equals \(\left( {\begin{array}{c}N+M-1\\ M-1\end{array}}\right) \); see [4] for more details on counting integer partitions].
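As an illustration (ours, not from the original), the candidate size vectors are exactly the compositions of \(N\) into \(M\) nonnegative parts, which can be enumerated recursively:

```python
def compositions(N, M):
    """Yield all M-tuples (k_1, ..., k_M) of nonnegative integers with sum N."""
    if M == 1:
        yield (N,)
        return
    for k in range(N + 1):
        for rest in compositions(N - k, M - 1):
            yield (k,) + rest
```

For \(N=5,\,M=3\) this yields \(\binom{7}{2}=21\) tuples, consistent with the \(O(N^{M-1})\) bound for fixed \(M\).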

The idea is to search for solutions on properly constructed hyperplanes in \(\mathbb R ^{m}\). Given a hyperplane \(\sum _{i=1}^m \alpha _i x_i = \gamma \), we can determine its orientation by analyzing the signs of the coefficients \(\alpha _i\). For positive \(\alpha _i\) the corresponding coordinate \(x_i\) should be as small as possible, which amounts to taking the sum of the smallest weights, since the hyperplane-defining function increases in these directions. Conversely, for negative \(\alpha _i\) the corresponding \(x_i\) should be as large as possible, which amounts to taking the sum of the largest weights, since the function decreases in these directions. Using the fact that the sum of coordinates \(\sum _{i=1}^m x_i\) must equal the total weight of clients in \(U\), we can reduce the problem of searching for values of \(\mathbf{x}=(x_1, \ldots , x_m)\) on a hyperplane in \(\mathbb R ^m\) to a smaller problem of searching on a hyperplane in \(\mathbb R ^{m-1}\), by expressing one variable \(x_j\) in terms of the other coordinates. Starting from \(m = M\) and repeating until \(m=1\), we obtain an algorithm which runs in time polynomial in \(N=|U|\).

Theorem 1

For any fixed \(M \in \mathbb N \) the problem (1)–(3) can be solved in polynomial time.

Proof

Since the sets \(S_1, \ldots , S_M\) are disjoint and must cover exactly \(N\) elements \(w_i\), there are polynomially many possible combinations of the cardinalities of \(S_1, \ldots , S_M\). It is enough to consider one fixed choice of partition block sizes. Let us fix \(k_j = |S_j|\) for all \(j=1,\ldots ,M\). Observe that function (4) satisfies:

$$\begin{aligned}&\min \{ f(S_1, \ldots , S_M):|S_j| = k_j, j=1, \ldots , M \} \\&\qquad \quad =\min \left\{ g(\mathbf{x}) = \sum _{j=1}^M h_j k_j x_j :x_j = \sum _{w_i \in S_j} w_i, |S_j| = k_j, j=1,\ldots ,M\!\right\} . \end{aligned}$$

Consider a hyperplane defined by the function \(g(\mathbf{x}) = \sum _{j=1}^m \alpha _j x_j - \gamma \). Let us separate the indices of the variables \(\mathbf{x}\) into two disjoint sets \(A^{+}\) and \(A^{-}\):

$$\begin{aligned} A^{+} = \left\{ j:\frac{\partial }{\partial x_j} g(\mathbf{x}) \ge 0 \right\} \end{aligned}$$

and \(A^{-} = \{ 1,\ldots ,m \} {\setminus } A^{+}\). Observe that the minimum of (4) lies on the hyperplane \(g(\mathbf{x})\), with \(m=M,\,\alpha _j = h_jk_j\) and \(\gamma = 0\), for some \(\mathbf{x}\) satisfying \(\sum _{i=1}^M x_i = \sum _{w \in U} w\). Clearly, since in the directions \(x_j\) for \(j \in A^{+}\) the function \(g\) is nondecreasing, an optimal \(\mathbf{x}\) must attain the smallest possible values on these coordinates. Similarly, for an optimal \(\mathbf{x}\) the coordinates \(x_j\) for \(j \in A^{-}\) should be as large as possible, since \(g\) decreases in those directions. Denote \(k_{\min } = \sum _{j \in A^{+}} k_j\) and \(k_{\max } = \sum _{j \in A^{-}} k_j\). Without loss of generality, let \(w_1 \le w_2 \le \cdots \le w_N\). Since \(k_{\min } + k_{\max } = N\), the coordinates \(x_j,\,j \in A^{+}\), should sum up to the sum of exactly the \(k_{\min }\) smallest weights:

$$\begin{aligned} \sum _{j \in A^{+}} x_j = \sum _{i=1}^{k_{\min }} w_i, \end{aligned}$$
(5)

and the remaining coordinates should sum up to the sum of the \(k_{\max }\) largest weights:

$$\begin{aligned} \sum _{j \in A^{-}} x_j = \sum _{i=N-k_{\max }+1}^{N} w_i. \end{aligned}$$
(6)

Let \(x_{j^{+}}\) be any variable such that \(j^{+} \in A^{+}\), and let \(x_{j^{-}}\) be any variable such that \(j^{-} \in A^{-}\) (at least one of these sets must be nonempty). From (5) and (6) we get:

$$\begin{aligned} x_{j^{+}} = \sum _{i=1}^{k_{\min }}w_i - \sum _{i \in A^{+} {\setminus } \{ j^{+} \}} x_i \end{aligned}$$
(7)

and:

$$\begin{aligned} x_{j^{-}} = \sum _{i=N-k_{\max }+1}^{N}w_i - \sum _{i \in A^{-} {\setminus } \{ j^{-} \}} x_i. \end{aligned}$$
(8)

Substituting \(x_{j^{+}}\) and \(x_{j^{-}}\) into the hyperplane equation \(g(\mathbf{x})\), we obtain an equivalent function \(\hat{g}(\hat{\mathbf{x}})\) of \(m-2\) variables (or \(m-1\) variables, in case one of the sets \(A^{+}\) and \(A^{-}\) is empty):

$$\begin{aligned} \hat{g}(\hat{\mathbf{x}})&= \sum _{j \in A^{+} {\setminus } \{ j^{+} \} } (\alpha _j - \alpha _{j^+}) x_j + \sum _{j \in A^{-} {\setminus } \{ j^{-} \} } (\alpha _j - \alpha _{j^-}) x_j\nonumber \\&+ \alpha _{j^+} \sum _{i=1}^{k_{\min }}w_i + \alpha _{j^-} \sum _{i=N-k_{\max }+1}^{N}w_i. \end{aligned}$$
(9)

This implies that, once an optimal \(\hat{\mathbf{x}}\) is known, the remaining variables of the optimal \(\mathbf{x}\), namely \(x_{j^+}\) and \(x_{j^-}\), can be computed from relations (7) and (8) by backward induction.

Observe that the above reasoning can be applied again to the obtained hyperplane \(\hat{g}({\hat{\mathbf{x}}}) = \sum _{j=1}^{m^{\prime }} \hat{\alpha }_j x_j - \hat{\gamma }\), with parameters \(\hat{\alpha }_j\) and \(\hat{\gamma }\) defined as in Eq. (9). Moreover, the choice of weights is now narrowed, for all coordinates \(x_j\) with \(\alpha _j \ge 0\), to the first \(k_{\min }-k_{j^+}\) elements of the sorted sequence of weights, and, for all coordinates \(x_j\) with \(\alpha _j < 0\), to its last \(k_{\max }-k_{j^-}\) elements. Thus the set of remaining weights to allocate now has \(N - k_{j^+} - k_{j^-}\) elements.

After at most \(M-1\) steps we obtain a 1-dimensional hyperplane \(\hat{g}(x) = \hat{\alpha }_1 x - \hat{\gamma }\), which allows computing the final variable \(x\) by inspection, as in the proof of Proposition 1. The whole process can be accomplished in \(O(N \log N) + O(MN)\) steps [for each of the \(O(N^{M-1})\) choices of partition block sizes]. \(\square \)

[Algorithm 1: pseudocode figure]

Algorithm 1 implements the presented idea. To solve problem (1)–(3) it is enough to run Algorithm 1 with arguments \((U, M, h_1k_1, \ldots , h_Mk_M, 0)\) successively for all allowed partition sizes \(k_1, \ldots , k_M\). Correctness of the routine follows from the following fact:

Proposition 2

Given an instance of problem (1)–(3) and a sequence of integers \(k_1, \ldots , k_M\), Algorithm 1 computes an \(M\)-partition of the set \(\{ w_1, w_2, \ldots , w_N \}\) such that the \(n\)th block has \(k_n\) elements and the cost (1) is minimal.

Proof

Suppose \(m \ge 2\); when \(m=1\), the trivial \(1\)-partition \(S_1 = W\) is returned. Observe that initially all parameters \(\alpha _j = h_jk_j\) are nonnegative. In step 5, Algorithm 1 finds an \(n\) such that \(\alpha _n\) is minimal. This guarantees that in step 7 all resulting parameters \(\alpha _j^{\prime }\) remain nonnegative. These parameters correspond to \(\hat{\alpha }_j = \alpha _j - \alpha _{j^+}\) in (9). Consequently, \(A^- = \emptyset \) and only the set \(A^+\) is nonempty in each recursive call (step 13). Thus the current hyperplane is always defined exclusively by nonnegative parameters \(\hat{\alpha }_j\) and \(\hat{\gamma }\).

Consequently, since the given \(m\)-dimensional hyperplane determines the allocation of the \(\sum _{i=1}^m k_i\) smallest weights, the next hyperplane (of dimension \(m-1\)) determines the allocation of the \(\sum _{i=1}^m k_i - k_n\) smallest weights, as the \(k_n\) largest of them correspond to the partition block \(S_n = W {\setminus } W^{\prime }\), where \(W^{\prime }\) is as given in step 11. The contents of all but the \(n\)th block are determined by calling the procedure recursively for the subsequent hyperplane. This results in computing an \((m-1)\)-partition of the set \(W^{\prime }\), until \(m=1\). In step 14, after returning from the recursive call, the final partition vector is updated by inserting \(S_n\). \(\square \)

As an example, consider five servers with parameters \(h_j = 2, 1, 5, 3, 1\) and five clients with demands \(w_i = 5, 3, 1, 2, 2\), respectively. The optimal partition block sizes are \(1, 2, 0, 1, 1\). The algorithm allocates the total demand of \(13\) to servers \(1,2,4\) and \(5\). In the second iteration variable \(x_5\) is eliminated, leaving a total demand of \(8\) to allocate among servers \(1,2,4\). Next, \(x_1\) is eliminated, leaving demand \(5\) for servers \(2\) and \(4\). Finally, \(x_2\) is eliminated. This results in the optimal solution \(\{ 3 \}, \{ 2, 2 \}, \emptyset , \{ 1 \}, \{ 5 \}\), with value \(22\).
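The full procedure can be sketched in Python as follows. This is our own condensed rendering of the approach, not the paper's exact pseudocode: for fixed block sizes, the server with the smallest \(\alpha _j = h_jk_j\) receives the largest remaining weights, mirroring the variable-elimination order of the proof, and an outer search enumerates all size compositions.

```python
def assign_for_sizes(weights, h, sizes):
    """Optimal partition for fixed block sizes k_j (the inner routine).

    Servers are processed in order of increasing alpha_j = h_j * k_j;
    each takes the k_j largest weights still unallocated.
    """
    pool = sorted(weights)  # ascending
    blocks = [None] * len(h)
    for j in sorted(range(len(h)), key=lambda j: h[j] * sizes[j]):
        cut = len(pool) - sizes[j]
        blocks[j], pool = pool[cut:], pool[:cut]
    cost = sum(len(S) * h[j] * sum(S) for j, S in enumerate(blocks))
    return cost, blocks

def solve(weights, h):
    """Exact solver: enumerate all block-size compositions (fixed M)."""
    N, M = len(weights), len(h)
    best_cost, best_blocks = float("inf"), None

    def search(j, remaining, sizes):
        nonlocal best_cost, best_blocks
        if j == M - 1:
            cost, blocks = assign_for_sizes(weights, h, sizes + [remaining])
            if cost < best_cost:
                best_cost, best_blocks = cost, blocks
            return
        for k in range(remaining + 1):
            search(j + 1, remaining - k, sizes + [k])

    search(0, N, [])
    return best_cost, best_blocks
```

On the example above, `solve([5, 3, 1, 2, 2], [2, 1, 5, 3, 1])` returns cost \(22\).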

3 Computational experiments

An experimental study has been carried out to evaluate the efficiency of Algorithm 1 in practice. The most time-consuming part of computing an optimal solution is the search over all feasible partition sizes \(k_1, k_2, \ldots , k_M\), since their number is bounded by a polynomial of degree \(M-1\), while the routine computing the partition itself has complexity bounded by \(O(N \log N)\). For relatively small \(M\) the algorithm is very fast, but its performance drops dramatically as \(M\) increases. This is illustrated in Table 1, where example solutions are presented, along with the corresponding partition sizes and the respective running times (in seconds). The instances were randomly generated by drawing \(w_i \in (0, 100)\) and \(h_j \in (0,10)\) from uniform distributions.

Table 1 Solutions of example problem instances computed by Algorithm 1 along with their respective running times

4 Conclusions

The class of quadratic semi-assignment problem instances presented in this paper models the task of assigning streams of requests to a group of related server machines. In the presence of unknown exact request schedules, the objective is to minimize the sum of the worst-case processing times, which are equal for the subset of clients sharing the same machine. Although the general problem is NP-hard, in practical applications the number of servers is usually much smaller than the number of clients to assign. Thus the case with a fixed number of servers is especially important, and for this case an exact polynomial-time algorithm was given. The demonstrated reformulation of the problem, minimizing a sum of partition block weights with block-size-dependent coefficients, may also prove useful in the design of algorithms for other combinatorial problems involving operations on set partitions.