1 Introduction

One of Karp’s 21 NP-complete problems [20], Subset Sum has seen astounding progress over the last few years. Koiliaris and Xu [23], Bringmann [10] and Jin and Wu [19] have presented pseudopolynomial algorithms resulting in substantial improvements over the long-standing standard approach of Bellman [7] and the improvement by Pisinger [31]. Moreover, the latter two algorithms [10, 19] match the SETH-based lower bounds proved in [1]. Additionally, recently there has been progress in the approximation scheme of Subset Sum, the first such improvement in over 20 years, with a new algorithm introduced by Bringmann and Nakos [8], as well as corresponding lower bounds obtained through the lens of fine-grained complexity.

A thoroughly studied special case of Subset Sum is the Partition problem, which asks for a partition of the input set into two subsets such that the difference of their sums is minimized. Any algorithm solving the former applies to the latter, though recent progress [8, 16, 28] has shown that Partition may be solved more efficiently in the approximation setting. On the other hand, regarding exact solutions, no better algorithm has been developed, and therefore Subset Sum algorithms remain the state of the art.

The Equal Subset Sum problem, which, given an input set, asks for two disjoint subsets of equal sum, is closely related to Subset Sum and Partition. It finds applications in multiple different fields, ranging from computational biology [12, 13] and computational social choice [24], to cryptography [32], to name a few. In addition, it is related to important theoretical concepts such as the complexity of search problems in the class TFNP [30].

The centerpiece of this paper is the Subset Sum Ratio problem, the optimization version of Equal Subset Sum, which asks, given an input set \(S \subseteq \mathbb {N}\), for two disjoint subsets \(S_1, S_2 \subseteq S\), such that the following ratio is minimized

$$\begin{aligned} \frac{\max \left\{ \sum _{s_i \in S_1} s_i, \sum _{s_j \in S_2} s_j \right\} }{\min \left\{ \sum _{s_i \in S_1} s_i, \sum _{s_j \in S_2} s_j \right\} }. \end{aligned}$$

This problem is known to be NP-hard, and many FPTASes have been proposed over the years [6, 25, 29], all of which rely on some kind of scaling of the input elements. The current state of the art [25] achieves a running time of \(\mathcal{O}(n^4/\varepsilon )\), leaving a significant gap in comparison with known approximation algorithms for the closely related Subset Sum and Partition problems, especially with respect to n. This leads to the natural question of whether we can improve this performance and achieve an FPTAS with a running time \(\mathcal{O}(n^{c_1} / \varepsilon ^{c_2})\), where either \(c_1 < 4\) or \(c_1 + c_2<5\). We answer this question in the affirmative for both conditions, by presenting a novel approximation scheme which utilizes exact or approximate Partition algorithms and achieves running time \(\tilde{\mathcal{O}}(n^{2.3}/\varepsilon ^{2.6})\) or \(\tilde{\mathcal{O}}(n^{2}/\varepsilon ^3)\), respectively. Our proposed algorithm differs significantly from previous approaches and is the first to connect these closely related problems.

1.1 Related work

Equal Subset Sum, as well as its optimization version called Subset Sum Ratio [6], is closely related to problems appearing in many scientific areas. Some examples include the Partial Digest problem, which comes from computational biology [12, 13], the allocation of individual goods [24], tournament construction [22], and a variation of Subset Sum, called Multiple Integrated Sets SSP, which finds applications in the field of cryptography [32]. Furthermore, it is related to important concepts in theoretical computer science; for example, a restricted version of Equal Subset Sum lies in a subclass of the complexity class \(\textsf{TFNP}\), namely in \(\textsf{PPP}\) [30], a class consisting of search problems that always have a solution due to some pigeonhole argument, and no polynomial time algorithm is known for this restricted version.

Equal Subset Sum has been proven NP-hard by Woeginger and Yu [33] (see also the full version of [27] for an alternative proof) and several variations have been proven NP-hard by Cieliebak et al. [11, 14]. A 1.324-approximation algorithm has been proposed for Subset Sum Ratio in [33] and several FPTASes appeared in [6, 25, 29], the fastest so far being the one in [25] of complexity \(\mathcal{O}(n^4/\varepsilon )\), the complexity of which also applies to various meaningful special cases, as shown in [26].

As far as exact algorithms are concerned, recent progress has shown that Equal Subset Sum can be solved probabilistically in \(\mathcal{O}^{\star }(1.7088^n)\) time [27], faster than a standard “meet-in-the-middle” approach yielding an \(\mathcal{O}^{\star }(3^{n/2}) \le \mathcal{O}^{\star }(1.7321^n)\) time algorithm.

These problems are tightly connected to Subset Sum, which has seen impressive advances recently, due to Koiliaris and Xu [23] who gave a deterministic \(\tilde{\mathcal{O}}(\sqrt{n}t)\) algorithm, where n is the number of input elements and t is the target, and Bringmann [10] who gave a \(\tilde{\mathcal{O}}(n + t)\) randomized algorithm, which is essentially optimal under SETH [1]. See also [3] for an extension of these algorithms to a more general setting. Jin and Wu subsequently proposed a simpler randomized algorithm [19] achieving the same bounds as [10], which however seems to only solve the decision version of the problem. Recently, Bringmann and Nakos [9] have presented an \(\mathcal{O}\left( |\mathcal {S}_t(Z) |^{4/3} \textrm{poly}(\log t) \right) \) algorithm, where \(\mathcal {S}_t(Z)\) is the set of all subset sums of the input set Z that are smaller than t, based on top-k convolution.

Partition shares the complexity of Subset Sum regarding exact solutions, where the meet-in-the-middle approach [18] from the 1970s remains the state of the art as far as algorithms dependent on n are concerned. On the other hand, one can approximate Partition more efficiently than Subset Sum, assuming the min-plus convolution conjecture [15] holds. In particular, Bringmann and Nakos [8] have presented the first improvement in approximating Subset Sum in over 20 years, since the scheme of [21] had remained the state of the art. Moreover, in their paper they have shown that developing a significantly better algorithm would contradict said conjecture. Furthermore, they develop an approximation scheme for Partition utilizing min-plus convolution computations, improving upon the recent work of Mucha et al. [28] and circumventing the lower bounds established for Subset Sum in their work. Very recently, Deng, Jin and Mao [16] presented an even faster approximation algorithm for Partition, further widening the gap between the complexities of the two problems in the approximation setting.

1.2 Our contribution

We present a novel approximation scheme for the Subset Sum Ratio problem, which, depending on the relationship between n and \(\varepsilon \), improves upon the best existing approximation scheme of [25]. Our algorithm significantly differs from previous approaches, which in most cases rely on some kind of scaling of the input elements, and instead makes use of either exact or approximation algorithms for Partition. In particular, we first partition the input elements into small and large, and then prove that we can either easily find an approximate solution involving only large elements or there are at most \(\log (n / \varepsilon ^2)\) of them. In the latter case, in order to approximate Subset Sum Ratio it suffices to solve instances of Partition on all the subsets of large elements, i.e., polynomially many instances, each of size at most \(\log (n / \varepsilon ^2)\). By leveraging known Partition algorithms in the second case, we manage to improve upon previous FPTASes. In the case of exact computations, we show that by employing such a Partition algorithm of complexity \(\mathcal{O}^{\star }(2^{\alpha n})\), our proposed scheme runs in time \(\tilde{\mathcal{O}}( n \cdot (n / \varepsilon ^2)^{\log (1 + 2^\alpha )})\), for some constant \(\alpha > 0\). It is already known that such an algorithm exists for \(\alpha = 1/2\) [18], and any further improvements will positively affect our FPTAS. On the other hand, using the approximation algorithm of Kellerer et al. [21] we achieve a running time of \(\tilde{\mathcal{O}}(n^2 / \varepsilon ^3)\), while any improvement over it (e.g., [8, 16]) will only affect polylogarithmic factors of our scheme, as is further discussed in Sect. 5.

We start by presenting some necessary background in Sect. 2. Afterward, in Sect. 3 we introduce an FPTAS for a restricted version of the problem. Then, in Sect. 4, we explain how to make use of the algorithm presented in Sect. 3, in order to obtain an approximation scheme for the Subset Sum Ratio problem. The complexity of the final scheme is thoroughly analyzed in Sect. 5, followed by some possible directions for future research in Sect. 6.

Prior work. In the current paper we improve upon the results of the preliminary version [2], by using approximate and exact Partition algorithms instead of Subset Sum computations.

2 Preliminaries

Let, for \(x \in \mathbb {N}\), \([x] = \left\{ z \in \mathbb {N} \mid 1 \le z \le x \right\} \) denote the set of integers in the interval [1, x]. Given a set \(S \subseteq \mathbb {N}\), denote its largest element by \(\max (S)\) and the sum of its elements by \(\Sigma (S) = \sum _{s \in S} s\). If we are additionally given a value \(\varepsilon \in (0, 1)\), define the following partition of its elements:

  • The set of its large elements as \(L(S, \varepsilon ) = \left\{ s \in S \mid s \ge \varepsilon \cdot \max (S) \right\} \). Note that \(\max (S) \in L(S, \varepsilon )\), for any \(\varepsilon \in (0, 1)\).

  • The set of its small elements as \(M(S, \varepsilon ) = \left\{ s \in S \mid s < \varepsilon \cdot \max (S) \right\} \).

In the following, since the values of the associated parameters will be clear from the context, they will be omitted and we will refer to these sets simply as L and M.
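In code, the partition of this section amounts to a single threshold test. A minimal Python sketch (the function name is ours, purely illustrative):

```python
# Split a set S into large elements L (>= eps * max(S)) and small elements M,
# as defined in the preliminaries. Illustrative helper, not from the paper.

def split_large_small(S, eps):
    """Return (L, M): the large and small elements of S for margin eps."""
    threshold = eps * max(S)
    L = {s for s in S if s >= threshold}   # note: max(S) always lands in L
    M = {s for s in S if s < threshold}
    return L, M
```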

Definition 1

(Partition) Given a set X, compute a subset \(X^*_p \subseteq X\), such that \(\Sigma (X^*_p) = \max \left\{ \Sigma (Z) \mid Z \subseteq X, \Sigma (Z) \le \Sigma (X) / 2 \right\} \). Moreover, let \(\overline{X^*_p} = X {\setminus } X^*_p\).

Definition 2

(Approximate Partition, from [28]) Given a set X and error margin \(\varepsilon \), compute a subset \(X_p \subseteq X\) such that \((1 - \varepsilon ) \cdot \Sigma (X^*_p) \le \Sigma (X_p) \le \Sigma (X^*_p)\). Moreover, let \(\overline{X_p} = X \setminus X_p\).
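Since the Partition instances solved later contain only \(|{L}| \le \log (n / \varepsilon ^2)\) elements, Definition 1 can be made executable by plain enumeration. A brute-force Python sketch (this is just the specification of Definition 1, not the algorithm used in the complexity analysis):

```python
from itertools import combinations

# Exact Partition by enumeration: over all subsets of X, keep the one whose
# sum is largest while not exceeding Sigma(X)/2. Only viable for the tiny
# (logarithmic-size) instances that arise in the scheme.

def exact_partition(X):
    """Return (X_p, complement) with Sigma(X_p) maximal subject to <= Sigma(X)/2."""
    X = list(X)
    half = sum(X) / 2
    best = []
    for r in range(len(X) + 1):
        for subset in combinations(X, r):
            if sum(best) < sum(subset) <= half:
                best = list(subset)
    comp = list(X)
    for x in best:                         # complement of X_p inside X
        comp.remove(x)
    return best, comp
```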

3 Scheme for a restricted version

In this section, we present an FPTAS for the constrained version of the Subset Sum Ratio problem where we are only interested in approximating solutions that involve the largest element of the input set. In other words, one of the subsets of the optimal solution contains \(\max (A) = a_n\) (assuming that \(A = \left\{ a_1, \ldots , a_n \right\} \) is the sorted input set); let \(r_{\text { opt}}\) denote the subset sum ratio of such an optimal solution. Our FPTAS will return a solution of ratio r, such that \(1 \le r \le (1 + \varepsilon ) \cdot r_{\text { opt}}\), for a given error margin \(\varepsilon \in (0, 1)\); however, we allow that the sets of the returned solution do not necessarily satisfy the aforementioned constraint (i.e., \(a_n\) may not be involved in the approximate solution).

3.1 Outline of the algorithm

We now present a rough outline of Algorithm 1:

  • At first, we search for approximate solutions involving exclusively large elements from \(L(A,\varepsilon )\).

  • To this end, we produce the subset sums formed by these large elements. If their number exceeds \(n / \varepsilon ^2\), then we can find an approximate solution.

  • Otherwise, there are at most \(n / \varepsilon ^2\) subsets of large elements. In this case, we can find a solution by running an exact or an approximate Partition algorithm for each subset.

  • In the case that the optimal solution involves small elements, we show that it suffices to add elements of \(M(A, \varepsilon )\) in a greedy way.

Algorithm 1 ConstrainedSSR(\(A, \varepsilon , T\))

3.2 Solution involving exclusively large elements

We first search for a \((1 + \varepsilon )\)-approximate solution, with \(\varepsilon \in (0,1)\), without involving any of the elements that are smaller than \(\varepsilon \cdot a_n\). Let \(M = \left\{ a_i \in A \mid a_i < \varepsilon \cdot a_n \right\} \) be the set of small elements and \(L = A {\setminus } M = \left\{ a_i \in A \mid a_i \ge \varepsilon \cdot a_n \right\} \) be the set of large elements.

After partitioning the input set, we split the interval \([0, n \cdot a_n]\) into smaller intervals, called bins, of size \(l = \varepsilon ^2 \cdot a_n\) each, as depicted in Fig. 1.

Fig. 1 Split of the interval \([ 0, n \cdot a_n ]\) into bins of size l

Thus, there are a total of \(B = n / \varepsilon ^2\) bins. Notice that each possible subset of the input set will belong to a respective bin constructed this way, depending on its sum. Additionally, if two sets correspond to the same bin, then the difference of their subset sums will be at most l.
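The bin-collision test underlying Lemma 1 can be sketched as follows; subsets are represented as tuples, and the names are ours:

```python
# Map each generated subset sum to the bin floor(sum / l), where
# l = eps^2 * a_n. Two distinct subsets landing in the same bin immediately
# yield a (1 + eps)-approximate solution (cf. Lemma 1).

def find_bin_collision(subsets, eps, a_n):
    """Return two subsets whose sums fall in the same bin, or None."""
    l = eps ** 2 * a_n                     # bin width
    seen = {}                              # bin index -> subset
    for S in subsets:
        b = int(sum(S) // l)
        if b in seen and seen[b] != S:
            return seen[b], S              # same bin: approximate solution found
        seen[b] = S
    return None
```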

The next step of our algorithm is to generate all the possible subset sums arising from the set of large elements L. The complexity of this procedure is \(\mathcal{O}\left( 2^{|{L}|} \right) \), where \(|{L}|\) is the cardinality of L. Notice, however, that it is possible to bound the number of produced subset sums by the number of bins B, since if two sums belong to the same bin they constitute a solution, as shown in Lemma 1, in which case the algorithm terminates in time \(\mathcal{O}(n / \varepsilon ^2)\).

Lemma 1

If two subsets correspond to the same bin, we can find a \((1 + \varepsilon )\)-approximation solution.

Proof

Suppose there exist two sets \(L_1, L_2 \subseteq L\) whose sums correspond to the same bin, with \(\Sigma (L_1) \le \Sigma (L_2)\). Notice that there is no guarantee regarding the disjointness of said subsets, thus consider \(L'_1 = L_1 \setminus L_2\) and \(L'_2 = L_2 {\setminus } L_1\), for which it is obvious that \(\Sigma (L_1') \le \Sigma (L_2')\). Additionally, assume that \(L'_1 \ne \emptyset \). Then it holds that

$$\begin{aligned} \Sigma (L_2') - \Sigma (L_1') = \Sigma (L_2) - \Sigma (L_1) \le l. \end{aligned}$$

Therefore, the sets \(L'_1\) and \(L'_2\) constitute a \((1+\varepsilon )\)-approximation solution, since

$$\begin{aligned} \frac{\Sigma (L_2')}{\Sigma (L_1')}&\le \frac{\Sigma (L_1') + l}{\Sigma (L_1')} = 1 + \frac{l}{\Sigma (L_1')}\\&\le 1 + \frac{\varepsilon ^2 \cdot a_n}{\varepsilon \cdot a_n} = 1 + \varepsilon \end{aligned}$$

where the last inequality is due to the fact that \(L_1' \subseteq L\) is composed of elements \(\ge \varepsilon \cdot a_n\), thus \(\Sigma (L_1') \ge \varepsilon \cdot a_n\).

It remains to show that \(L'_1 \ne \emptyset \). Assume that \(L'_1 = \emptyset \). This implies that \(L_1 \subseteq L_2\) and since we consider each subset of L only once and the input is a set and not a multiset, it holds that \(L_1 \subset L_2 \implies L'_2 \ne \emptyset \). Since \(L_1\) and \(L_2\) correspond to the same bin, it holds that

$$\begin{aligned} \Sigma (L_2) - \Sigma (L_1) \le l \implies \Sigma (L_2') - \Sigma (L_1') \le l \implies \Sigma (L'_2) \le l \end{aligned}$$

which is a contradiction, since \(L'_2\) is a non-empty subset of L, which is comprised of elements greater than or equal to \(\varepsilon \cdot a_n\); hence \(\Sigma (L'_2) \ge \varepsilon \cdot a_n > \varepsilon ^2 \cdot a_n = l\), since \(\varepsilon < 1\). \(\square \)

Consider an \(\varepsilon '\) such that \((1 + \varepsilon ')/(1 - \varepsilon ') \le 1 + \varepsilon \) for the given \(\varepsilon \in (0, 1)\) (the exact value of \(\varepsilon '\) will be computed in Sect. 5).

If every produced subset sum of the previous step belongs to a distinct bin, then we can infer that the number of subsets of large elements is bounded by \(n / \varepsilon ^2\). Moreover, we can prove the following lemma.

Lemma 2

If the optimal ratio \(r_{\text { opt}}\) involves sets \(S^*_1, S^*_2\) consisting of only large elements, with \(S^*_1 \cup S^*_2 = S^* \subseteq L\) and \(a_n \in S^*\), then \(\Sigma (\overline{S_p}) / \Sigma (S_p) \le (1 + \varepsilon ) \cdot r_{\text { opt}}\), where \(S_p\) is a \((1 - \varepsilon ')\)-apx solution to the Partition problem on input \(S^*\).

Proof

Assume that \(\Sigma (S^*_1) \le \Sigma (S^*_2)\). Note that sets \(S_1^*, S_2^*\) are also the optimal solution of the Partition problem on input \(S^*\). By running a \((1 - \varepsilon ')\) approximate Partition algorithm on input set \(S^*\), we obtain the sets \(S_1, S_2\) with \(\Sigma (S_1) \le \Sigma (S_2)\), where \(S_1 = S_p\) and \(S_2 = \overline{S_p}\). Then,

$$\begin{aligned} \frac{\Sigma (S_2)}{\Sigma (S_1)}&\le \frac{\Sigma (S^*_2) + \varepsilon ' \cdot \Sigma (S^*_1)}{(1 - \varepsilon ') \Sigma (S^*_1)}\\&\le \frac{\Sigma (S^*_2) + \varepsilon ' \cdot \Sigma (S^*_2)}{(1 - \varepsilon ') \Sigma (S^*_1)}\\&= \frac{1 + \varepsilon '}{1 - \varepsilon '} \cdot \frac{\Sigma (S^*_2)}{\Sigma (S^*_1)}\\&\le (1 + \varepsilon ) \cdot r_{\text { opt}} \end{aligned}$$

where we used the fact that \((1 - \varepsilon ') \cdot \Sigma (S^*_1) \le \Sigma (S_1)\) as well as \(\Sigma (S_2) \le \Sigma (S^*_2) + \varepsilon ' \cdot \Sigma (S^*_1)\). \(\square \)

Therefore, we have proved that when the optimal solution consists of sets comprised of only large elements, it is possible to find a (\(1 + \varepsilon \))-approximation solution for the constrained Subset Sum Ratio problem by running a \((1 - \varepsilon ')\)-approximation algorithm for Partition with input the union of said large elements. In order to do so, it suffices to consider as input all the \(2^{|{L}|-1}\) subsets of L containing \(a_n\) and each time run a \((1 - \varepsilon ')\)-approximation Partition algorithm. The total cost of this procedure will be thoroughly analyzed in Sect. 5 and depends on the algorithm used.

It is important to note that by utilizing an (exact or approximation) algorithm for Partition, we establish a connection between the complexities of Partition and approximating Subset Sum Ratio in a way that any future improvement in the first carries over to the second.

3.3 General \((1+\varepsilon )\)-approximate solutions

Whereas we previously considered optimal solutions involving exclusively large elements, here we will search for approximations for those optimal solutions that also include small elements of the input set and satisfy our constraint (i.e., \(a_n\) belongs to the optimal solution sets). We will prove that in order to approximate those optimal solutions, it suffices to consider only the \((1 - \varepsilon ')\)-apx solutions of the Partition problem corresponding to each subset of large elements and add small elements to them. In other words, instead of considering any two disjoint subsets consisting of large elements and subsequently adding the small elements to them, we can consider only the \((1 - \varepsilon ')\)-approximate solutions to the Partition problem computed in the previous step, ergo, at most \(B = n / \varepsilon ^2\) configurations regarding the large elements. Moreover, we will prove that it suffices to add the small elements to our solution in a greedy way.

Since the algorithm has not detected a solution so far, due to Lemma 1 every computed subset sum of set L belongs to a different bin. Thus, their total number is bounded by the number of bins B, i.e.

$$\begin{aligned} 2^{|{L}|} \le \left( \frac{n}{\varepsilon ^2} \right) \iff |{L}| \le \log \left( \frac{n}{\varepsilon ^2} \right) \end{aligned}$$

We proceed by additionally involving small elements into our solutions in order to reduce the difference between the sums of the sets, thus reducing their ratio.

Lemma 3

Assume that we are given the \((1 - \varepsilon ')\)-apx solutions for the Partition problem on every subset of large elements containing \(a_n\). Then, a \((1 + \varepsilon )\)-approximation solution for the constrained version of Subset Sum Ratio can be found, when the optimal solution involves small elements.

Proof

Let \(S^*_1,S^*_2\) be disjoint subsets that form an optimal solution for the constrained version of Subset Sum Ratio, where:

  • \(\Sigma (S^*_1) \le \Sigma (S^*_2)\) and \(a_n \in S^* = S^*_1 \cup S^*_2\).

  • \(S^*_1 = L^*_1 \cup M^*_1\) and \(S^*_2 = L^*_2 \cup M^*_2\), where \(L^*_1, L^*_2 \subseteq L\) and \(M^*_1, M^*_2 \subseteq M\).

  • \(M^*_1 \cup M^*_2 \ne \emptyset \).

Moreover, let \(L^*_p\) and \(\overline{L^*_p}\) be the optimal solution of the Partition problem on input \(L^* = L_1^* \cup L_2^*\), while \(L_p\) and \(\overline{L_p}\) be the sets returned by a \((1 - \varepsilon ')\)-apx algorithm. Then, it holds that:

  • \(\Sigma (L^*_p) \le \Sigma (\overline{L^*_p})\) and \(\Sigma (\overline{L^*_p}) - \Sigma (L^*_p) \le |{\Sigma (L^* {\setminus } X) - \Sigma (X)}|, \forall X \subseteq L^*\).

  • \((1 - \varepsilon ') \cdot \Sigma (L^*_p) \le \Sigma (L_p) \le \Sigma (L^*_p)\).

  • \(\Sigma (\overline{L^*_p}) \le \Sigma (\overline{L_p}) \le \Sigma (\overline{L^*_p}) + \varepsilon ' \cdot \Sigma (L^*_p) \le (1 + \varepsilon ') \cdot \Sigma (\overline{L^*_p})\).

  • \(a_n \le \Sigma (\overline{L^*_p})\), since \(a_n \in L^*\).

Case 1. Suppose that \(\Sigma (L_p) + \Sigma (M) \ge \Sigma (\overline{L_p})\). In this case, there exists k such that \(M_k = \left\{ a_i \in M \mid i \in [k] \right\} \subseteq M\) and \(0 \le \Sigma (L_p \cup M_k) - \Sigma (\overline{L_p}) \le \varepsilon \cdot a_n\), since all elements of M have value less than \(\varepsilon \cdot a_n\). Hence,

$$\begin{aligned} 1 \le \frac{\Sigma (L_p \cup M_k)}{\Sigma (\overline{L_p})} \le 1 + \frac{\varepsilon \cdot a_n}{\Sigma (\overline{L_p})} \le 1 + \frac{\varepsilon \cdot a_n}{a_n} = 1 + \varepsilon . \end{aligned}$$

Case 2. Alternatively, it holds that \(\Sigma (L_p) + \Sigma (M) < \Sigma (\overline{L_p})\). Then,

$$\begin{aligned} \frac{\Sigma (\overline{L_p})}{\Sigma (L_p \cup M)}&= \frac{\Sigma (\overline{L_p})}{\Sigma (L_p) + \Sigma (M)}\\&\le \frac{(1 + \varepsilon ') \cdot \Sigma (\overline{L^*_p})}{ (1 - \varepsilon ') \cdot \Sigma (L^*_p) + \Sigma (M)}\\&\le \frac{1 + \varepsilon '}{1 - \varepsilon '} \cdot \frac{\Sigma (\overline{L^*_p})}{\Sigma (L^*_p) + \Sigma (M)}\\&\le (1 + \varepsilon ) \cdot \frac{\Sigma (\overline{L^*_p})}{\Sigma (L^*_p) + \Sigma (M)}. \end{aligned}$$

If \(\Sigma (L^*_p) + \Sigma (M) \ge \Sigma (\overline{L^*_p})\), then it follows that \(\frac{\Sigma (\overline{L_p})}{\Sigma (L_p \cup M)} \le 1 + \varepsilon \). On the other hand, if \(\Sigma (L^*_p) + \Sigma (M) < \Sigma (\overline{L^*_p})\), then it follows that \(\Sigma (S_1^*) = \Sigma (L^*_p \cup M)\) and \(\Sigma (S_2^*) = \Sigma (\overline{L^*_p})\), therefore \(\frac{\Sigma (\overline{L_p})}{\Sigma (L_p \cup M)} \le (1 + \varepsilon ) \cdot \frac{\Sigma (S^*_{2})}{\Sigma (S^*_{1})}\). \(\square \)

3.4 Adding small elements efficiently

Here, we will describe a method to efficiently add small elements to our sets. In particular, we search for some k such that \(0 \le \Sigma (L_p \cup M_k) - \Sigma (\overline{L_p}) \le \varepsilon \cdot a_n\), where \(M_k = \left\{ a_i \in M \mid i \in [k] \right\} \). Notice that if \(\Sigma (M) \ge \Sigma (\overline{L_p}) - \Sigma (L_p)\), such a set \(M_k\) always exists, since by definition each element of set M is smaller than \(\varepsilon \cdot a_n\). In order to determine \(M_k\), we make use of an array of partial sums \(T[k] = \Sigma (M_k)\), where \(k \le |{M}|\). Since T is sorted, we can find k in \(\mathcal{O}(\log |{M}|) = \mathcal{O}(\log n)\) time using binary search.
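This step can be sketched with Python's `bisect` standard module standing in for the binary search; T is the prefix-sum array with \(T[0] = 0\), and the names are ours:

```python
from bisect import bisect_left

# Given prefix sums T[k] = Sigma(M_k) of the sorted small elements, find the
# smallest k with Sigma(L_p) + T[k] >= Sigma(overline{L_p}) by binary search.

def smallest_k(T, sum_Lp, sum_Lp_bar):
    """Return the smallest k closing the gap, or None if Sigma(M) is too small."""
    diff = sum_Lp_bar - sum_Lp
    k = bisect_left(T, diff)               # first index with T[k] >= diff
    return k if k < len(T) else None
```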

4 Final algorithm

The algorithm presented in the previous section constitutes an approximation scheme for Subset Sum Ratio in the case where one of the solution subsets contains the maximum element of the input set. Thus, in order to solve the Subset Sum Ratio problem, it suffices to run the previous algorithm n times, where n denotes the cardinality of the input set A, each time removing the maximum element of A.

In particular, suppose that the optimal solution involves disjoint sets \(S_1^*\) and \(S_2^*\), where \(a_k = \max (S_1^* \cup S_2^*)\). There exists an iteration for which the algorithm considers as input the set \(A_k = \left\{ a_1, \ldots , a_k \right\} \). In this iteration, the element \(a_k\) is the largest element and the algorithm searches for an approximation of the optimal solution for which \(a_k\) is contained in one of the solution subsets. The optimal solution of the unconstrained version of Subset Sum Ratio has this property, so the ratio of the approximate solution returned by the algorithm of the previous section is at most \((1 + \varepsilon )\) times the optimal.

Consequently, n repetitions of the algorithm suffice to construct an FPTAS for Subset Sum Ratio. Notice that if at some repetition the sets returned by the algorithm of Sect. 3 have ratio at most \(1 + \varepsilon \), then this ratio successfully approximates the optimal ratio \(r_{\text { opt}} \ge 1\), since \(1 + \varepsilon \le (1 + \varepsilon ) \cdot r_{\text { opt}}\); therefore they constitute an approximate solution.
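The n repetitions can be sketched as a simple outer loop over the prefixes of the sorted input, so that each element plays the role of the maximum exactly once. Here `constrained_ssr` stands in for the (hypothetical) routine of Sect. 3 and is passed as a parameter:

```python
# Outer loop of the final scheme: run the constrained algorithm on every
# prefix A_k = {a_1, ..., a_k} of the sorted input and keep the best ratio.
# `constrained_ssr(S, eps)` is assumed to return a candidate ratio or None.

def ssr(A, eps, constrained_ssr):
    A = sorted(A)
    best = float("inf")
    for k in range(len(A), 1, -1):         # prefixes, largest element removed each time
        r = constrained_ssr(A[:k], eps)
        if r is not None:
            best = min(best, r)
    return best
```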

Algorithm 2 SSR(\(A, \varepsilon \))

5 Complexity

The total complexity of the final algorithm is determined by three distinct operations, over the n iterations of the algorithm:

  1. The cost to compute all the possible subset sums arising from large elements. It suffices to consider the case where this is bounded by the number of bins \(B = n / \varepsilon ^2\), due to Lemma 1.

  2. The cost to compute an exact or \((1 - \varepsilon ')\)-apx Partition solution on each subset of large elements. The cost of this operation is analyzed in the following subsection.

  3. The cost to include small elements in the Partition solutions. There are B such solutions, each requiring \(\mathcal{O}(\log n)\) time, and thus the total time required is \(\mathcal{O}\left( \frac{n}{\varepsilon ^2} \cdot \log n \right) \).

5.1 Complexity of partition computations

5.1.1 Using exact partition computations

First, we will consider the case where we compute the optimal solution of the Partition problem. In order to do so, we will use the standard meet-in-the-middle algorithm [18] for Subset Sum, and in the following we analyze its complexity.

Let \(L' \subseteq L\) be a subset with \(|{L'}| = k\), and suppose we are given an exact Partition algorithm of complexity \(\mathcal{O}(2^{\alpha k} \cdot k^{\beta })\), for some constants \(\alpha , \beta \). Notice that the number of subsets of L of cardinality k is \(\left( {\begin{array}{c}|{L}|\\ k\end{array}}\right) \) and that \(|{L}| \le \log (n / \varepsilon ^2)\). Then, it holds that

$$\begin{aligned} \sum _{k = 0}^{|{L}|} \left( {\begin{array}{c}|{L}|\\ k\end{array}}\right) \cdot 2^{\alpha k} \cdot k^{\beta }&\le |{L}|^{\beta } \cdot \sum _{k = 0}^{|{L}|} \left( {\begin{array}{c}|{L}|\\ k\end{array}}\right) \cdot 2^{\alpha k}\\&= |{L}|^{\beta } \cdot \left( 1 + 2^\alpha \right) ^{|{L}|}\\&= |{L}|^{\beta } \cdot 2^{|{L}| \log (1 + 2^\alpha )}\\&\le \log ^{\beta } (n / \varepsilon ^2) \cdot (n / \varepsilon ^2)^{\log (1 + 2^\alpha )} \end{aligned}$$

where we used the binomial theorem. By employing the meet-in-the-middle algorithm [18], where \(\alpha = 1/2\) and \(\beta = 1\), it follows that \(\log (1 + 2^\alpha ) = 1.271\ldots < 1.3\). Consequently, the complexity of solving the Partition problem for all the subsets of large elements is

$$\begin{aligned} \mathcal{O}\left( \frac{n^{1.3}}{\varepsilon ^{2.6}} \cdot \log (n / \varepsilon ^2) \right) = \tilde{\mathcal{O}}\left( \frac{n^{1.3}}{\varepsilon ^{2.6}} \right) \end{aligned}$$
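For intuition, the meet-in-the-middle idea [18] behind the \(\alpha = 1/2\) bound can be sketched as follows; this simplified illustration returns only the optimal Partition value, not the subsets themselves:

```python
from itertools import combinations
from bisect import bisect_right

# Meet-in-the-middle sketch: split X into halves, enumerate each half's
# 2^(k/2) subset sums, sort one side, and binary-search for the best
# combination not exceeding Sigma(X)/2.

def mim_partition_sum(X):
    """Return the largest subset sum of X that is <= Sigma(X)/2."""
    X = list(X)
    half = sum(X) / 2
    mid = len(X) // 2
    A, B = X[:mid], X[mid:]
    sums_A = [sum(c) for r in range(len(A) + 1) for c in combinations(A, r)]
    sums_B = sorted(sum(c) for r in range(len(B) + 1) for c in combinations(B, r))
    best = 0
    for a in sums_A:
        if a > half:
            continue
        i = bisect_right(sums_B, half - a) - 1   # largest b with a + b <= half
        if i >= 0:
            best = max(best, a + sums_B[i])
    return best
```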

5.1.2 Using approximate partition computations

Here we will analyze the complexity in the case we run an approximate Partition algorithm in order to compute the \((1 - \varepsilon ')\)-approximation solutions.

For subset \(L' \subseteq L\), we run an approximate Partition algorithm with error margin \(\varepsilon '\) such that

$$\begin{aligned} \frac{1 + \varepsilon '}{1 - \varepsilon '} \le 1 + \varepsilon \iff \varepsilon ' \le \frac{\varepsilon }{2 + \varepsilon } \end{aligned}$$

and by choosing the maximum such \(\varepsilon '\), it holds that

$$\begin{aligned} \varepsilon ' = \frac{\varepsilon }{2 + \varepsilon } \implies \frac{1}{\varepsilon '} = \frac{2 + \varepsilon }{\varepsilon } = \frac{2}{\varepsilon } + 1 \implies \frac{1}{\varepsilon '} = \mathcal{O}\left( \frac{1}{\varepsilon } \right) \end{aligned}$$
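This choice can be checked numerically with a trivial helper (ours, purely illustrative): for \(\varepsilon = 0.5\) it gives \(\varepsilon ' = 0.2\), and indeed \((1 + \varepsilon ')/(1 - \varepsilon ') = 1.5 = 1 + \varepsilon \).

```python
# The largest error margin eps' with (1 + eps') / (1 - eps') <= 1 + eps.

def partition_margin(eps):
    return eps / (2 + eps)
```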

Since there are at most \(n / \varepsilon ^2\) subsets of large elements, we will need to run said algorithm at most \(n / \varepsilon ^2\) times on \(|{L'}| \le |{L}|\) elements and with error margin \(\varepsilon '\).

Note that any approximate Subset Sum algorithm could be used in order to approximate Partition, such as the one presented by Kellerer et al. [21] of complexity \(\mathcal{O}\left( \min \left\{ \frac{n}{\varepsilon }, n + \frac{1}{\varepsilon ^2} \cdot \log (1 / \varepsilon ) \right\} \right) \). In our case, with \(|{L}| = \log (n / \varepsilon ^2)\) and error margin \(\varepsilon '\), the total complexity is

$$\begin{aligned}&\mathcal{O}\left( \frac{n}{\varepsilon ^2} \cdot \min \left\{ \frac{|{L}|}{\varepsilon '}, |{L}| + \frac{1}{(\varepsilon ')^2} \cdot \log (1 / \varepsilon ') \right\} \right) =\\&\mathcal{O}\left( \frac{n}{\varepsilon ^2} \cdot \min \left\{ \frac{\log (n / \varepsilon ^2)}{\varepsilon }, \log (n / \varepsilon ^2) + \frac{1}{\varepsilon ^2} \cdot \log (1 / \varepsilon ) \right\} \right) = \\&\tilde{\mathcal{O}} \left( \frac{n}{\varepsilon ^3} \right) . \end{aligned}$$

Using the state-of-the-art \(\tilde{\mathcal{O}}(n + (1 / \varepsilon )^{1.25})\) algorithm of Deng et al. [16] for approximating Partition, one could, in some cases, further improve the last term of the previous minimum. However, since the Partition instances that we are solving involve \(|{L}| = \log (n / \varepsilon ^2)\) elements, any improvement resulting from said approximation algorithm would only affect polylogarithmic factors. Due to this, the algorithm of Kellerer et al. performs better than other Partition approximation algorithms if we choose to ignore those factors. On the other hand, if one takes them into account, it might be preferable to use the aforementioned algorithm of Deng et al. (always depending on the relation between n and \(\varepsilon \)).

5.2 Total complexity

The total complexity of the algorithm results from the n distinct iterations required and depends on the algorithm chosen to find the (exact or approximate) solution to the Partition problem, since each of the presented algorithms dominates the cost of the remaining operations. Thus, by choosing the fastest one (depending on the relationship between n and \(\varepsilon \)), the final complexity is

$$\begin{aligned} \tilde{\mathcal{O}}\left( \min \left\{ \frac{n^{2.3}}{\varepsilon ^{2.6}}, \frac{n^2}{\varepsilon ^3} \right\} \right) \end{aligned}$$

6 Conclusion and future work

The main contribution of this paper, apart from the introduction of a new FPTAS for the Subset Sum Ratio problem, is the establishment of a connection between Partition and approximating Subset Sum Ratio. In particular, our scheme employs Partition computations, and any improvement in the latter will have an effect on its complexity.

Additionally, we establish that approximating Subset Sum Ratio is possible in time \(\tilde{\mathcal{O}}(n ^{c_1} / \varepsilon ^{c_2})\) with \(c_1 < 2.3\) and \(c_1+c_2 < 5\), which is an improvement over all the previously presented FPTASes for the problem. Moreover, the exponent of n can go down to 2 if we employ approximate Partition algorithms, a significant improvement over the \(\mathcal{O}(n^4 / \varepsilon )\) algorithm of [25].

It is important to note, however, that there is a distinct limit to the complexity one may achieve for the Subset Sum Ratio problem using the techniques discussed in this paper: although each instance has polylogarithmic size, the number of Partition instances that must be solved is \(\mathcal{O}(n^2 / \varepsilon ^2)\) in total. Consequently, an interesting natural question arising from our work is whether one can further improve the complexity of the problem, possibly developing an \(\mathcal{O}(n^{c_1} / \varepsilon ^{c_2})\) algorithm, where \(c_1 < 2\) or even \(c_1 + c_2 < 4\).

As another direction for future research, we consider the use of exact Subset Sum or Partition algorithms parameterized by a concentration parameter \(\beta \), as described in [4, 5], where they solve the decision version of Subset Sum. See also [17] for a use of this parameter under a pseudopolynomial setting. It would be interesting to investigate whether analogous arguments could be used to solve the optimization version.