1 Introduction

Given a matrix \(A\in \mathbb {R}^{m\times n}\) and a vector \(\varvec{b}\in \mathbb {R}^m\), we study the sparsity of solutions to the system of linear equations \(A {\varvec{x}}= {\varvec{b}}\), in variables \(x_1,\ldots ,x_n\) restricted to a structured domain \(D \subseteq \mathbb {R}\). The sparsity of these solutions is quantified via the \(\ell _0\)-norm, which is the size \(\Vert \varvec{x}\Vert _0 := |{\text {supp}}({\varvec{x}})|\) of the support \({\text {supp}}({\varvec{x}}) :=\left\{ i \,:\, x_i \ne 0 \right\} \) of the vector \(\varvec{x}\). The sparsest solutions are optimal solutions of the optimization problem

$$\begin{aligned} \min \left\{ \Vert \varvec{x}\Vert _0 \,:\, A \varvec{x} = \varvec{b}, \ \varvec{x} \in D^n \right\} . \end{aligned}$$
(1)

When \(D=\mathbb {R}\) or \(D=\mathbb {R}_{\ge 0}\), the tight upper bound on (1) in terms of A is given by the rank of A; this follows from basic linear algebra and the well-known theorem of Carathéodory from convexity, respectively. Nevertheless, even when \(D=\mathbb {R}\), computation of (1) for given A and \(\varvec{b}\) is NP-hard [26].

The \(\ell _0\)-norm minimization problem (1) is central in the theory of compressed sensing, where for the classical choice \(D=\mathbb {R}\) an appropriate linear programming relaxation of (1) provides a guaranteed approximation [9, 11, 12]. In the present paper, we deal with two discrete domains, \(D=\mathbb {Z}\) and \(D=\mathbb {Z}_{\ge 0}\), which are naturally related to the theory of systems of linear Diophantine equations and integer linear programming, respectively.

Sparsity of solutions to linear Diophantine equations is relevant for the theory of compressed sensing for integer-valued signals [17, 18, 24], motivated by many applications in which the signal is known to have integer entries, for instance, in wireless communication [31] and in the theory of error-correcting codes [10]. Support minimization was also investigated in connection to integer optimization [2, 16, 29, 30]. Also, numerous applications to combinatorial optimization problems have been explored. For example, the minimum edge-coloring problem can be seen as finding the sparsest representation in the semigroup generated by the matchings of the graph [13, 25]. Further examples of combinatorial applications can be found in [4] and [16].

Since for \(D=\mathbb {R}\) the sparsity of solutions is captured by the rank of A, we introduce a similar notion with respect to an arbitrary underlying domain D. We define the D-rank of A as

$$\begin{aligned} \max _{\varvec{y}\in D^n} \min \{\Vert \varvec{x}\Vert _0 : A \varvec{x}=A \varvec{y}, \varvec{x}\in D^n\}. \end{aligned}$$

In this respect, note that the computation of the D-rank is a bi-level optimization problem. There is yet another natural generalization of the notion of rank in our setting. For that let \([n]:= \{1,\ldots ,n\}\), let \(\left( {\begin{array}{c}[n]\\ k\end{array}}\right) \) be the set of all k-element subsets of [n], and for \(\gamma \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) \) let \(A_\gamma \) denote the \(m \times k\) submatrix of A with columns indexed by \(\gamma \). Then the D-complexity of A is defined as the minimal \(k \in \mathbb {Z}_{\ge 0}\) such that there exists a \(\tau \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) \) for which the following equality holds \(\left\{ A {\varvec{x}} \,:\, {\varvec{x}} \in D^n \right\} = \left\{ A_\tau {\varvec{y}} \,:\, {\varvec{y}} \in D^k \right\} \). It is clear that the D-rank is bounded from above by the D-complexity.

As examples, note that for the domain \(D=\mathbb {R}\), both the D-rank and the D-complexity coincide with the rank of the matrix A from linear algebra. For \(D=\mathbb {R}_{\ge 0}\), the D-rank is again the regular rank of A, but the D-complexity is in general larger: if the cone generated by the columns of A is pointed, then the D-complexity equals the number of extreme rays of this cone.

In this paper we specialize the above two functions to the two domains \(D=\mathbb {Z}\) and \(D=\mathbb {Z}_{\ge 0}\), which yields a natural geometric interpretation in terms of lattices and semigroups. First, the matrix A determines the lattice \(\mathcal {L}(A):= \left\{ A \varvec{x} \,:\, \varvec{x} \in \mathbb {Z}^n \right\} \) generated by the columns of A. Secondly, the matrix A determines the semigroup \({\text {Sg}}(A) := \left\{ A \varvec{x} \,:\, \varvec{x} \in \mathbb {Z}_{\ge 0}^n \right\} \). Note that this set consists of all right-hand-side vectors \(\varvec{b}\), for which the system \(A \varvec{x} =\varvec{b}, \ \varvec{x} \in \mathbb {Z}_{\ge 0}^n\) of integer-programming constraints on \(\varvec{x}\) is feasible. We obtain the following four functions:

\(\mathsf {ILR}(A)\) (integer linear rank):

The \(\mathbb {Z}\)-rank of A, i.e., the minimal k s.t. \(\mathcal {L}(A) = \bigcup _{\tau \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) } \mathcal {L}(A_\tau )\).

\(\mathsf {ICR}(A)\) (integer Carathéodory rank):

The \(\mathbb {Z}_{\ge 0}\)-rank of A, i.e., the minimal k s.t. \({\text {Sg}}(A) = \bigcup _{\tau \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) } {\text {Sg}}(A_\tau )\).

\(\mathsf {ILC}(A)\) (integer linear complexity):

The \(\mathbb {Z}\)-complexity of A, i.e., the minimal k s.t. \(\mathcal {L}(A) = \mathcal {L}(A_\tau )\) holds for some \(\tau \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) \).

\(\mathsf {ICC}(A)\) (integer Carathéodory complexity):

The \(\mathbb {Z}_{\ge 0}\)-complexity of A, i.e., the minimal k s.t. \({\text {Sg}}(A) = {\text {Sg}}(A_\tau )\) for some \(\tau \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) \).

In our results we deal with an integer matrix \(A \in \mathbb {Z}^{m \times n}\) and, without loss of generality, A is assumed to have full row rank. In this case, the determinant of the lattice \(\mathcal {L}(A)\) is equal to

$$\begin{aligned} \gcd (A):= \gcd \left\{ \det (A_\gamma ) \,:\, \gamma \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) \right\} . \end{aligned}$$

See for example [30, Section 1.3]. For a general introduction to lattices see [21].
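The quantity \(\gcd (A)\) can be computed directly from this definition by enumerating the \(m \times m\) column subsets and taking exact determinants. The following sketch is purely illustrative (it is not the paper's algorithm, and the example matrix in the test is arbitrary data); it is exponential in m and only meant to make the definition concrete.

```python
from itertools import combinations
from math import gcd
from fractions import Fraction

def det_int(M):
    """Exact determinant of a square integer matrix via Gaussian
    elimination over the rationals (no floating point)."""
    M = [[Fraction(x) for x in row] for row in M]
    n, sign, det = len(M), 1, Fraction(1)
    for i in range(n):
        pivot = next((r for r in range(i, n) if M[r][i] != 0), None)
        if pivot is None:
            return 0                      # singular submatrix
        if pivot != i:
            M[i], M[pivot] = M[pivot], M[i]
            sign = -sign
        det *= M[i][i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= f * M[i][c]
    return int(sign * det)

def gcd_of_minors(A):
    """gcd(A): gcd of all m x m minors of an m x n integer matrix."""
    m = len(A)
    g = 0                                 # gcd(0, x) = x starts the chain
    for cols in combinations(range(len(A[0])), m):
        sub = [[row[j] for j in cols] for row in A]
        g = gcd(g, abs(det_int(sub)))
    return g
```

For instance, for the \(2 \times 3\) matrix with rows (2, 4, 6) and (0, 2, 4) the three minors are 4, 8 and 4, so the routine returns 4.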

1.1 Bounds for \(\mathsf {ILR}(A)\) and \(\mathsf {ILC}(A)\)

To state our results, we need several number-theoretic functions. Given \(z \in \mathbb {Z}_{>0}\), consider the prime factorization \(z = p_1^{s_1} \cdots p_k^{s_k}\) with pairwise distinct prime factors \(p_1,\ldots ,p_k\) and their multiplicities \(s_1,\ldots ,s_k \in \mathbb {Z}_{>0}\). Then the number of prime factors \(\sum _{i=1}^k s_i\), counting multiplicities, is denoted by \(\varOmega (z)\). Furthermore, we introduce

$$\begin{aligned} \varOmega _m(z) := \sum _{i=1}^k \min \{s_i,m\}. \end{aligned}$$

That is, by introducing m we set a threshold to account for multiplicities. In the case \(m=1\) we thus have

$$\begin{aligned} \omega (z):=\varOmega _1(z) = k, \end{aligned}$$

which is the number of prime factors in z, not taking the multiplicities into account. The functions \(\varOmega \) and \(\omega \) are called prime \(\varOmega \)-function and prime \(\omega \)-function, respectively, in number theory [23]. We call \(\varOmega _m\) the truncated prime \(\varOmega \)-function.
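For concreteness, \(\varOmega _m\) can be evaluated by trial-division factorization, capping each multiplicity at m. This is a minimal illustrative sketch, not taken from the paper:

```python
def omega_m(z, m):
    """Truncated prime Omega-function: sum of min(s_i, m) over the
    prime factorization z = p_1^{s_1} * ... * p_k^{s_k}."""
    total, p = 0, 2
    while p * p <= z:
        if z % p == 0:
            s = 0
            while z % p == 0:        # strip the full power of p
                z //= p
                s += 1
            total += min(s, m)
        p += 1
    if z > 1:                        # one leftover prime factor remains
        total += 1
    return total
```

Thus `omega_m(12, 1)` computes \(\omega (12) = 2\), while `omega_m(12, 2)` already equals \(\varOmega (12) = 3\), and \(\varOmega _2(16) = \min \{4,2\} = 2\).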

Theorem 1

Let \(A \in \mathbb {Z}^{m \times n}\) be a matrix of rank m. Then

$$\begin{aligned} \mathsf {ILR}(A)\le \mathsf {ILC}(A) \le m + \min _{\begin{array}{c} \tau \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) \\ \det (A_\tau )\ne 0 \end{array}} \varOmega _m\left( \frac{|\det (A_\tau )| }{\gcd (A)} \right) . \end{aligned}$$
(2)

One can easily see that the estimates \(\omega (z) \le \varOmega _m(z) \le \varOmega (z) \le \log _2 (z)\) hold for every \(z \in \mathbb {Z}_{>0}\). The estimate using \(\log _2 (z)\) gives a first impression of the size of the bound (2). It turns out, however, that \(\varOmega _m(z)\) is much smaller on average. Results in number theory [23, §22.10] show that the average values \(\frac{1}{z}(\omega (1) + \cdots + \omega (z))\) and \(\frac{1}{z}(\varOmega (1) + \cdots + \varOmega (z))\) are of order \(\log ( \log (z))\), as \(z \rightarrow \infty \).
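This average-order behaviour is easy to observe empirically. The sketch below (the cutoff N is an arbitrary choice, not from the paper) sieves \(\omega (z)\) for all \(z \le N\) and compares the mean with \(\log \log N\); both stay far below the worst-case value \(\log _2 N\).

```python
import math

N = 10**5
omega = [0] * (N + 1)
for p in range(2, N + 1):
    if omega[p] == 0:                    # p untouched by smaller primes => prime
        for mult in range(p, N + 1, p):  # p is one distinct factor of each multiple
            omega[mult] += 1

avg = sum(omega[1:]) / N                 # mean of omega(z) over z <= N
print(avg, math.log(math.log(N)))       # same order of magnitude; log2(N) is ~16.6
```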

In Proposition 1 from Sect. 5, we show that (2) is an optimal bound on both \(\mathsf {ILR}(A)\) and \(\mathsf {ILC}(A)\), in the sense that neither m can be replaced by any smaller constant nor the function \(\varOmega _m\) occurring on the right-hand side can be replaced by any smaller function. Furthermore, as a byproduct of our constructive proof of Theorem 1, we obtain the following.

Corollary 1

Let \(A \in \mathbb {Z}^{m \times n}\) be a matrix of full row rank and let \(A_\tau \) be a non-singular sub-matrix of A, where \(\tau \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) \). If the system \(A \varvec{x} = \varvec{b}, \, \varvec{x} \in \mathbb {Z}^n\) with a right-hand side \(\varvec{b} \in \mathbb {Z}^m\) is feasible, then this system has a solution \(\varvec{x}\) that satisfies

$$\begin{aligned} \Vert \varvec{x} \Vert _0 \le m + \varOmega _m \left( \frac{| \det (A_\tau )| }{\gcd (A)} \right) . \end{aligned}$$

For the input \(A,\;\tau \) and \(\varvec{b}\) in binary encoding, such a solution \(\varvec{x}\) can be computed in polynomial time.

1.2 Bounds for \(\mathsf {ICR}(A)\) and \(\mathsf {ICC}(A)\)

Theorem 1.1(i) in [3] (see also [2, Theorem 1]) immediately implies the bound

$$\begin{aligned} \mathsf {ICR}(A) \le m + \left\lfloor \log _2 \left( \frac{\sqrt{\det (A A^\top )}}{\gcd (A)} \right) \right\rfloor , \end{aligned}$$
(3)

which is an improvement of [16, Theorem 1(ii)]. Note that

$$\begin{aligned} \det (A A^\top ) = \sum _{\tau \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) }\det (A_\tau )^2 \end{aligned}$$
(4)

by the Cauchy–Binet formula.
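As a sanity check, identity (4) can be verified numerically on a small instance; the \(2 \times 4\) matrix below is arbitrary test data.

```python
from itertools import combinations

A = [[1, 2, 0, 3],
     [0, 1, 4, 2]]
m, n = 2, 4

def det2(c1, c2):
    """2 x 2 minor of A on columns c1, c2."""
    return A[0][c1] * A[1][c2] - A[0][c2] * A[1][c1]

# left-hand side of (4): det(A A^T)
AAT = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(m)]
       for i in range(m)]
lhs = AAT[0][0] * AAT[1][1] - AAT[0][1] * AAT[1][0]

# right-hand side of (4): sum of squared m x m minors
rhs = sum(det2(i, j) ** 2 for i, j in combinations(range(n), 2))
assert lhs == rhs
```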

We show that, under natural assumptions on A, we can significantly improve this bound. First, we consider matrices A whose columns positively span \(\mathbb {R}^m\). Theorem 1 can be used in this case to obtain the following result.

Theorem 2

Let \(A \in \mathbb {Z}^{m \times n}\) be a matrix whose columns positively span \(\mathbb {R}^m\). Then

$$\begin{aligned} \mathsf {ICR}(A) \le \mathsf {ICC}(A) \le 2m + \min _{\begin{array}{c} \tau \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) \\ \det (A_\tau )\ne 0 \end{array}} {\varOmega _m\left( \frac{|\det (A_\tau )| }{ \gcd (A)} \right) }. \end{aligned}$$
(5)

Both the general bound (3) and our bound (5) have the first term depending linearly on m and the second term depending on the \(m \times m\) minors of A scaled by \(\gcd (A)\). Thus, taking into account \(|\det (A_\tau )| \le \sqrt{\det (A A^\top )}\) and \(\varOmega _m(z) \le \log _2 (z)\), we see that the second term in (5) is not larger than the second term in the bound (3); in fact, the second term in (5) is much smaller on average. As for (2), we show in Proposition 2 from Sect. 5 that, under the given assumptions on A, the bound (5) is optimal.

In the knapsack case \(m=1\), the bound (5) strengthens Theorem 1.2 in [3] and, as it was already indicated in the IPCO version of this paper [1], it confirms a conjecture posed in [3, page 247]. Moreover, as a byproduct of the proof of Theorem 2 we obtain the following algorithmic result.

Corollary 2

Let \(A \in \mathbb {Z}^{m \times n}\) be a matrix whose columns positively span \(\mathbb {R}^m\) and let \(A_\tau \) be an \(m \times m\) non-singular sub-matrix of A, where \(\tau \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) \). If the feasibility problem \(A \varvec{x} = \varvec{b}, \ \varvec{x} \in \mathbb {Z}_{\ge 0}^n\) of integer programming with a right-hand side \(\varvec{b} \in \mathbb {Z}^m\) has a solution, then it has a solution satisfying

$$\begin{aligned} \Vert \varvec{x} \Vert _0 \le 2m + {\varOmega _m\left( \frac{|\det (A_\tau )| }{ \gcd (A)} \right) }\,. \end{aligned}$$

For the input \(A,\;\tau \) and \(\varvec{b}\) in binary encoding, such a solution \(\varvec{x}\) can be computed in polynomial time.

Our next contribution gives an improvement on (3) for the case when the columns of A generate a pointed cone. Given \(\varvec{a}_1,\ldots ,\varvec{a}_n \in \mathbb {R}^m\), we denote by \({\text {cone}}(\varvec{a}_1,\ldots ,\varvec{a}_n)\) the convex conic hull of the set \(\{\varvec{a}_1,\ldots ,\varvec{a}_n\}\). Assume that the matrix \(A = (\varvec{a}_1,\ldots ,\varvec{a}_n) \in \mathbb {Z}^{m \times n}\) with columns \({\varvec{a}}_i\) satisfies the following conditions:

$$\begin{aligned}&\varvec{a}_1,\ldots ,\varvec{a}_n \in \mathbb {Z}^m{\setminus } \{\varvec{0}\}, \end{aligned}$$
(6)
$$\begin{aligned}&{\text {cone}}(\varvec{a}_1,\ldots ,\varvec{a}_n)\,\text {is an}\,m\text {-dimensional pointed cone}, \end{aligned}$$
(7)
$$\begin{aligned}&{\text {cone}}(\varvec{a}_1) \text {is an extreme ray of}\,{\text {cone}}(\varvec{a}_1,\ldots ,\varvec{a}_n). \end{aligned}$$
(8)

Theorem 3

Let \(A = (\varvec{a}_1,\ldots ,\varvec{a}_n) \in \mathbb {Z}^{m \times n}\) satisfy (6)–(8). Then

$$\begin{aligned} \mathsf {ICR}(A) \le m + \left\lfloor \log _2 \left( \frac{q(A)}{\gcd (A)}\right) \right\rfloor , \end{aligned}$$
(9)

where

$$\begin{aligned} q(A):= \sqrt{\sum _{I \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) \,:\, 1 \in I} \det (A_I)^2}. \end{aligned}$$
(10)

In view of (4), the bound (9) improves on (3) by reducing the sum over all \(I \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) \) to the sum over those I that satisfy \(1 \in I\). The proof of Theorem 3 will be derived as an extension of the proof of our next result in the setting of the knapsack scenario \(m=1\). In this setting, \(A= \varvec{a}\) is a row vector and the assumption (7) is equivalent to \(\varvec{a} \in \mathbb {Z}_{>0}^{1 \times n} \cup \mathbb {Z}_{<0}^{1 \times n}\). Without loss of generality, one can assume \(\varvec{a} \in \mathbb {Z}_{>0}^{1 \times n}\).

Theorem 4

Let \(\varvec{a} = (a_1,\ldots ,a_n) \in \mathbb {Z}_{>0}^{1 \times n}\). Then

$$\begin{aligned} \mathsf {ICR}(\varvec{a}) \le 1 + \left\lfloor \log _2 \left( \frac{\min \{a_1,\ldots ,a_n\}}{\gcd ({\varvec{a}})} \right) \right\rfloor . \end{aligned}$$
(11)

Similar to (5), we show in Proposition 3 from Sect. 5 that the bounds (9) and (11) are optimal.
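The knapsack bound (11) can be probed experimentally. The hedged sketch below (the row \(\varvec{a}\) and the search window are arbitrary test data, not from the paper) checks, by brute force, that every representable right-hand side in the window admits a nonnegative representation whose support size is within the bound.

```python
from itertools import combinations
from math import floor, log2, gcd
from functools import reduce

a = (3, 5, 7, 11, 13)                   # example knapsack row
g = reduce(gcd, a)
bound = 1 + floor(log2(min(a) // g))    # right-hand side of (11); here 2

def representable(coins, b):
    """Coin-problem DP: is b a nonnegative integer combination of coins?"""
    ok = [True] + [False] * b
    for c in coins:
        for v in range(c, b + 1):
            ok[v] = ok[v] or ok[v - c]
    return ok[b]

for b in range(1, 200):
    if representable(a, b):             # b lies in the semigroup Sg(a)
        min_support = min(k for k in range(1, len(a) + 1)
                          if any(representable(s, b)
                                 for s in combinations(a, k)))
        assert min_support <= bound     # bound (11) holds for this b
```

This is of course only a finite check on one instance, not a proof.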

1.3 Computational complexity

It is well known that the feasibility problem in integer linear programming is NP-complete (see [32, § 18.2]), which means that already testing whether the sparsity optimization problem (1) is feasible is hard in the case \(D = \mathbb {Z}_{\ge 0}^n\). But even in the cases when testing feasibility is tractable, solving (1) is usually hard due to the hardness of the \(\ell _0\)-norm as an objective. For example, computation of (1) is NP-hard for \(D=\mathbb {R}^n\) (see [26]). In this section we study the complexity of computing our four rank-like functions \(\mathsf {ILR}(A)\), \(\mathsf {ILC}(A)\), \(\mathsf {ICR}(A)\) and \(\mathsf {ICC}(A)\). We would like to emphasize that the complexity analysis of these functions is more intricate than the respective analysis of (1).

Theorem 5

Consider the four problems of verifying \(\mathsf {ILC}(A) \le k\), \(\mathsf {ILR}(A) \le k\), \(\mathsf {ICR}(A) \le k\), and \(\mathsf {ICC}(A) \le k\), for given \(A \in \mathbb {Z}^{m \times n}\) and \(k \in \mathbb {Z}_{>0}\). These problems have the following complexity:

(i) \(\mathsf {ILR}(A) \le k\), \(\mathsf {ILC}(A) \le k\), and \(\mathsf {ICC}(A) \le k\) are NP-complete when \(m=1\) and n is a part of the input,

(ii) \(\mathsf {ICR}(A) \le k\) is NP-hard when \(m =1\) and n is a part of the input,

(iii) \(\mathsf {ILC}(A) \le k\) and \(\mathsf {ICC}(A) \le k\) are strongly NP-complete when m and n are a part of the input,

(iv) \(\mathsf {ILR}(A) \le k\) and \(\mathsf {ICR}(A) \le k\) are strongly NP-hard when m and n are a part of the input.

In the case \(m=1\), one might be tempted to compare (1) for \(D=\mathbb {Z}^n\) and \(D=\mathbb {Z}_{\ge 0}^n\) with the integer knapsack optimization problem. For the latter, the linearity of the objective function allows one to use dynamic programming to solve the problem in pseudo-polynomial time. It is however not clear if it is possible to adapt this approach for the case of the \(\ell _0\) objective. This motivates the following problem:

Problem 1

If \(m=1\), can (1) and the functions \(\mathsf {ILR}(A), \mathsf {ILC}(A), \mathsf {ICR}(A)\) and \(\mathsf {ICC}(A)\) be computed in pseudo-polynomial time, i.e., when the input numbers are given in unary encoding rather than binary?
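For comparison, the dynamic program alluded to above for the linear-objective integer knapsack \(\min \{\varvec{c}^\top \varvec{x} : \varvec{a}^\top \varvec{x} = b, \ \varvec{x} \in \mathbb {Z}_{\ge 0}^n\}\) is easy to state (a sketch with illustrative data; the function name is ours): it runs in time O(nb), pseudo-polynomial in b.

```python
def knapsack_min_cost(a, c, b):
    """min{ c.x : a.x = b, x nonnegative integer } by DP over right-hand sides."""
    INF = float("inf")
    dp = [0] + [INF] * b                # dp[v] = cheapest way to reach value v
    for v in range(1, b + 1):
        for ai, ci in zip(a, c):
            if ai <= v and dp[v - ai] + ci < dp[v]:
                dp[v] = dp[v - ai] + ci
    return dp[b]                        # INF signals infeasibility
```

The recursion works because a linear objective is additive along partial sums; the \(\ell _0\) objective is not, since a column must be "paid for" only once however often it is used, which is exactly the obstacle behind Problem 1.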

Finally, we want to address the case when the number of variables n is fixed. It is easy to see that, in this case, the optimization problem (1) can be solved in polynomial time for both \(D=\mathbb {Z}^n\) and \(D=\mathbb {Z}_{\ge 0}^n\). For a fixed n, all \(2^n\) possible choices of the support for the vector \(\varvec{x}\) can be enumerated. For each such choice, the existence of a vector \(\varvec{x}\) with \(A \varvec{x} = \varvec{b}, \ \varvec{x} \in D^n\) and the prescribed support can be checked in polynomial time: for \(D=\mathbb {Z}^n\) one needs to solve a Diophantine system, while for \(D=\mathbb {Z}_{\ge 0}^n\) one uses polynomial-time solvability of integer linear programs in fixed dimension [32, § 18.4]. Similarly, one can establish polynomial-time computability of \(\mathsf {ILC}(A)\) and \(\mathsf {ICC}(A)\) for fixed n. In contrast to this, since the D-ranks are related to bi-level programming, the study of the computational complexity of \(\mathsf {ILR}(A)\) and \(\mathsf {ICR}(A)\) in the case of fixed n requires a more involved algorithmic theory, which has been developed only recently. Using recent results in the algorithmic theory of Presburger arithmetic, we obtain the following.

Theorem 6

When n is fixed, for given \(m \in \mathbb {Z}_{>0}\) and \(A \in \mathbb {Z}^{m \times n}\), all four values \(\mathsf {ILR}(A)\), \(\mathsf {ICR}(A)\), \(\mathsf {ILC}(A)\) and \(\mathsf {ICC}(A)\) can be computed in polynomial time.
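The support enumeration described before Theorem 6 is particularly transparent for \(m=1\) and \(D=\mathbb {Z}^n\), where feasibility over a prescribed support reduces, by Bézout's identity, to a gcd-divisibility test. A hedged sketch (function name and data are illustrative):

```python
from itertools import combinations
from math import gcd
from functools import reduce

def min_support(a, b):
    """min ||x||_0 over integer solutions of a.x = b (case m = 1),
    or None if no integer solution exists."""
    if b == 0:
        return 0                         # x = 0 works
    for k in range(1, len(a) + 1):       # smallest support first
        for S in combinations(range(len(a)), k):
            g = reduce(gcd, (a[i] for i in S))
            if g != 0 and b % g == 0:    # Bezout: solvable over support S
                return k
    return None
```

For example, `min_support((6, 10, 15), 1)` returns 3, since no one or two of these entries have gcd dividing 1 while all three together do.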

2 Proofs of Theorem 1 and Corollary 1

The proof of Theorem 1 relies on the theory of finite Abelian groups (see [15] for a general reference). We write Abelian groups additively. An Abelian group G is a direct sum of its finitely many subgroups \(G_1,\ldots ,G_m\), which is written as \(G = \bigoplus _{i=1}^m G_i\), if every element \(x \in G\) has a unique representation as \(x = x_1 + \cdots + x_m\) with \(x_i \in G_i\) for each \(i \in [m]\). A primary cyclic group is a non-zero finite cyclic group whose order is a power of a prime number. We use G/H to denote the quotient of G modulo its subgroup H.

The fundamental theorem of finite Abelian groups states that every finite Abelian group G has a primary decomposition, which is essentially unique. This means, G is decomposable into a direct sum of its primary cyclic subgroups and that this decomposition is unique up to automorphisms of G (see Theorems 3 and 5 in Chapter 5.2 of [15], with further details in 12.1). We denote by \(\kappa (G)\) the number of direct summands in the primary decomposition of G.

For a subset S of a finite Abelian group G, we denote by \(\left\langle S \right\rangle \) the subgroup of G generated by S. We call a subset S of G non-redundant if the subgroups \(\left\langle T \right\rangle \) generated by proper subsets T of S are properly contained in \(\left\langle S \right\rangle \). The following result gives an upper bound on the maximum cardinality of S.

Theorem 7

Let G be a finite Abelian group. Then the maximum cardinality of a non-redundant subset S of G is equal to \(\kappa (G)\).

Even though this result is available in the literature (see, for example, [20, Lemma A.6]), it does not seem to be well known, and we have not found any source containing a complete self-contained proof of this result. Thus, we provide a proof of Theorem 7 in the Appendix, relying only on the basic facts from group theory. We will also need the following lemmas.

Lemma 1

Let G be a finite Abelian group representable as a direct sum \(G = \bigoplus _{j=1}^m G_j\), where the groups \(G_1,\ldots ,G_m\) are cyclic. Then \(\kappa (G) \le \varOmega _m(|G|)\).

Proof

Let us consider the prime factorization \(|G| = p_1^{n_1} \cdots p_s^{n_s}\). Then \(|G_j| = p_1^{n_{1,j}} \cdots p_s^{n_{s,j}}\) with \(0 \le n_{i,j} \le n_i\) and, by the Chinese Remainder Theorem, the cyclic group \(G_j\) can be represented as \(G_j = \bigoplus _{i=1}^s G_{i,j}\), where \(G_{i,j}\) is a cyclic group of order \(p_i^{n_{i,j}}\). Consequently, \(G = \bigoplus _{i=1}^s \bigoplus _{j=1}^m G_{i,j}\). This is a decomposition of G into a direct sum of primary cyclic groups and, possibly, some trivial summands \(G_{i,j}\) equal to \(\{0\}\). We can count the non-trivial direct summands whose order is a power of \(p_i\), for a given \(i \in [s]\). There is at most one such summand for each of the groups \(G_1,\ldots ,G_m\), so there are at most m non-trivial summands in the decomposition whose order is a power of \(p_i\). On the other hand, the direct sum of all non-trivial summands whose order is a power of \(p_i\) is a group of order \(p_i^{n_{i,1} + \cdots + n_{i,m}} = p_i^{n_i}\), so that the total number of such summands is not larger than \(n_i\), as every non-trivial summand contributes a factor of at least \(p_i\) to the power \(p_i^{n_i}\). Thus, the total number of non-trivial summands in the decomposition of G is at most \(\sum _{i=1}^s \min \{m, n_i\} = \varOmega _m(|G|)\). \(\square \)

Lemma 2

Let \(\varLambda \) be an m-dimensional sublattice of \(\mathbb {Z}^m\). Then \(G = \mathbb {Z}^m / \varLambda \) is a finite Abelian group of order \(\det (\varLambda )\) that can be represented as a direct sum of at most m cyclic groups.

Proof

The proof relies on the relationship between finite Abelian groups and lattices, see [32, §4.4]. Fix a matrix \(M \in \mathbb {Z}^{m \times m}\) whose columns form a basis of \(\varLambda \). Then \(|\det (M)| = \det (\varLambda )\). There exist unimodular matrices \(U \in \mathbb {Z}^{m \times m}\) and \(V \in \mathbb {Z}^{m \times m}\) such that \(D:= U M V\) is a diagonal matrix with positive integer diagonal entries. For example, one can choose D to be the Smith Normal Form of M [32, §4.4]. Let \(d_1,\ldots ,d_m \in \mathbb {Z}_{>0}\) be the diagonal entries of D. Since U and V are unimodular, \(d_1 \cdots d_m = \det (D) = \det (\varLambda )\).

We introduce the quotient group \(G' := \mathbb {Z}^m / \varLambda ' = ( \mathbb {Z}/ d_1 \mathbb {Z}) \times \cdots \times (\mathbb {Z}/ d_m \mathbb {Z})\) with respect to the lattice \(\varLambda ' := \mathcal {L}(D) = (d_1 \mathbb {Z}) \times \cdots \times (d_m \mathbb {Z})\). The order of \(G'\) is \(d_1 \cdots d_m = \det (D) = \det (\varLambda )\) and \(G'\) is a direct sum of at most m cyclic groups, as every \(d_i > 1\) determines a non-trivial direct summand.

To conclude the proof, it suffices to show that \(G'\) is isomorphic to G. To see this, note that \(\varLambda ' = \mathcal {L}(D) = \mathcal {L}(UMV) = \mathcal {L}(UM) = \left\{ U z \,:\, z \in \varLambda \right\} \). Hence the map \(z \mapsto U z\) is an automorphism of \(\mathbb {Z}^m\) that maps \(\varLambda \) onto \(\varLambda '\), and it therefore induces an isomorphism from the group \(G = \mathbb {Z}^m / \varLambda \) to the group \(G'=\mathbb {Z}^m / \varLambda '\). \(\square \)
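The diagonal entries \(d_1,\ldots ,d_m\) used in this proof can also be obtained without computing the Smith Normal Form explicitly, via the standard determinantal-divisor characterization \(d_k = D_k / D_{k-1}\), where \(D_k\) is the gcd of all \(k \times k\) minors (with \(D_0 = 1\)). The sketch below is illustrative, not the paper's algorithm; it assumes M is non-singular, and the test matrices are arbitrary data.

```python
from itertools import combinations
from math import gcd
from functools import reduce

def minor(M, rows, cols):
    """Determinant of the square submatrix on the given row/column indices,
    by Laplace expansion along the first selected row."""
    if len(rows) == 1:
        return M[rows[0]][cols[0]]
    return sum((-1) ** j * M[rows[0]][cols[j]]
               * minor(M, rows[1:], cols[:j] + cols[j + 1:])
               for j in range(len(cols)))

def invariant_factors(M):
    """Diagonal of the Smith Normal Form of a non-singular integer matrix M,
    via determinantal divisors D_k = gcd of all k x k minors."""
    m = len(M)
    D = [1]
    for k in range(1, m + 1):
        D.append(reduce(gcd, (abs(minor(M, r, c))
                              for r in combinations(range(m), k)
                              for c in combinations(range(m), k))))
    return [D[k] // D[k - 1] for k in range(1, m + 1)]
```

For M = diag(2, 3) this yields [1, 6], matching \(\mathbb {Z}^2/\varLambda \cong \mathbb {Z}/6\mathbb {Z}\): the number of non-trivial factors (here one) is the number of cyclic summands, which Lemma 2 bounds by m.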

The following lemma allows us to reduce considerations to the case \(\gcd (A)=1\), without affecting the sparsity.

Lemma 3

Let \(A \in \mathbb {Z}^{m \times n}\) have full row rank and let \(M \in \mathbb {Z}^{m \times m}\) be a matrix whose columns form a basis for \(\mathcal {L}(A)\). Then the following holds:

(a) \(M^{-1} A\) is an integer matrix of full row rank satisfying \(\gcd (M^{-1} A) = 1\).

(b) The map \(\varvec{b} \mapsto M^{-1} \varvec{b}\), as a map from \(\mathcal {L}(A)\) to \(\mathcal {L}(M^{-1} A) = \mathbb {Z}^m\), is an isomorphism of lattices, and, as a map from \({\text {Sg}}(A)\) to \({\text {Sg}}(M^{-1}A)\), is an isomorphism of the semigroups.

(c) \(\mathsf {ILC}(A)=\mathsf {ILC}(M^{-1}A)\), \(\mathsf {ILR}(A)=\mathsf {ILR}(M^{-1}A)\), \(\mathsf {ICC}(A)=\mathsf {ICC}(M^{-1}A)\), and \(\mathsf {ICR}(A)=\mathsf {ICR}(M^{-1}A)\).

(d) For given m, n and A, a matrix M as above can be computed in polynomial time.

Proof

(a) follows from \(\gcd (M^{-1} A) = \gcd (A) / |\det (M)| = \gcd (A) / \det (\mathcal {L}(A)) = 1\), (b) is straightforward, and (c) follows from (b). To show (d), we observe that M can be obtained from the Hermite Normal Form H of A (with respect to the column transformations) by discarding the zero columns of H. \(\square \)

Lemma 4

Let \(A \in \mathbb {Z}^{m \times n}\) have full row rank and let \(A_\tau \) be a non-singular \(m \times m\) sub-matrix of A, where \(\tau \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) \). Then there exists \(\gamma \subseteq [n]\) with \(\tau \subseteq \gamma \) such that \(\mathcal {L}(A_\gamma ) = \mathcal {L}(A)\) and \(|\gamma | \le m + \varOmega _m \left( \frac{|\det (A_\tau )|}{\gcd (A)} \right) \). For the input A and \(\tau \) in binary encoding, such \(\gamma \) can be computed in polynomial time.

Proof

By Lemma 3, it suffices to consider the case \(\gcd (A)=1\). Let \(\varvec{a}_1,\ldots ,\varvec{a}_n\) be the columns of A. Without loss of generality, let \(\tau = [m]\).

The matrix \(A_\tau \) gives rise to the lattice \(\varLambda := \mathcal {L}(A_\tau )\) of rank m, while \(\varLambda \) determines the finite Abelian group \(\mathbb {Z}^m / \varLambda \). Consider the canonical homomorphism \(\phi : \mathbb {Z}^m \rightarrow \mathbb {Z}^m / \varLambda \), sending an element of \(\mathbb {Z}^m\) to its coset modulo \(\varLambda \). Since \(\gcd (A)=1\), we have \(\mathcal {L}(A)= \mathbb {Z}^m\), which implies \(\left\langle T \right\rangle = \mathbb {Z}^m / \varLambda \) for \(T:= \{\phi (\varvec{a}_{m+1}),\ldots , \phi (\varvec{a}_n) \}\). Now choose a non-redundant subset S of T with \(\left\langle S \right\rangle = \left\langle T \right\rangle \). For this subset we have

$$\begin{aligned} |S|&\le \kappa ( \mathbb {Z}^m / \varLambda )&\text {(by Theorem~7)}\\&\le \varOmega _m(|\det (A_\tau )|)&\text {(by Lemmas 1 and 2).} \end{aligned}$$

We fix a set \(I \subseteq \{m+1,\ldots ,n\}\) that satisfies \(|I|=|S|\) and \(S = \left\{ \phi (\varvec{a}_i) \,:\, i \in I \right\} \) and introduce

$$\begin{aligned} \gamma = I \cup \tau \,. \end{aligned}$$
(12)

In this notation, equality \(\left\langle T \right\rangle = \left\langle S \right\rangle = \mathbb {Z}^m / \varLambda \) can be reformulated as \(\mathbb {Z}^m = \mathcal {L}(A_I) + \varLambda = \mathcal {L}(A_I) + \mathcal {L}(A_\tau ) = \mathcal {L}(A_\gamma )\).

Let us now show that \(\gamma \) can be determined in polynomial time. It is enough to determine the set I, which defines the non-redundant subset \(S = \left\{ \phi (\varvec{a}_i) \,:\, i \in I \right\} \) of \(\mathbb {Z}^m / \varLambda \). Start with \(I = \{m+1,\ldots ,n\}\) and iteratively check whether one of the elements \(\phi (\varvec{a}_i) \in \mathbb {Z}^m / \varLambda \), where \(i \in I\), lies in the group generated by the remaining elements. Suppose \(j \in I\) and we want to check whether \(\phi (\varvec{a}_j)\) is in the group generated by all \(\phi (\varvec{a}_i)\) with \(i \in I{\setminus } \{j\}\). Since \(\varLambda = \mathcal {L}(A_\tau )\), this is equivalent to checking \(\varvec{a}_j \in \mathcal {L}(A_{I {\setminus } \{j\} \cup \tau })\) and is thus reduced to solving a system of linear Diophantine equations with the left-hand side matrix \(A_{I {\setminus } \{j\} \cup \tau }\) and the right-hand side vector \(\varvec{a}_j\) (such systems can be solved in polynomial time by [32, Corollary 5.3b]). Thus, carrying out the above procedure for every \(j \in I\) and removing j from I whenever \(\varvec{a}_j \in \mathcal {L}(A_{I {\setminus } \{j\} \cup \tau })\), we eventually arrive at a set I that determines a non-redundant subset S of \(\mathbb {Z}^m / \varLambda \). This is done by solving at most \(n-m\) linear Diophantine systems in total, where the matrix of each system is a sub-matrix of A and the right-hand side vector of the system is a column of A. \(\square \)
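In the knapsack case \(m=1\) the whole pruning procedure collapses to gcd arithmetic, since the membership test \(\varvec{a}_j \in \mathcal {L}(A_{I {\setminus } \{j\} \cup \tau })\) is a divisibility condition. A hedged sketch with arbitrary example data (the function name is ours):

```python
from math import gcd
from functools import reduce

def lemma4_support(a, t):
    """Greedy pruning of Lemma 4 for m = 1: returns gamma containing the
    pivot index t with L(a_gamma) = L(a), by removing redundant indices."""
    I = [i for i in range(len(a)) if i != t]
    changed = True
    while changed:
        changed = False
        for j in list(I):
            rest = [a[i] for i in I if i != j] + [a[t]]
            if a[j] % reduce(gcd, rest) == 0:   # a_j lies in the lattice of the rest
                I.remove(j)
                changed = True
    return sorted(I + [t])
```

For a = (4, 6, 10, 15) with pivot \(a_1 = 4\) the procedure returns a set of size at most \(1 + \omega (4) = 2\) whose entries still have gcd 1, in line with the bound of Lemma 4.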

Proof of Theorem 1

Consider an arbitrary \(\tau \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) \), for which the matrix \(A_\tau \) is non-singular, and the respective \(\gamma \) as in Lemma 4. One has \(\mathsf {ILC}(A) \le |\gamma | \le m + \varOmega _m \left( \frac{|\det (A_\tau )|}{\gcd (A)}\right) \). This yields (2). \(\square \)

Proof of Corollary 1

We use \(\gamma \) from Lemma 4. Since \(\varvec{b} \in \mathcal {L}(A) = \mathcal {L}(A_\gamma )\), there exists a solution \(\varvec{x}\) of \(A \varvec{x} = \varvec{b}, \varvec{x} \in \mathbb {Z}^n\) with \({\text {supp}}(\varvec{x}) \subseteq \gamma \). This solution can be computed by solving the Diophantine system with the left-hand side matrix \(A_\gamma \) and the right-hand side vector \(\varvec{b}\). \(\square \)

3 Proofs of Theorem 2 and Corollary 2

Lemma 5

Let \(A \in \mathbb {Z}^{m \times n}\) be a matrix whose columns positively span \(\mathbb {R}^m\) and \(A_\tau \) be a non-singular \(m \times m\) sub-matrix of A, where \(\tau \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) \). Then there exists \(I \subseteq [n]\) satisfying \(\tau \subseteq I\), \(|I| \le 2m + \varOmega _m\left( \frac{|\det (A_\tau )|}{\gcd (A)} \right) \) and \(\mathcal {L}(A) =\mathcal {L}(A_I)\) such that the columns of \(A_I\) positively span \(\mathbb {R}^m\). For the input A and \(\tau \) in binary encoding, such I can be computed in polynomial time.

Proof

Consider \(\gamma \) as in Lemma 4. Let \(\varvec{a}_1,\ldots ,\varvec{a}_n\) be the columns of A. Since the matrix \(A_\tau \) is non-singular, the m vectors \(\varvec{a}_i\), where \(i \in \tau \), together with the vector \(\varvec{v}= -\sum _{i \in \tau } \varvec{a}_i\) positively span \(\mathbb {R}^m\). Since all columns of A positively span \(\mathbb {R}^m\), the conic version of Carathéodory's theorem implies the existence of a set \(\beta \subseteq [n]\) with \(|\beta | \le m\), such that \(\varvec{v}\) is in the conic hull of \(\left\{ \varvec{a}_i \,:\, i \in \beta \right\} \). Consequently, the set \(\left\{ \varvec{a}_i \,:\, i \in \beta \cup \tau \right\} \) and, by this, also the larger set \(\left\{ \varvec{a}_i \,:\, i \in \beta \cup \gamma \right\} \) positively span \(\mathbb {R}^m\). Thus, in view of Lemma 4, the structural part of the assertion holds with \(I = \beta \cup \gamma \).

It remains to show the algorithmic part of the assertion. In view of Lemma 4, one can construct \(\gamma \) in polynomial time. To determine I, we need to construct \(\beta \) in polynomial time. Start with \(\beta =[n]\) and iteratively reduce \(\beta \) as follows: using a polynomial-time algorithm for linear optimization, check whether, after removal of one of the elements of \(\beta \), the vector \(\varvec{v}\) is still in the conic hull of \(\left\{ \varvec{a}_i \,:\, i \in \beta \right\} \). This procedure terminates after at most n iterations. By Carathéodory's theorem, after termination, the system of vectors \(\varvec{a}_i\) with \(i \in \beta \) is linearly independent. \(\square \)

Lemma 6

Let \(A \in \mathbb {Z}^{m \times n}\) be a matrix whose columns positively span \(\mathbb {R}^m\). Then \(\mathcal {L}(A) = {\text {Sg}}(A)\). Let \(\varvec{b} \in \mathcal {L}(A)\). If A and \(\varvec{b}\) are given in binary encoding, then a solution to \(A \varvec{x} = \varvec{b}, \ \varvec{x} \in \mathbb {Z}_{\ge 0}^n\) can be constructed in polynomial time.

Proof

Since the columns of A positively span \(\mathbb {R}^m\), the feasibility problem \(A \varvec{y} = \varvec{0}, \ \varvec{y} \in \mathbb {Q}_{\ge 1}^n\) of linear programming has a solution \(\varvec{y}\). One can determine such a solution in polynomial time using a polynomial-time algorithm for linear optimization. The description size of \(\varvec{y}\) is polynomial. Thus, one can re-scale \(\varvec{y}\) to clear the denominators in polynomial time and arrive at a vector \(\varvec{y} \in \mathbb {Z}_{\ge 1}^n\) satisfying \(A \varvec{y} = \varvec{0}\).

Now let \(\varvec{b} \in \mathcal {L}(A)\). We first solve the Diophantine system \(A \varvec{x} = \varvec{b}, \ \varvec{x} \in \mathbb {Z}^n\) in polynomial time and determine one solution \(\varvec{x}^*\) of this system. Then the vector \(\varvec{x} = \varvec{x}^*+ \Vert \varvec{x}^*\Vert _\infty \, \varvec{y}\) is a non-negative integer solution of \(A \varvec{x} =\varvec{b}\). This verifies \(\mathcal {L}(A) = {\text {Sg}}(A)\) and shows the algorithmic part of the assertion. \(\square \)
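The two-step construction in this proof is easy to trace on a toy instance. In the sketch below the all-positive kernel vector \(\varvec{y}\) and the particular integer solution \(\varvec{x}^*\) are supplied by hand; in general they would come from a linear-programming and a Diophantine solver, respectively.

```python
# Illustration of the proof of Lemma 6 for A = (1  -1), whose two
# columns positively span R^1.

A = [[1, -1]]          # 1 x 2 integer matrix
y = [1, 1]             # integer kernel vector with all entries >= 1: A y = 0
x_star = [-2, -7]      # some integer solution of A x = b with b = 5

def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

b = matvec(A, x_star)          # [5]
assert matvec(A, y) == [0]     # y lies in the integer kernel

# Shift x_star by ||x_star||_inf copies of y to make it non-negative.
k = max(abs(v) for v in x_star)                 # ||x_star||_inf = 7
x = [xs + k * yi for xs, yi in zip(x_star, y)]

assert matvec(A, x) == b and all(v >= 0 for v in x)
print(x)   # [5, 0]
```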

Proofs of Theorem 2 and Corollary 2

Choose \(\tau \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) \) such that \(A_\tau \) is non-singular and consider I as in Lemma 5. In view of Lemma 6, one has \(\mathcal {L}(A_I) ={\text {Sg}}(A_I)\). Consequently, \(\mathsf {ICR}(A) \le \mathsf {ICC}(A) \le |I|\) and the assertion of Theorem 2 follows from Lemma 5.

To show Corollary 2, observe that by Lemma 5, the set I can be constructed in polynomial time. To conclude the proof, it suffices to apply the algorithmic part of Lemma 6 to the sub-matrix \(A_I\). \(\square \)

4 Proofs of Theorems 3 and 4

To motivate Theorem 3 and its proof, we start this section by giving a self-contained proof of Theorem 4, which contains the key ideas of the proof of Theorem 3 while being less technical.

The following lemma is a key to the proof of Theorem 4.

Lemma 7

Let \(a_1,\ldots ,a_t \in \mathbb {Z}_{>0}\), where \(t \in \mathbb {Z}_{> 0}\). If \(t > 1 + \log _2 (a_1)\), then the system

$$\begin{aligned}&y_1 a_1 + \cdots + y_t a_t = 0, \\&y_1 \in \mathbb {Z}_{\ge 0}, \ y_2,\ldots ,y_t \in \{-1,0,1\} \end{aligned}$$

in the unknowns \(y_1,\ldots ,y_t\) has a solution that is not identically equal to zero.

Proof

If X is a convex set whose affine hull has dimension at most k, then we use \({\text {vol}}_k(X)\) to denote the k-dimensional volume of X.

Consider the convex set \(Y \subseteq \mathbb {R}^t\) defined by 2t strict linear inequalities

$$\begin{aligned} -1<&y_1 a_1 + \cdots + y_t a_t< 1, \\ -2<&y_i < 2 \text { for all } i \in \{2,\ldots ,t\}. \end{aligned}$$

Clearly, the set Y is the interior of a hyper-parallelepiped and can also be described as \(Y = \left\{ \varvec{y} \in \mathbb {R}^t \,:\, \Vert M \varvec{y}\Vert _\infty < 1 \right\} \), where M is the upper triangular matrix

$$\begin{aligned} M = \begin{pmatrix} a_1 &{} a_2 &{} \cdots &{} a_t \\ &{} 1/2 &{} &{} \\ &{} &{} \ddots &{} \\ &{} &{} &{} 1/2 \end{pmatrix}. \end{aligned}$$

It is easy to see that the t-dimensional volume \({\text {vol}}_t(Y)\) of Y is

$$ {\text {vol}}_t(Y)={\text {vol}}_t(M^{-1}[-1,1]^t)=\frac{1}{\det (M)}2^t=\frac{4^t}{2 a_1}.$$

The assumption \(t > 1 + \log _2 (a_1)\) implies that the volume of Y is strictly larger than \(2^t\). Thus, by Minkowski’s first theorem [6, Ch. VII, Sect. 3], the set Y contains a non-zero integer vector \(\varvec{y} = (y_1,\ldots ,y_t)^\top \in \mathbb {Z}^t\). Without loss of generality we can assume that \(y_1 \ge 0\) (if the latter is not true, one can replace \(\varvec{y}\) by \(-\varvec{y}\)). The vector \(\varvec{y}\) is a desired solution from the assertion of the lemma. \(\square \)
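Lemma 7 is easy to confirm by exhaustive search on small instances. A pure-Python sketch; the instance \(a = (4, 7, 9, 11)\), for which \(t = 4 > 1 + \log_2 a_1 = 3\), is an arbitrary example:

```python
from itertools import product

a = [4, 7, 9, 11]      # a_1 = 4, t = 4 > 1 + log2(4) = 3

def kernel_vector(a):
    """Search for a non-zero y with y_1 >= 0, y_2,...,y_t in {-1,0,1}
    and sum y_i a_i = 0.  A crude upper bound on y_1 suffices here."""
    bound = sum(a[1:]) // a[0] + 1
    for y1 in range(bound + 1):
        for rest in product((-1, 0, 1), repeat=len(a) - 1):
            y = (y1,) + rest
            if any(y) and sum(yi * ai for yi, ai in zip(y, a)) == 0:
                return y
    return None

y = kernel_vector(a)
print(y)   # -> (1, 1, 0, -1), i.e. 4 + 7 - 11 = 0
```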

Proof of Theorem 4

By Lemma 3, we may assume that \(\gcd ({\varvec{a}})=1\). Further, without loss of generality, let \( a_1 = \min \{a_1,\ldots ,a_n\}\). Let \(b\in {\text {Sg}}({\varvec{a}})\). By the definition of \(\mathsf {ICR}(\varvec{a})\), the integer Carathéodory rank, we need to show the existence of a solution to \(\varvec{a} \varvec{x} = b, \ \varvec{x} \in \mathbb {Z}_{\ge 0}^n\) satisfying \(\Vert \varvec{x}\Vert _0 \le 1 + \log _2( a_1)\). Choose a solution \(\varvec{x}=(x_1,\ldots ,x_n)^\top \) with the property that the number of indices \(i \in \{2,\ldots ,n\}\) for which \(x_i \ne 0\) is minimized. Without loss of generality, we can assume that, for some \(t \in \{2,\ldots ,n\}\), one has \(x_2> 0,\ldots , x_t > 0, x_{t+1} = \cdots = x_n = 0\). We claim that Lemma 7 implies \(t \le 1 + \log _2 (a_1) \). In fact, if the latter were not true, then a solution \(\varvec{y} \in \mathbb {Z}^t\) of the system in Lemma 7 could be extended to a solution \(\varvec{y} \in \mathbb {Z}^n\) by appending zero components. Some of the components \(y_2,\ldots ,y_t\) are negative, because \(\varvec{y}\) is non-zero, \(y_1 \ge 0\) and \(a_1> 0,\ldots , a_t > 0\). It then turns out that, for an appropriate choice of \(k \in \mathbb {Z}_{\ge 0}\), the vector \(\varvec{x}'= (x_1',\ldots ,x'_n)^\top = \varvec{x} + k \varvec{y} \) satisfies \(\varvec{a} \varvec{x}' = b\) and \(x_1' \ge 0,\ldots , x'_t \ge 0, \ x'_{t+1} = \cdots = x'_n = 0\) and \(x'_i=0\) for at least one \(i \in \{2,\ldots ,t\}\). Indeed, one can choose k to be the minimum among all \(x_i\) with \(i \in \{2,\ldots ,t\}\) and \(y_i=-1\).

The existence of \(\varvec{x}'\) with at most \(t-1\) non-zero components \(x_i'\) with \(i \in \{2,\ldots ,n\}\) contradicts the choice of \(\varvec{x}\) and yields the assertion. \(\square \)
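One exchange step from this proof, traced numerically; the instance, the starting solution and the kernel vector are chosen by hand for illustration:

```python
# One support-reduction step from the proof of Theorem 4.

a = [4, 7, 9, 11]
x = [0, 3, 2, 3]       # a . x = 21 + 18 + 33 = 72 = b
y = [1, 1, 0, -1]      # a . y = 4 + 7 - 11 = 0, with y_2,...,y_t in {-1,0,1}

b = sum(ai * xi for ai, xi in zip(a, x))

# k = minimum of x_i over indices i >= 2 with y_i = -1
k = min(xi for xi, yi in zip(x[1:], y[1:]) if yi == -1)   # k = 3
x_new = [xi + k * yi for xi, yi in zip(x, y)]

assert sum(ai * xi for ai, xi in zip(a, x_new)) == b
assert all(v >= 0 for v in x_new)
print(x_new)   # -> [3, 6, 2, 0]: one more zero among the components 2,...,n
```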

To prove Theorem 3 we need two auxiliary lemmas, of which one generalizes Lemma 7. In what follows, we will denote the linear hull of \(X \subseteq \mathbb {R}^m\) by \({\text {lin}}(X)\).

Lemma 8

Let \(A = (\varvec{a}_1,\ldots ,\varvec{a}_n) \in \mathbb {Z}^{m \times n}\) satisfy (6)–(8). Then there exists a basis \(\varvec{u}_1,\ldots ,\varvec{u}_m\) of the lattice \(\mathcal {L}(A)\) satisfying

$$\begin{aligned} {\text {cone}}(\varvec{a}_1) = {\text {cone}}(\varvec{u}_1) \qquad \text {and} \qquad {\text {cone}}(\varvec{a}_1,\ldots ,\varvec{a}_n) {\setminus } \{\varvec{0}\} \subseteq H, \end{aligned}$$

where H is the open halfspace

$$\begin{aligned} H:= \Bigl \{ \sum _{i=1}^m \alpha _i \varvec{u}_i \, : \, \alpha _1 > 0, \alpha _2,\ldots ,\alpha _m \in \mathbb {R} \Bigr \}. \end{aligned}$$

Proof

We argue by induction on the dimension m. For \(m=1\), the assertion holds with \(u_1=\gcd (A)\). Assume the assertion is true in dimension \(m-1\), where \(m \ge 2\). In view of (7) and (8), \({\text {cone}}(\varvec{a}_1,\ldots ,\varvec{a}_n)\) is an m-dimensional pointed cone and so it has a facet F that contains \({\text {cone}}(\varvec{a}_1)\) as a subset. The semigroup \(\mathcal {L}(A) \cap F\) is finitely generated so that one can choose finitely many vectors \(\varvec{b}_1,\ldots ,\varvec{b}_t\) satisfying \(\mathcal {L}(A) \cap F = {\text {Sg}}(\varvec{b}_1,\ldots ,\varvec{b}_t)\). Since \(\varvec{b}_1,\ldots , \varvec{b}_t\) contain generators of the extreme rays of F and \(\varvec{a}_1\) generates one of these extreme rays, we can assume that \({\text {cone}}(\varvec{b}_1) = {\text {cone}}(\varvec{a}_1)\). Viewing \({\text {lin}}(F)\) as \(\mathbb {R}^{m-1}\) and applying the induction assumption to the cone \(F = {\text {cone}}(\varvec{b}_1,\ldots ,\varvec{b}_t)\), we conclude that there exists a basis \(\varvec{u}_1,\ldots ,\varvec{u}_{m-1} \in \mathbb {Z}^m\) of the lattice \(\mathcal {L}(\varvec{b}_1,\ldots ,\varvec{b}_t) = \mathcal {L}(A) \cap {\text {lin}}(F)\) that satisfies the conditions \({\text {cone}}(\varvec{a}_1) = {\text {cone}}(\varvec{b}_1) = {\text {cone}}(\varvec{u}_1)\) and

$$\begin{aligned} F \subseteq \{0\} \cup \Bigl \{ \sum _{i=1}^{m-1} \alpha _i \varvec{u}_i \, : \, \alpha _1 > 0, \alpha _2,\ldots ,\alpha _{m-1} \in \mathbb {R} \Bigr \}. \end{aligned}$$

To conclude the proof, we need to choose \(\varvec{u}_m\) appropriately. We extend the basis \(\varvec{u}_1,\ldots ,\varvec{u}_{m-1}\) of \(\mathcal {L}(A) \cap {\text {lin}}(F)\) to a basis \(\varvec{u}_1,\ldots ,\varvec{u}_{m-1}, \varvec{v}\) of \(\mathcal {L}(A)\). We will fix

$$\begin{aligned} \varvec{u}_m := \varvec{v} - N (\varvec{u}_1 + \cdots + \varvec{u}_{m-1}) \end{aligned}$$

with an appropriate \(N \in \mathbb {Z}_{>0}\).

Since \({\text {lin}}(F)\) is a supporting hyperplane of \({\text {cone}}(\varvec{a}_1,\dots ,\varvec{a}_n)\), we can assume, after possibly exchanging the roles of \(\varvec{v}\) and \(-\varvec{v}\), that \(\varvec{v}\) and \({\text {cone}}(\varvec{a}_1,\ldots ,\varvec{a}_n)\) are on the same side of the hyperplane \({\text {lin}}(F)\). Every vector \(\varvec{a}_i\) can be expressed as

$$\begin{aligned} \varvec{a}_i = \sum _{j=1}^{m-1} \beta _{i,j} \varvec{u}_j + \beta _i \varvec{v}, \end{aligned}$$

where \(\beta _{i,j} \in \mathbb {Z}\) and \(\beta _i \in \mathbb {Z}_{\ge 0}\). Furthermore, if \(\varvec{a}_i \not \in F\), then, in view of (6), \(\beta _i > 0\) and, thus, using the representation

$$\begin{aligned} \varvec{a}_i = \sum _{j=1}^{m-1} (\beta _{i,j} + N \beta _i) \varvec{u}_j + \beta _i \underbrace{(\varvec{v} - N (\varvec{u}_1 + \cdots + \varvec{u}_{m-1}))}_{\varvec{u}_m} \end{aligned}$$

we see that, for N large enough, one has \(\beta _{i,j} + N \beta _i > 0\) so that \(\varvec{a}_i \in H\) for all vectors \(\varvec{a}_i\) that do not belong to F. The vectors \(\varvec{a}_i\) that belong to F are in H by the choice of \(\varvec{u}_1,\ldots ,\varvec{u}_{m-1}\).

Consequently, \(\{\varvec{a}_1,\ldots ,\varvec{a}_n\} \subseteq H\). Since \({\text {cone}}(\varvec{a}_1,\ldots ,\varvec{a}_n)\) is pointed, the latter implies \({\text {cone}}(\varvec{a}_1,\ldots ,\varvec{a}_n) {\setminus } \{\varvec{0} \} \subseteq H\). \(\square \)

The following lemma generalizes Lemma 7.

Lemma 9

Let \(A = (\varvec{a}_1,\ldots ,\varvec{a}_n) \in \mathbb {Z}^{m \times n}\) satisfy (6)–(8). Let

$$\begin{aligned} n > m + \log _2 \left( \frac{q(A)}{\gcd (A)} \right) , \end{aligned}$$
(13)

where \(q(A):= \sqrt{\sum _{I \in \left( {\begin{array}{c}[n]\\ m\end{array}}\right) \,:\, 1 \in I} \det (A_I)^2}\). Then the system

$$\begin{aligned}&y_1 \varvec{a}_1 + \cdots + y_n \varvec{a}_n =\varvec{0},\\&\quad y_1 \in \mathbb {Z}_{\ge 0}, \ y_2,\ldots ,y_n \in \{-1,0,1\} \end{aligned}$$

in the unknowns \(y_1,\ldots ,y_n\) has a solution such that at least one of \(y_2,\ldots ,y_n\) equals \(-1\).

Proof

In view of Lemma 7 we may assume that \(m\ge 2\). Consider \(U= (\varvec{u}_1,\ldots ,\varvec{u}_m) \in \mathbb {Z}^{m \times m}\), where \(\varvec{u}_1,\ldots ,\varvec{u}_m\) are vectors as in Lemma 8. We can express the columns of A in the basis \(\varvec{u}_1,\ldots ,\varvec{u}_m\). This means that \(A = U \overline{A}\) holds for some matrix \(\overline{A}= ( \overline{a}_{i,j} )_{i \in [m], \ j \in [n]} \in \mathbb {Z}^{m \times n}\). In view of Lemma 8, we have \(\overline{a}_{1,j} > 0\) for every \(j \in [n]\) and \(\overline{a}_{i,1} = 0\) for every \(i \in \{2,\ldots ,m\}\). Clearly, \(A \varvec{y} =0\) is equivalent to \(\overline{A}\varvec{y} = 0\). The system \(\overline{A}\varvec{y} =0\) has the structure

$$\begin{aligned} \underbrace{ \left( \begin{array}{c|ccc} \overline{a}_{1,1} &{} \overline{a}_{1,2} &{} \cdots &{} \overline{a}_{1,n} \\ \hline 0 &{} \overline{a}_{2,2} &{} \cdots &{} \overline{a}_{2,n} \\ \vdots &{} \vdots &{} &{} \vdots \\ 0 &{} \overline{a}_{m,2} &{} \cdots &{} \overline{a}_{m,n} \end{array} \right) }_{\overline{A}} \cdot \underbrace{ \left( \begin{array}{c} y_1 \\ \hline y_2 \\ \vdots \\ y_n \end{array} \right) }_{y} = \left( \begin{array}{c} 0 \\ \hline 0 \\ \vdots \\ 0 \end{array} \right) . \end{aligned}$$
(14)

Using the notation \(\alpha _1,\ldots ,\alpha _n \in \mathbb {R}_{>0}\) and \(B \in \mathbb {Z}^{(m-1) \times (n-1)}\) such that

$$\begin{aligned} \overline{A}= \left( \begin{array}{c|ccc} \alpha _1 &{} \alpha _2 &{} \cdots &{} \alpha _n \\ \hline 0 &{} &{} &{} \\ \vdots &{} &{} B &{} \\ 0 &{} &{} &{} \end{array} \right) , \end{aligned}$$
(15)

we formulate (14) as

$$\begin{aligned} \alpha _1 y_1 + \cdots + \alpha _n y_n&= 0,&\begin{pmatrix} y_2 \\ \vdots \\ y_n \end{pmatrix}&\in \ker (B). \end{aligned}$$
(16)

By the m-dimensionality assumption in (7), the matrix A has rank m. In view of this and \(\alpha _1> 0\), the matrix B has rank \(m-1\). Therefore

$$\begin{aligned} \varLambda := \mathbb {Z}^{n-1} \cap \ker (B) \end{aligned}$$

is an \((n-m)\)-dimensional lattice with

$$\begin{aligned} \det (\varLambda ) = \frac{\sqrt{\det (B B^\top )}}{\gcd (B)} \end{aligned}$$

(recall that we assume \(m\ge 2\) and hence \(\varLambda \) is well-defined).

It turns out that \(\gcd (B)=1\). Indeed, by the choice of \(\overline{A}\), we have \(\mathcal {L}(\overline{A})= \mathbb {Z}^m\). By (15), the lattice \(\mathcal {L}(B)\) is obtained by projecting \(\mathcal {L}(\overline{A})\) onto the last \(m-1\) components. So, \(\mathcal {L}(B) = \mathbb {Z}^{m-1}\) and by this \(\gcd (B)=1\). This yields

$$\begin{aligned} \det (\varLambda ) = \sqrt{\det (BB^\top )}. \end{aligned}$$

We introduce the \(\varvec{0}\)-symmetric convex set

$$\begin{aligned} C := (-2,2)^{n-1} \cap \ker (B), \end{aligned}$$

which is the relative interior of an \((n-m)\)-dimensional cross-section of the cube \([-2,2]^{n-1}\). By Vaaler’s cube slicing inequality [33], we have

$$\begin{aligned} {\text {vol}}_{n-m}(C) \ge 4^{n-m}. \end{aligned}$$

We introduce the \(\varvec{0}\)-symmetric convex set

$$\begin{aligned} C' = \left\{ (y_1,\ldots ,y_n)^\top \,:\, -1< \alpha _1 y_1 + \cdots + \alpha _n y_n < 1, \ (y_2,\ldots ,y_n)^\top \in C \right\} . \end{aligned}$$

It is easy to see that

$$\begin{aligned} {\text {vol}}_{n-m+1}(C') = \frac{2 {\text {vol}}_{n-m}(C)}{\alpha _1}. \end{aligned}$$

By Minkowski’s first theorem applied to the set \(C'\) and the lattice \(\mathbb {Z}\times \varLambda \), we know that if

$$\begin{aligned} {\text {vol}}_{n-m+1}(C') > 2^{n-m+1} \det (\mathbb {Z}\times \varLambda ), \end{aligned}$$

then the set \(C'\) contains a non-zero vector \(\varvec{y}\) of the lattice \(\mathbb {Z}\times \varLambda \). We have

$$\begin{aligned} \det (\mathbb {Z}\times \varLambda ) = \det (\varLambda ) = \sqrt{\det (B B^\top )} \end{aligned}$$

and

$$\begin{aligned} {\text {vol}}_{n-m+1}(C') = \frac{2 {\text {vol}}_{n-m}(C)}{\alpha _1} \ge \frac{2 \cdot 4^{n-m}}{\alpha _1}. \end{aligned}$$

This means that the assumptions of Minkowski’s first theorem are fulfilled when

$$\begin{aligned} \frac{2 \cdot 4^{n-m}}{\alpha _1} > 2^{n-m+1} \sqrt{\det (B B^\top )}. \end{aligned}$$
(17)

Clearly, (17) can be written as

$$\begin{aligned} n > m + \log _2 \left( \alpha _1 \sqrt{\det (BB^\top )} \right) . \end{aligned}$$
(18)

We show that (13) and (18) are equivalent by verifying that the terms under the logarithm coincide. Keeping in mind that B is a submatrix of \(\overline{A}\) [see (15)], we index columns of B by \(\{2,\ldots ,n\}\). We have

$$\begin{aligned} \begin{aligned} \alpha _1 \sqrt{ \det (B B^\top )} = \alpha _1 \sqrt{ \sum _{J \in \left( {\begin{array}{c}\{2,\ldots ,n\}\\ m-1\end{array}}\right) } \det (B_J)^2}\\ = \sqrt{\sum _{J \in \left( {\begin{array}{c}\{2,\ldots ,n\}\\ m-1\end{array}}\right) } (\alpha _1 \det (B_J))^2} = \sqrt{\sum _{J \in \left( {\begin{array}{c}\{2,\ldots ,n\}\\ m-1\end{array}}\right) } \det (\overline{A}_{\{1\} \cup J})^2} \\ = q(\overline{A}) = q(U^{-1} A) = \frac{q(A)}{|\det (U)|} = \frac{q(A)}{\gcd (A)}, \end{aligned} \end{aligned}$$

which verifies the equivalence of (13) and (18).

Now, consider a non-zero lattice vector \(\varvec{y}=(y_1,\ldots ,y_n)^\top \) in \(C' \cap (\mathbb {Z}\times \varLambda )\). By the choice of \(C'\) and \(\varLambda \), the vector \(\varvec{y}\) is a solution of \(\overline{A}\varvec{y} = 0\) and by this also a solution of \(A \varvec{y} =0\) and, furthermore, we have \(y_2,\ldots , y_n \in \{-1,0,1\}\). Possibly replacing \(\varvec{y}\) with \(-\varvec{y}\), we can ensure that \(y_1 \ge 0\). Since the equation \(\alpha _1 y_1 + \cdots + \alpha _n y_n = 0\) (contained in the system \(\overline{A}\varvec{y} = 0\)) has positive coefficients and since \(y_1 \ge 0\), we conclude that at least one of the variables \(y_2,\ldots ,y_n\) of our solution \(\varvec{y}\) is negative. Thus, our solution \(\varvec{y}\) satisfies the assertions of the lemma. \(\square \)
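The quantity q(A) from Lemma 9 can be computed directly from its definition; a small sketch for \(m = 2\), with an arbitrarily chosen integer matrix (not one from the paper):

```python
from itertools import combinations

A = [[3, 1, 0, 2],
     [0, 1, 2, 1]]                 # m = 2, n = 4; columns indexed 0..3
m, n = len(A), len(A[0])

def det2(cols):
    """Determinant of the 2x2 submatrix on the given pair of columns."""
    i, j = cols
    return A[0][i] * A[1][j] - A[0][j] * A[1][i]

# q(A)^2 is the sum of det(A_I)^2 over m-element column sets I that
# contain the first column (index 0 here).
q_squared = sum(det2(I) ** 2 for I in combinations(range(n), m) if 0 in I)
print(q_squared)   # 3^2 + 6^2 + 3^2 = 54, so q(A) = sqrt(54)
```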

Proof of Theorem 3

It is sufficient to show that any feasible problem \( A \varvec{x} =\varvec{b}, \ \varvec{x} \in \mathbb {Z}_{\ge 0}^n\) with the matrix A satisfying assumptions (6)–(8) has a solution with the size of support bounded by \(m + \log _2 (q(A)/\gcd (A))\). Choose a solution \(\varvec{x}^*= (x_1^*,\ldots ,x_n^*)^\top \) of \(A \varvec{x} =\varvec{b}, \ \varvec{x} \in \mathbb {Z}_{\ge 0}^n\), for which the number of indices \(i \in \{2,\ldots ,n\}\) satisfying \(x_i^*\ne 0\) is minimal. After reordering the columns of the matrix A, we can assume that \( x_i^*\ne 0\) for \(2 \le i \le s\) and \(x_i^*= 0\) for \(i > s\), for some choice of s. Consider the vector space \(V:={\text {lin}}(\varvec{a}_1,\ldots ,\varvec{a}_s)\). Since \(\varvec{a}_1,\ldots ,\varvec{a}_n\) linearly span \(\mathbb {R}^m\), among the vectors \(\varvec{a}_{s+1},\ldots ,\varvec{a}_n\) one can choose linearly independent columns that together with a basis of V form a basis of \(\mathbb {R}^m\). Without loss of generality, let \(\varvec{a}_{s+1},\ldots ,\varvec{a}_t\) be such vectors, that is, one has \(V \oplus W = \mathbb {R}^m\) with \(W:={\text {lin}}(\varvec{a}_{s+1},\ldots ,\varvec{a}_t)\). In the degenerate case \(V= \mathbb {R}^m\), we just fix \(t=s\) and \(W=\{\varvec{0}\}\). We show that

$$\begin{aligned} t \le m + \log _2 \left( \frac{q(A)}{\gcd (A)}\right) \end{aligned}$$

arguing by contradiction. Assume that \(t > m + \log _2 \left( \frac{q(A)}{\gcd (A)}\right) \). Conditions (6)–(8) are fulfilled for the matrix \(A_{[t]} = (\varvec{a}_1,\ldots ,\varvec{a}_t)\) in place of \(A= (\varvec{a}_1,\ldots ,\varvec{a}_n)\). Since \(q(A_{[t]}) \le q(A)\) and \(\gcd (A_{[t]}) \ge \gcd (A)\), we have

$$\begin{aligned} t > m + \log _2 \left( \frac{q(A_{[t]})}{\gcd (A_{[t]})}\right) . \end{aligned}$$

The application of Lemma 9 to the matrix \(A_{[t]}\) yields the existence of a vector \(\varvec{y}=(y_1,\ldots ,y_t)^\top \in \mathbb {Z}_{\ge 0} \times \{-1,0,1\}^{t-1}\) satisfying \(A_{[t]} \varvec{y} = \varvec{0}\) with at least one of the values \(y_2,\ldots ,y_t\) equal to \(-1\). In view of \(V \oplus W = \mathbb {R}^m\) and the linear independence of the vectors \(\varvec{a}_{s+1},\ldots ,\varvec{a}_t\), the equality

$$\begin{aligned} \varvec{0} = A_{[t]} \varvec{y} = \sum _{i=1}^t \varvec{a}_i y_i = \underbrace{\sum _{i=1}^s \varvec{a}_i y_i}_{\in V} + \underbrace{\sum _{i=s+1}^t \varvec{a}_i y_i}_{\in W} \end{aligned}$$

yields \(y_i=0\) for \(s < i \le t\). This shows that one of the values \(y_2,\ldots ,y_s\) is equal to \(-1\). We convert \(\varvec{y} \in \mathbb {Z}^t\) to a vector \(\varvec{y}^\prime \in \mathbb {Z}^n\) by appending zero components.

Clearly, \(\varvec{x}=\varvec{x}^*+ k \varvec{y}^\prime \) is a solution of \(A \varvec{x} =\varvec{b}\), and if we choose k to be the minimum among the values \(x_i^*\), where \(i \in \{2,\ldots ,s\}\) and \(y_i=-1\), then \(\varvec{x} = \varvec{x}^*+ k \varvec{y}^\prime \) is a solution of \( A \varvec{x} =\varvec{b}, \ \varvec{x} \in \mathbb {Z}_{\ge 0}^n\), for which the number of indices \(i \in \{2,\ldots ,n\}\) satisfying \(x_i \ne 0\) is smaller than for the solution \(\varvec{x}^*\). This contradicts the choice of \(\varvec{x}^*\) and shows that

$$\begin{aligned} \Vert \varvec{x}^*\Vert _0 \le t \le m + \log _2 \left( \frac{q(A)}{\gcd (A)}\right) . \end{aligned}$$

\(\square \)

5 Optimality of the bounds

In this section we prove a series of three propositions showing, respectively, that Theorems 1, 2 and 3 are optimal. For this we introduce the following notation. For an integer \(z \in \mathbb {Z}_{>0}\) with prime factorization \(z = p_1^{s_1} \cdots p_k^{s_k}\), where the distinct prime factors \(p_1,\ldots ,p_k\) have the multiplicities \(s_1,\ldots ,s_k \in \mathbb {Z}_{>0}\), we define the set

$$\begin{aligned} S(z) := \left\{ \frac{z}{p_i^{s_i}} \,:\, i \in [k] \right\} . \end{aligned}$$

The elements of S(z) are relatively prime, but every non-empty proper subset of S(z) has a common divisor larger than one. The set S(z) has \(\omega (z)\) elements. If z is a prime number, we have \(S(z) = \{1\}\).
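The set S(z) can be computed straight from the prime factorization; a minimal sketch (trial-division factorization, for illustration only):

```python
def factorize(z):
    """Return the prime factorization of z as a dict {p: multiplicity}."""
    factors, p = {}, 2
    while p * p <= z:
        while z % p == 0:
            factors[p] = factors.get(p, 0) + 1
            z //= p
        p += 1
    if z > 1:
        factors[z] = factors.get(z, 0) + 1
    return factors

def S(z):
    """S(z) = { z / p^s : p^s a maximal prime power dividing z }."""
    return {z // p ** s for p, s in factorize(z).items()}

print(sorted(S(360)))   # 360 = 2^3 * 3^2 * 5  ->  [40, 45, 72]
print(S(7))             # for a prime, S(z) = {1}
```

Note that {40, 45, 72} has gcd 1 as a whole, while every proper non-empty subset shares a divisor (8, 9 or 5), matching the property stated above.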

Proposition 1

Let \(m \in \mathbb {Z}_{>0}\) and let \(F : \mathbb {Z}_{>0} \rightarrow \mathbb {Z}_{> 0}\) be a function providing the bound

$$\begin{aligned} \mathsf {ILR}(A) \le \min \left\{ F \left( \frac{|\varDelta |}{\gcd (A)} \right) \,:\, \varDelta \ \text {nonzero}\,m \times m\, \text {minor of }\ A \right\} \end{aligned}$$

for all \(n \in \mathbb {Z}_{\ge m}\) and all matrices \(A \in \mathbb {Z}^{m \times n}\) of full row rank. Then \(F(z) \ge m + \varOmega _m(z)\) holds for every \(z \in \mathbb {Z}_{>0}\).

Proof

Let \(z \in \mathbb {Z}_{>0}\). We need to show \(F(z) \ge m + \varOmega _m(z)\). For \(z=1\), this reduces to showing \(F(z) \ge m\). The latter is clear, because the matrices A in the formulation of the assertion have rank m. We now consider the case \(z \ge 2\). We decompose z as \(z = z_1 \cdots z_m\) into m factors \(z_1,\ldots ,z_m \in \mathbb {Z}_{>0}\) as follows. Let \(\alpha _p\) denote the multiplicity of the prime number p in the prime factorization of z, i.e., \(z=\prod _{p \text { prime}}p^{\alpha _p}\). Then we define for \(i=1,\ldots ,m-1\)

$$\begin{aligned} z_i:=\prod _{\begin{array}{c} p \text { prime}\\ \alpha _p-i\ge 0 \end{array}}p \qquad \text {and}\qquad z_m:=\prod _{\begin{array}{c} p \text { prime}\\ \alpha _p-m\ge 0 \end{array}}p^{\alpha _p-m+1}. \end{aligned}$$

Let q be a prime such that \(\gcd (z,q)=1\). Consider the sets \(S(z_1 q), \ldots , S(z_m q)\). Note that, by construction, \(z_i \in S(z_i q)\) for every \(i \in [m]\). Finally, we fix A to be a matrix of size \(m \times n\) with \(n = \sum _{i=1}^m |S(z_i q)|\) and the columns \(r \varvec{e}_i\), where \(i \in [m]\) and \(r \in S(z_i q)\). (The order of these columns in A does not matter.) By construction, \(\gcd (A) =1\), \(\mathsf {ILR}(A) = n\) and \(\varDelta = \pm z_1 \cdots z_m = \pm z\) is an \(m \times m\) minor of A. Consequently, \(n \le F(z)\). It remains to express n in terms of z. We have \(|S(z_i q) | = \omega ( z_i q) = \omega (z_i) + 1\). By construction, each prime factor of z of multiplicity \(\mu \le m\) has contributed one unit to \(\mu \) of the values \(\omega (z_1),\ldots ,\omega (z_m)\), and each prime factor of z of multiplicity \(\mu \ge m\) has contributed one unit to all m values \(\omega (z_1),\ldots ,\omega (z_m)\). This shows \(\sum _{i=1}^m \omega (z_i) = \varOmega _m(z)\) and implies \(F(z) \ge n = m + \varOmega _m(z)\). \(\square \)
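The decomposition \(z = z_1 \cdots z_m\) defined in this proof can be computed directly; a sketch in which the prime factorization of z is hard-coded (z = 720 = 2^4 · 3^2 · 5 with m = 3 is an arbitrary example):

```python
# alpha[p] is the multiplicity of the prime p in z = 720 = 2^4 * 3^2 * 5.
alpha = {2: 4, 3: 2, 5: 1}
z, m = 720, 3

# z_i (for i < m) is the product of the primes with multiplicity >= i;
# z_m collects all remaining prime powers.
zs = []
for i in range(1, m):
    zi = 1
    for p, a in alpha.items():
        if a - i >= 0:
            zi *= p
    zs.append(zi)
zm = 1
for p, a in alpha.items():
    if a - m >= 0:
        zm *= p ** (a - m + 1)
zs.append(zm)

print(zs)   # [30, 6, 4], and indeed 30 * 6 * 4 == 720
```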

Proposition 2

Let \(m \in \mathbb {Z}_{>0}\) and let \(F : \mathbb {Z}_{>0} \rightarrow \mathbb {Z}_{> 0}\) be a function providing the bound

$$\begin{aligned} \mathsf {ICR}(A) \le \min \left\{ F \left( \frac{|\varDelta |}{\gcd (A)} \right) \,:\, \varDelta \ \text {nonzero}\,m \times m\,\text { minor of }\ A \right\} \end{aligned}$$

for all \(n \in \mathbb {Z}_{>0}\) and all matrices \(A \in \mathbb {Z}^{m \times n}\) whose columns positively span \(\mathbb {R}^m\). Then \(F(z) \ge 2m + \varOmega _m(z)\) holds for every \(z \in \mathbb {Z}_{>0}\).

Proof

To deal with \(z=1\), just fix A to be the matrix with 2m columns \(2\varvec{e}_i\) and \(-\varvec{e}_i\), where \(i \in [m]\). For \(z > 1\), we adapt the proof idea of Proposition 1. Decompose z into the product \(z = z_1 \cdots z_m\) as in the proof of Proposition 1, choose a prime q that is not a factor of z and let A be the matrix with the columns \(r \varvec{e}_i\), where \(i \in [m]\) and \(r \in S(z_i q) \cup \{ - z_i q\}\). The matrix A has \(2m + \varOmega _m(z)\) columns, and it satisfies \(\gcd (A) =1\) and \(\mathsf {ICR}(A) \ge 2m + \varOmega _m(z)\). On the other hand, as in the proof of Proposition 1, one can see that \(\pm z\) is an \(m \times m\) minor of A. All this shows \(F(z) \ge 2m + \varOmega _m(z)\). \(\square \)

Proposition 3

Let \(m \in \mathbb {Z}_{>0}\) and let \(F : \mathbb {Z}_{>0} \rightarrow \mathbb {Z}_{>0}\) be a non-decreasing function that yields the bound

$$\begin{aligned} \mathsf {ICR}(A) \le F\left( \frac{q(A)}{\gcd (A)}\right) \end{aligned}$$

for all \(n \in \mathbb {Z}_{\ge m}\) and all matrices \(A = (\varvec{a}_1,\ldots , \varvec{a}_n) \in \mathbb {Z}^{m \times n}\) that satisfy conditions (6)–(8). Then \(F(z) \ge m + \left\lfloor \log _2 (z) \right\rfloor \) holds for every \(z \in \mathbb {Z}_{>0}\).

Proof

In view of the monotonicity of F, it suffices to consider \(z = 2^s\) with \(s \in \mathbb {Z}_{\ge 0}\). For \(s=0\), choose A to be the identity matrix. For \(s \ge 1\), we set \(n=s+m\) and define \(A = (\alpha _1 \varvec{e}_1,\ldots , \alpha _{s+1} \varvec{e}_1, \varvec{e}_2, \ldots , \varvec{e}_m)\), where \(\alpha _1 =2^{s}\) and \(\alpha _i = 2^{s-1+i} + 2^{s+1-i}\) for \(i \in \{2,\ldots ,s+1\}\). One has \(\gcd (A) =1\), which can be verified directly for \(s \le 2\) and follows from \(\gcd (\alpha _1,\alpha _2, \alpha _3, \alpha _{s+1}) = \gcd (2^{s}, 2^{s-1} \cdot 5, 2^{s-2} \cdot 17, 2^{2s} + 1) = 1\) for \(s \ge 3\). Further, \(q(A) = \alpha _1 = 2^s\). To show \(\mathsf {ICR}(A) = m + s\), let \(\varvec{b} = (2^{2s+1} -1,1,\ldots ,1)^\top \). It is clear that \(\varvec{x} = (1,\ldots ,1)^\top \) is a solution, and it remains to show that this is the only solution. The number \(2^{2s+1} -1\) is odd, while \(\alpha _{s+1}\) is the only odd coefficient on the left-hand side of the first equation. Thus, for every solution, one has \(x_{s+1} \ge 1\). Since \(2 \alpha _{s+1} > 2^{2s+1} -1\), we even have \(x_{s+1}=1\). Substituting \(x_{s+1} = 1\) and dividing the first equation by 2, we arrive at the same system of equations with \(m + s -1\) variables. Iterating, we conclude that all variables \(x_1,\ldots ,x_{m+s}\) must be equal to 1. This shows \(\mathsf {ICR}(A) = m+s\) and yields the desired inequality \(F(z) \ge m + s\). \(\square \)
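The coefficients \(\alpha_i\) and the claims made about them in this proof can be checked mechanically; a sketch for s = 4:

```python
from math import gcd
from functools import reduce

# alpha_1 = 2^s and alpha_i = 2^(s-1+i) + 2^(s+1-i) for i = 2,...,s+1.
s = 4
alphas = [2 ** s] + [2 ** (s - 1 + i) + 2 ** (s + 1 - i)
                     for i in range(2, s + 2)]

assert reduce(gcd, alphas) == 1                  # gcd of the first row is 1
assert sum(alphas) == 2 ** (2 * s + 1) - 1       # the all-ones vector yields b_1
assert alphas[-1] == 2 ** (2 * s) + 1            # alpha_{s+1} = 2^{2s} + 1 ...
assert sum(a % 2 for a in alphas) == 1           # ... is the unique odd coefficient
print(alphas)   # [16, 40, 68, 130, 257]
```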

6 Results on computational complexity

In this final section, we explore the computational complexity of the functions \(\mathsf {ILR}(A)\), \(\mathsf {ILC}(A)\), \(\mathsf {ICR}(A)\) and \(\mathsf {ICC}(A)\). We begin with the hardness results of Theorem 5.

6.1 Proof of Theorem 5, Parts (i) and (ii)

In the case \(m=1\) of just one row, we use the notation \(\varvec{a} = A\) and denote by \(a_i\) the i-th component of A. As before, we use the notation \(\varvec{a}_\tau \) for \(\tau \subseteq [n]\). It turns out that in the case of one row, two of our four functions coincide:

Proposition 4

For all \(\varvec{a} \in \mathbb {Z}^{1 \times n}\), one has \(\mathsf {ILR}(\varvec{a}) = \mathsf {ILC}(\varvec{a})\).

Proof

The inequality \(\mathsf {ILR}(\varvec{a}) \le \mathsf {ILC}(\varvec{a})\) is clear from the definition. We show the reverse inequality \(\mathsf {ILC}(\varvec{a}) \le \mathsf {ILR}(\varvec{a})\). Assume \(\varvec{a} \ne \varvec{0}\), since otherwise the assertion is trivial. Since \(\varvec{a}\) can be divided by \(\gcd (\varvec{a})\), there is no loss of generality in assuming \(\gcd (\varvec{a})=1\). Let \(k:=\mathsf {ILR}(\varvec{a})\). Taking into account the definition of \(\mathsf {ILR}\) and \(\mathcal {L}(\varvec{a}) = \gcd (\varvec{a}) \mathbb {Z}= \mathbb {Z}\), we obtain

$$\begin{aligned} \mathbb {Z}= \bigcup _{\tau \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) } \mathcal {L}\left( \varvec{a}_\tau \right) = \bigcup _{\tau \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) } \gcd (\varvec{a}_\tau ) \mathbb {Z}. \end{aligned}$$
(19)

We claim that \(\gcd (\varvec{a}_\tau ) =1\) holds for some \(\tau \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) \). Indeed, if \(\gcd (\varvec{a}_\tau ) \ge 2\) for all \(\tau \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) \), then the number \(z := 1 + \prod _{\tau \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) } \gcd (\varvec{a}_\tau ),\) which is relatively prime to each \(\gcd (\varvec{a}_\tau )\), does not belong to any of the sets \(\gcd (\varvec{a}_\tau ) \mathbb {Z}\) with \(\tau \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) \). This contradicts (19). Thus, \(\mathsf {ILC}(\varvec{a}) \le k= \mathsf {ILR}(\varvec{a})\). \(\square \)
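In the one-row case, \(\mathsf {ILC}(\varvec{a})\) is thus the smallest size of a column subset whose gcd equals \(\gcd(\varvec{a})\), which invites a brute-force computation on toy inputs:

```python
from itertools import combinations
from math import gcd
from functools import reduce

def ilc_one_row(a):
    """Smallest |tau| with gcd(a_tau) == gcd(a) (exhaustive search, m = 1)."""
    g = reduce(gcd, a)
    for k in range(1, len(a) + 1):
        if any(reduce(gcd, tau) == g for tau in combinations(a, k)):
            return k

print(ilc_one_row([4, 7, 9]))     # -> 2, since already gcd(4, 7) = 1
print(ilc_one_row([6, 10, 15]))   # -> 3: every proper subset has gcd > 1
```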

The following result shows how to reduce \(\mathsf {ILC}\) to \(\mathsf {ICR}\) and \(\mathsf {ICC}\).

Proposition 5

Let \(\varvec{a} \in \mathbb {Z}_{\ge 2}^{1 \times n}\) with \(\gcd (\varvec{a}) =1\) and let \(\pi := \prod _{i=1}^n a_i\) and \(\varvec{a}^+ := (\varvec{a}, - \pi ) \in \mathbb {Z}^{1 \times (n+1)}\). Then \( \mathsf {ICR}(\varvec{a}^+) = \mathsf {ICC}(\varvec{a}^+) = 1 + \mathsf {ILC}(\varvec{a}). \)

Proof

It suffices to check the validity of the inequalities

$$\begin{aligned} \mathsf {ICR}(\varvec{a}^+) \le \mathsf {ICC}(\varvec{a}^+) \le \mathsf {ILC}(\varvec{a}) + 1 \le \mathsf {ICR}(\varvec{a}^+). \end{aligned}$$

The first inequality follows directly from the definitions of \(\mathsf {ICR}\) and \(\mathsf {ICC}\). Let us show \( \mathsf {ICC}(\varvec{a}^+) \le 1 + k \) for \(k := \mathsf {ILC}(\varvec{a})\). Consider a k-element set \(\tau \subseteq [n]\) such that \(\gcd ( \varvec{a}_\tau ) = 1\). Without loss of generality, let \(\tau = [k]\) so that \(\gcd (a_1,\ldots ,a_k) = 1\). Then there exist \(z_1,\dots ,z_k \in \mathbb {Z}\) such that \( 1 = \sum _{i=1}^k z_i a_i. \) We now simultaneously show \({\text {Sg}}(\varvec{a}^+)=\mathbb {Z}\) and \(\mathsf {ICC}(\varvec{a}^+) \le k+1\). One clearly has \( y_{n+1} (-\pi ) + \sum _{i=1}^{k} y_i a_i =0 \) with \(y_i := \frac{\pi }{a_i}\) for \(i \in [k]\) and \(y_{n+1} = k\). Consequently, for every \(b \in \mathbb {Z}\), the equality \( b = N y_{n+1} a_{n+1} + \sum _{i=1}^k (N y_i + b z_i) a_i \) holds for an arbitrary \(N \in \mathbb {Z}_{>0}\). If we choose N large enough, all of the coefficients in the above representation become non-negative. This shows that every \(b \in \mathbb {Z}\) belongs to \({\text {Sg}}(\varvec{a}^+)\) and, since we have used \(k+1\) generators from \(\varvec{a}^+\) to represent b, we have \(\mathsf {ICC}(\varvec{a}^+) \le k+1\).

To conclude the proof, it remains to verify \(k+1 \le \mathsf {ICR}(\varvec{a}^+)\). It is easy to check that 1 is an element of the semigroup \({\text {Sg}}(\varvec{a}^+)=\mathbb {Z}\) that cannot be represented using at most k of the \(n+1\) generators \(a_1,\ldots ,a_n, -\pi \). Indeed, the only negative generator, \(-\pi \), has to be used. If, apart from this generator, one uses at most \(k-1\) positive generators, then, by the definition of k, the chosen generators have a \(\gcd \) strictly larger than one, which is a contradiction. \(\square \)
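The explicit representation used in this proof can be verified numerically; a sketch with \(\varvec{a} = (6, 10, 15)\), where all \(k = n = 3\) generators are needed for gcd one, and hand-picked Bézout coefficients:

```python
a = [6, 10, 15]                  # gcd(a) = 1, all entries >= 2
pi = a[0] * a[1] * a[2]          # 900; the extra generator a_{n+1} is -pi
z = [-4, 1, 1]                   # Bezout coefficients: -4*6 + 10 + 15 = 1
assert sum(zi * ai for zi, ai in zip(z, a)) == 1

k = len(a)                       # here the whole vector is needed for gcd 1
y = [pi // ai for ai in a]       # y_i = pi / a_i, and y_{n+1} = k
assert k * (-pi) + sum(yi * ai for yi, ai in zip(y, a)) == 0

b, N = 17, 100                   # N large enough for non-negative coefficients
coeffs = [N * yi + b * zi for yi, zi in zip(y, z)]
assert all(c >= 0 for c in coeffs)
assert N * k * (-pi) + sum(c * ai for c, ai in zip(coeffs, a)) == b
print(coeffs)   # [14932, 9017, 6017]
```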

We will also make use of the hardness of the set-cover problem, which is the following classical NP-complete problem, see [19, Problem: SP5]. The input of the set-cover problem consists of \(k, t \in \mathbb {Z}_{>0}\) and a family \(\mathcal {S}:= \{S_1,\ldots ,S_n\}\) of n sets with \(S_1 \cup \cdots \cup S_n = [t]\). We use \({\text {mincov}}(\mathcal {S})\) to denote the minimal cardinality of \(\tau \subseteq [n]\) such that \(\bigcup _{i \in \tau } S_i = [t]\) holds. The set-cover problem is the problem of deciding whether \({\text {mincov}}(\mathcal {S}) \le k\) holds.

Proof of Theorem 5, Parts (i) and (ii)

Since \(m=1\), we use the notation \(\varvec{a} := A\). By Proposition 4, it is sufficient to consider only the three decision problems \(\mathsf {ILC}(\varvec{a}) \le k\), \(\mathsf {ICC}(\varvec{a}) \le k\) and \(\mathsf {ICR}(\varvec{a}) \le k\).

We assume \(\varvec{a} \ne \varvec{0}\). For \(\tau \subseteq [n]\), one has \(\mathcal {L}(\varvec{a}_\tau ) = \gcd (\varvec{a}_\tau ) \mathbb {Z}\). Hence, \(\mathsf {ILC}(\varvec{a})\) is the minimum cardinality of a set \(\tau \) that satisfies \(\gcd (\varvec{a}) = \gcd (\varvec{a}_\tau )\). Thus, the validity of \(\mathsf {ILC}(\varvec{a}) \le k\) is certified by a set \(\tau \subseteq [n]\) with at most k elements for which \(\gcd (\varvec{a}) = \gcd (\varvec{a}_\tau )\) is fulfilled. Since the \(\gcd \) is computable in polynomial time, this shows that our decision problem is in NP. In order to show that deciding \(\mathsf {ICC}(\varvec{a})\le k\) is in NP, we can use a certificate consisting of a set \(\tau \) with \(\mathsf {ICC}(\varvec{a}_\tau ) \le k\) and solutions of the problems \(\varvec{a}_\tau \varvec{x} = a_i, \ \varvec{x} \in \mathbb {Z}_{\ge 0}^{|\tau |}\), with \(i \in [n]\), that have a polynomial description size.

To prove hardness of \(\mathsf {ILC}(\varvec{a}) \le k\), we use a reduction from the set-cover problem. Consider a family \(\mathcal {S}= \{S_1,\ldots ,S_n\}\) with \(\bigcup _{i=1}^n S_i =[t]\). Since each of the t elements of [t] occurs in some of the sets \(S_1,\ldots ,S_n\) and since we have n sets in total, the input size of the set-cover problem is of order at least \(n+t\). The reduction is as follows. We compute the first t prime numbers \(p_1,\ldots ,p_t\). To this end, we can use a weaker version of the Prime Number Theorem, established by Chebyshev, which asserts that for \(t\ge 2\) there exists a universal constant \(c>0\) such that \(p_t \le c\, t\log _e t\), see [23, Theorem 9]. Hence, \(p_1,\ldots ,p_t\) can be found by running the sieve of Eratosthenes, or an even more brute-force algorithm, on the range of integers \(\{1,\ldots , O(t \log _e t)\}\).
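The prime-generation step can be sketched as follows (a minimal sketch; the constant 2 in the sieve bound, together with the additive slack, is our own safe but unoptimized choice consistent with Chebyshev's estimate):

```python
from math import log

def first_primes(t):
    """First t primes via the sieve of Eratosthenes on a range of size O(t log t)."""
    limit = max(15, int(2 * t * log(t + 1)) + 10)  # safe sieve bound (assumption)
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    primes = [p for p, b in enumerate(is_prime) if b]
    assert len(primes) >= t, "enlarge the sieve limit"
    return primes[:t]

print(first_primes(5))  # -> [2, 3, 5, 7, 11]
```

Since the sieve runs on a range of length \(O(t \log _e t)\), the whole step is polynomial in t, as required by the reduction.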

We encode the elements of \(\{1,\ldots ,t\}\) via the above prime numbers. Accordingly, we encode the sets \(S_1,\ldots ,S_n\) by integers as follows: with \(S_j\) we associate \( a_j := \prod _{i \in [t] {\setminus } S_j} p_i. \) This means that \(a_j\) is the product of those prime numbers \(p_i\) whose index i is not in \(S_j\). As the prime numbers \(p_1,\ldots ,p_t\) have polynomial bit size in t, the numbers \(a_1,\ldots ,a_n\) can be computed in polynomial time.

Since the union of \(S_1,\ldots ,S_n\) is [t], we conclude that \(\gcd (\varvec{a})=1\) holds for \(\varvec{a} =(a_1,\ldots ,a_n) \in \mathbb {Z}^{1 \times n}\). More generally, \(\bigcup _{j \in \tau } S_j = [t]\) holds if and only if \(\gcd ( \varvec{a}_\tau ) = 1\). This shows \({\text {mincov}}(\mathcal {S}) = \mathsf {ILC}(\varvec{a})\). Thus, the polynomial-time reduction \(\mathcal {S}\mapsto \varvec{a}\) converts the set-cover problem \({\text {mincov}}(\mathcal {S}) \le k\) to the problem \(\mathsf {ILC}(\varvec{a}) \le k\).
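The encoding and the equivalence between covering and coprimality can be verified on a toy instance (a self-contained sketch with hard-coded primes; all names are illustrative):

```python
from itertools import combinations
from math import gcd, prod
from functools import reduce

# Toy set-cover instance on the ground set [t] = {1, 2, 3}.
t = 3
primes = {1: 2, 2: 3, 3: 5}          # element i  <->  i-th prime p_i
S = [{1, 2}, {2, 3}, {1}]            # the family S_1, S_2, S_3

# Encoding: a_j is the product of the primes of elements NOT in S_j.
a = [prod(primes[i] for i in set(range(1, t + 1)) - Sj) for Sj in S]
print(a)  # -> [5, 2, 15]

# Key equivalence: gcd(a_tau) = 1  iff  the sets S_j, j in tau, cover [t].
for k in range(1, len(S) + 1):
    for tau in combinations(range(len(S)), k):
        covers = set().union(*(S[j] for j in tau)) == set(range(1, t + 1))
        assert (reduce(gcd, (a[j] for j in tau)) == 1) == covers
print("equivalence verified")
```

For instance, \(\tau = \{1, 2\}\) covers \(\{1,2,3\}\) and indeed \(\gcd (5, 2) = 1\), while \(\tau = \{1, 3\}\) misses element 3 and \(\gcd (5, 15) = 5\).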

In view of Proposition 5, the computation of \(\mathsf {ILC}(\varvec{a})\) for \(\varvec{a} \in \mathbb {Z}_{\ge 2}^{1 \times n}\) satisfying \(\gcd (\varvec{a}) =1\) can be reduced, in polynomial time, to the computation of \(\mathsf {ICR}\) and \(\mathsf {ICC}\) in the case of one row, by constructing the vector \(\varvec{a}^+\) out of \(\varvec{a}\). Thus, the NP-hardness of deciding \(\mathsf {ICR}(\varvec{a}) \le k\) and \(\mathsf {ICC}(\varvec{a})\le k\) follows from the NP-hardness of deciding \(\mathsf {ILC}(\varvec{a}) \le k\). \(\square \)

The exact complexity status of the analogous decision problem \(\mathsf {ICR}(\varvec{a}) \le k\) remains unresolved: it is neither clear whether this problem is in NP nor whether it is in co-NP.

6.2 Proof of Theorem 5, Parts (iii) and (iv)

We recall that a computational problem is called strongly NP-hard if it is NP-hard with respect to the unary encoding of the coefficients of the input. Applied to our setting, this means that the coefficients of \(A \in \mathbb {Z}^{m \times n}\) are given in the unary encoding. A decision problem is called strongly NP-complete if it belongs to NP (with respect to the binary encoding of the coefficients) and is strongly NP-hard.

For a set \(S \subseteq [m]\), let \(\chi _S \in \{0,1\}^m\) be the characteristic vector of S.

Lemma 10

Let \(m \in \mathbb {Z}_{>0}\) and let \(\mathcal {S}= \{S_1,\ldots ,S_\ell \}\) be a family of sets with \(\bigcup _{i=1}^\ell S_i = [m]\). Then, for the matrix

$$\begin{aligned} A := ( - p_1 \varvec{e}_1,\ldots , - p_m \varvec{e}_m, q \chi _{S_1}, \ldots , q \chi _{S_\ell } ) \in \mathbb {Z}^{m \times (m+ \ell )}, \end{aligned}$$

defined using \(\mathcal {S}\) and arbitrary pairwise distinct prime numbers \(p_1,\ldots ,p_m\) and q, one has \( \mathsf {ILC}(A) = \mathsf {ILR}(A) = \mathsf {ICC}(A) = \mathsf {ICR}(A) = {\text {mincov}}(\mathcal {S}) +m. \)

Proof

We first show \(\gcd (A) = 1\). The minor formed by the first m columns of A equals \(p_1 \cdots p_m\) in absolute value. For \(i \in [m]\), the minor obtained by taking the first m columns, except for the i-th one, together with a column \(q \chi _{S_j}\) for a set \(S_j\) satisfying \(i \in S_j\), equals \(p_1 \cdots p_{i-1} p_{i+1} \cdots p_m q\) in absolute value. The \(\gcd \) of the mentioned \(m \times m\) minors of A is 1. This shows \(\gcd (A)=1\).

To show \(\mathsf {ILC}(A) = {\text {mincov}}(\mathcal {S}) + m\), we observe that \(\mathsf {ILC}(A)\) is the smallest cardinality of \(\gamma \subseteq [m+\ell ]\), for which \(\gcd (A_\gamma )=1\) holds. Every such sub-matrix \(A_\gamma \) of A contains the first m columns of A. Indeed, if the column \(-p_i \varvec{e}_i\) is missing, then the i-th row of \(A_\gamma \) is divisible by q, which implies that \(\gcd (A_\gamma )\) is divisible by q, a contradiction. Consider the sub-family \(\mathcal {S}'\) of \(\mathcal {S}\), consisting of all \(S \in \mathcal {S}\), for which the column \(q \chi _S\) occurs in \(A_\gamma \). The sub-family \(\mathcal {S}'\) covers [m]. Indeed, otherwise there would exist an \(i \in [m]\) not covered by any element of \(\mathcal {S}'\). But then, the i-th row of \(A_\gamma \) would be divisible by \(p_i\), which would imply that \(\gcd (A_\gamma )\) is divisible by \(p_i\), a contradiction. This yields \(\mathsf {ILC}(A) \ge {\text {mincov}}(\mathcal {S})+m\). Conversely, taking the first m columns of A together with the columns \(q \chi _S\) for the sets S of a minimum cover produces, by the minor computation above, a sub-matrix \(A_\gamma \) with \(\gcd (A_\gamma )=1\) and \(|\gamma | = {\text {mincov}}(\mathcal {S})+m\). This shows \(\mathsf {ILC}(A) ={\text {mincov}}(\mathcal {S})+m\).

The equality \(\gcd (A) = 1\) can be phrased as \(\mathcal {L}(A)= \mathbb {Z}^m\). Since the columns of A positively span \(\mathbb {R}^m\), we obtain \({\text {Sg}}(A) = \mathcal {L}(A) = \mathbb {Z}^m\). The equality \({\text {Sg}}(A) = \mathcal {L}(A)\) implies \(\mathsf {ICC}(A) \ge \mathsf {ILC}(A)\). To see that the equality \(\mathsf {ICC}(A) = \mathsf {ILC}(A)\) is true, observe that every sub-matrix \(A_\gamma \) satisfying \(\gcd (A_\gamma )=1\), as analyzed above, contains the columns \(-p_1 \varvec{e}_1, \ldots , -p_m \varvec{e}_m\) together with columns \(q \chi _S\) whose sets S cover [m]; the sum of the latter columns is a strictly positive vector, so the columns of \(A_\gamma \) positively span \(\mathbb {R}^m\). This means \({\text {Sg}}(A_\gamma ) = \mathcal {L}(A_\gamma ) = \mathbb {Z}^m\) and implies the equality \(\mathsf {ICC}(A) = \mathsf {ILC}(A)\).

From \({\text {Sg}}(A) = \mathcal {L}(A)\) and \(\mathsf {ILC}(A) = \mathsf {ICC}(A)\), we easily obtain \(\mathsf {ILR}(A) \le \mathsf {ICR}(A) \le \mathsf {ICC}(A) = \mathsf {ILC}(A)\), because \(\mathsf {ILR}(A) \le \mathsf {ILC}(A)\), \(\mathsf {ICR}(A) \le \mathsf {ICC}(A)\) and every representation of \(\varvec{b} \in {\text {Sg}}(A)\) as a non-negative integer linear combination of the columns of A is also a representation of \(\varvec{b} \in \mathcal {L}(A)\) as an integer linear combination of the columns of A. Thus, to conclude the proof, it suffices to verify \(\mathsf {ILR}(A) \ge {\text {mincov}}(\mathcal {S}) + m\). We use \(\varvec{b} := (1,\ldots ,1)^\top \in \mathbb {Z}^m\). Consider a sub-matrix \(A_\gamma \) of A, for which the equation \(A_\gamma \varvec{x} = \varvec{b}\) has an integer solution \(\varvec{x}\). Then \(A_\gamma \) contains each of the m columns \(-p_i \varvec{e}_i\): if the column \(- p_i \varvec{e}_i\) is missing, then the coefficients of the left-hand side of the i-th equation of the system \(A_\gamma \varvec{x} = \varvec{b}\) are divisible by q, while the right-hand side coefficient is 1, which contradicts the solvability of the system. Let \(\mathcal {S}'\) be the sub-family of \(\mathcal {S}\) consisting of those S, for which the column \(q \chi _S\) occurs in \(A_\gamma \). The sets of the family \(\mathcal {S}'\) cover [m]: if some element \(i \in [m]\) were not in any of the sets \(S \in \mathcal {S}'\), then the coefficients on the left-hand side of the i-th equation of the system \(A_\gamma \varvec{x} = \varvec{b}\) would be divisible by \(p_i\), while the right-hand side coefficient is 1, which again contradicts the solvability of the system. \(\square \)
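As a numerical sanity check, the identity \(\mathsf {ILC}(A) = {\text {mincov}}(\mathcal {S}) + m\) of Lemma 10 can be verified on a small instance, computing \(\mathsf {ILC}\) by brute force as the smallest set of columns whose maximal minors have \(\gcd \) equal to 1 (a sketch, not an efficient algorithm; all names are ours):

```python
from itertools import combinations
from math import gcd
from functools import reduce

def det(M):
    """Integer determinant by Laplace expansion (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def gcd_of_maximal_minors(cols, m):
    """gcd of all m x m minors of the matrix with the given columns."""
    minors = [det([[cols[j][i] for j in sub] for i in range(m)])
              for sub in combinations(range(len(cols)), m)]
    return reduce(gcd, (abs(d) for d in minors), 0)

# Instance of Lemma 10: m = 3, S = {S_1, S_2} with mincov(S) = 2,
# primes p = (2, 3, 5) and q = 7.
m, q = 3, 7
p = [2, 3, 5]
S = [{1, 2}, {2, 3}]
cols = [[-p[i] if r == i else 0 for r in range(m)] for i in range(m)]  # -p_i e_i
cols += [[q if r + 1 in Sj else 0 for r in range(m)] for Sj in S]      # q chi_S

# ILC(A) = smallest |gamma| whose maximal minors have gcd 1.
ilc = min(len(g) for k in range(m, len(cols) + 1)
          for g in combinations(cols, k)
          if gcd_of_maximal_minors(list(g), m) == 1)
print(ilc)  # -> 5 = mincov(S) + m
```

In this instance every proper sub-matrix fails the \(\gcd \) test: dropping a column \(-p_i \varvec{e}_i\) leaves a row divisible by q, and dropping a column \(q \chi _S\) leaves an uncovered element i whose row is divisible by \(p_i\), exactly as in the proof above.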

Proof of Theorem 5, Parts (iii) and (iv)

We derive the strong NP-hardness of all four problems by means of Lemma 10, which yields a polynomial-time reduction from the set-cover problem. Consider a family \(\mathcal {S}\) of subsets of [m] that cover [m]. We want to reduce the verification of \({\text {mincov}}(\mathcal {S}) \le k\) to the verification of any of the four inequalities in the assertion. Our reduction is the map \(\mathcal {S}\mapsto A\), described in Lemma 10, for which we fix the prime numbers \(p_1,\ldots ,p_m,q\) to be the first \(m+1\) prime numbers. As in the first part of the proof, these prime numbers can be computed in time polynomial in the size of \(\mathcal {S}\), which means that the respective map \(\mathcal {S}\mapsto A\) is computable in polynomial time. Furthermore, the first \(m+1\) prime numbers are of order \(O(m \log _e m)\), which implies that the unary encoding of A has size polynomial in the size of \(\mathcal {S}\). In view of Lemma 10, we obtain the desired hardness assertions, as verifying \({\text {mincov}}(\mathcal {S}) \le k\) is reduced to verifying \(\mathsf {ILC}(A) \le m+k\), where \(\mathsf {ILC}(A) = \mathsf {ILR}(A)=\mathsf {ICC}(A) = \mathsf {ICR}(A)\).

To show that \(\mathsf {ILC}(A) \le k\) is strongly NP-complete, we need to check that this problem is in NP with A represented in the binary encoding. Clearly, a set \(\gamma \subseteq [n]\) with at most k elements satisfying \(\mathcal {L}(A_\gamma ) = \mathcal {L}(A)\) can be used as a certificate for \(\mathsf {ILC}(A) \le k\). The verification of \(\mathcal {L}(A_\gamma ) = \mathcal {L}(A)\) for a given A and \(\gamma \) can be carried out in polynomial time: it suffices to check that each column of A is in \(\mathcal {L}(A_\gamma )\). Each such check can be done by solving a system of linear Diophantine equations.

The NP-completeness of \(\mathsf {ICC}(A) \le k\) is proved as in the case \(m=1\) of one row. Let \(\varvec{a}_1,\ldots , \varvec{a}_n\) be the columns of A. The certificate consists of a set \(\tau \subseteq [n]\) with at most k elements and solutions of problems \(A_\tau \varvec{x} = \varvec{a}_i, \ \varvec{x} \in \mathbb {Z}_{\ge 0}^{|\tau |}\), with \(i \in [n]\), that have a polynomial description size. \(\square \)

6.3 Proof of Theorem 6

Presburger arithmetic is the first-order theory of the integers with addition (but no multiplication) and the usual order \(\le \). A Presburger statement is a quantified expression of the form

$$\begin{aligned} Q_1 x_1 \in \mathbb {Z}\cdots Q_k x_k \in \mathbb {Z}\ : \ \varPhi (x_1,x_2,\dots ,x_k), \end{aligned}$$

where \(Q_1,\ldots ,Q_k \in \{ \forall , \exists \}\) are quantifiers over integer variables \(x_1,\ldots ,x_k\) and \(\varPhi (x_1,\dots ,x_k)\) is a Boolean combination of linear inequalities with integer coefficients in the variables \(x_1,\ldots ,x_k\). In the 1920s, Presburger showed that there is an algorithm, based on elimination of quantifiers, to verify the validity of such statements. However, it is also known that deciding general Presburger statements is much harder than deciding NP-complete problems. For example, for statements with a fixed number i of quantifier alternations that start with an existential quantifier, the following is known: deciding such statements is complete for the level \(\varSigma _{i-1}^{{\text {EXP}}}\) of the exponential hierarchy for \(i \ge 2\) (see [22, Sect. 5]) and complete for the level \(\varSigma _{i-2}^P\) of the polynomial hierarchy when \(i \ge 3\) and, additionally, the number of variables k and the number of Boolean operations used in \(\varPhi \) are also fixed (see [27]).

Proof of Theorem 6

Using the Hermite normal form of A with respect to row transformations, we can reduce the general case to the case of A having full row rank. In particular, this means \(m \le n\). One can express the conditions \(\mathsf {ILR}(A) \le k\) and \(\mathsf {ICR}(A) \le k\) as the Presburger statement

$$\begin{aligned} \forall \varvec{x} \in D^n \ \exists \varvec{y} \in D^k : \bigvee _{\tau \in \left( {\begin{array}{c}[n]\\ k\end{array}}\right) } (A \varvec{x} = A_\tau \varvec{y}), \end{aligned}$$
(20)

with \(D= \mathbb {Z}\) and \(D =\mathbb {Z}_{\ge 0}\), respectively. Note that, though in our definition of a Presburger statement the quantified variables have values in \(\mathbb {Z}\), it is easy to model quantified variables from \(\mathbb {Z}_{\ge 0}\) via a slight reformulation: for example, \(\forall x \in \mathbb {Z}_{\ge 0} : \varPhi (x)\) can be formulated as \(\forall x \in \mathbb {Z}: ((x \ge 0) \Rightarrow \varPhi (x))\). When n is fixed, (20) is a so-called short Presburger formula, which means that the number of quantified variables as well as the number of Boolean operations used in the formula are fixed. For our formula, we can assume \(k \le n\) because both \(\mathsf {ILR}(A)\) and \(\mathsf {ICR}(A)\) are at most n. Thus, the number of quantified variables is at most 2n. The number of disjunctions used is at most \(2^n\). Each system \(A \varvec{x} =A_\tau \varvec{y}\) is a conjunction of m equalities, which means that we use at most \(m 2^n \le n 2^n\) conjunctions. It is known that short Presburger statements with one quantifier alternation are solvable in polynomial time. This is explicitly stated as Theorem 1.9 in [28], where the authors refer to the work of Woods [34] and their own work [27]. We note that the proofs from [27, 34] rely on the algorithmic theory of generating functions (see [5, 7, 8, 14]). Since the short statement (20) has one quantifier alternation, we obtain the polynomial-time solvability of the problems \(\mathsf {ILR}(A) \le k\) and \(\mathsf {ICR}(A) \le k\) when n is fixed. Consequently, for computing \(\mathsf {ILR}(A)\) and \(\mathsf {ICR}(A)\), it suffices to check the validity of the n respective short Presburger statements (20) that arise for the n possible choices \(k \in [n]\).

We now show that \(\mathsf {ILC}(A)\) and \(\mathsf {ICC}(A)\) are computable in polynomial time, too, when n is fixed. For \(\mathsf {ILC}(A)\) this is easy to see. It suffices to determine all subsets \(\gamma \subseteq [n]\) for which \(\mathcal {L}(A) = \mathcal {L}(A_\gamma )\) holds, where the verification of \(\mathcal {L}(A) = \mathcal {L}(A_\gamma )\) can be reduced to solving n systems of linear Diophantine equations; \(\mathsf {ILC}(A)\) is then the minimum cardinality among all such subsets. For \(\mathsf {ICC}(A)\), one can determine all subsets \(\gamma \subseteq [n]\) for which \({\text {Sg}}(A) = {\text {Sg}}(A_\gamma )\) holds. To check \({\text {Sg}}(A) = {\text {Sg}}(A_\gamma )\), it suffices to verify that each column of A belongs to \({\text {Sg}}(A_\gamma )\). Each such check reduces to a feasibility problem of linear integer programming in at most n variables and thus can be done in polynomial time, as n is fixed (see Section 18.4 in [32]). \(\square \)