Sparse representation of vectors in lattices and semigroups

We study the sparsity of the solutions to systems of linear Diophantine equations with and without non-negativity constraints. The sparsity of a solution vector is the number of its nonzero entries, which is referred to as the ℓ0-norm of the vector. Our main results are new improved bounds on the minimal ℓ0-norm of solutions to systems Ax = b, where A ∈ Z^{m×n}, b ∈ Z^m and x is either a general integer vector (lattice case)
or a non-negative integer vector (semigroup case). In certain cases, we give polynomial time algorithms for computing solutions with ℓ0-norm satisfying the obtained bounds. We show that our bounds are tight. Our bounds can be seen as functions naturally generalizing the rank of a matrix over R to other subdomains such as Z. We show that these new rank-like functions are all NP-hard to compute in general, but polynomial-time computable for a fixed number of variables.


Introduction
Given a matrix A ∈ R^{m×n} and a vector b ∈ R^m, we study the sparsity of solutions to the system of linear equations Ax = b in variables x_1, …, x_n restricted to a structured domain D ⊆ R. The sparsity of these solutions is quantified via the ℓ0-norm, which is the size ‖x‖_0 := |supp(x)| of the support supp(x) := {i : x_i ≠ 0} of the vector x. The sparsest solutions are optimal solutions of the optimization problem

min { ‖x‖_0 : Ax = b, x ∈ D^n }.   (1)

When D = R and D = R_{≥0}, the tight upper bounds on (1) in terms of A are given by the rank of A, which follows from basic linear algebra and the well-known Carathéodory theorem from convexity, respectively. Nevertheless, even when D = R, computation of (1) for given A and b is NP-hard [26]. The ℓ0-norm minimization problem (1) is central in the theory of compressed sensing, where for the classical choice D = R an appropriate linear programming relaxation of (1) provides a guaranteed approximation [9,11,12]. In the present paper, we deal with two discrete domains, D = Z and D = Z_{≥0}, which are naturally related to the theory of systems of linear Diophantine equations and to integer linear programming, respectively.
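To make problem (1) concrete in the semigroup case, the following brute-force sketch enumerates supports of increasing size; the function name and the search bound are ours, and the search is meant only for tiny illustrative instances, not as an efficient algorithm.

```python
from itertools import combinations, product

def sparsest_nonneg_solution(A, b, bound=20):
    """Solve problem (1) for D = Z_{>=0} by brute force: return the
    minimal support size of a non-negative integer solution of Ax = b,
    or None if no solution exists with entries up to `bound`."""
    n = len(A[0])
    for k in range(n + 1):
        for supp in combinations(range(n), k):
            # enumerate positive values on the chosen support only
            for vals in product(range(1, bound + 1), repeat=k):
                x = [0] * n
                for i, v in zip(supp, vals):
                    x[i] = v
                if all(sum(row[j] * x[j] for j in range(n)) == bi
                       for row, bi in zip(A, b)):
                    return k
    return None
```

For the single row a = (3, 5, 8), the right-hand side b = 8 is representable with one generator, while b = 11 needs two.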
Sparsity of solutions to linear Diophantine equations is relevant for the theory of compressed sensing for integer-valued signals [17,18,24], motivated by many applications in which the signal is known to have integer entries, for instance, in wireless communication [31] and in the theory of error-correcting codes [10]. Support minimization was also investigated in connection to integer optimization [2,16,29,30]. Also, numerous applications to combinatorial optimization problems have been explored. For example, the minimum edge-coloring problem can be seen as finding the sparsest representation in the semigroup generated by the matchings of the graph [13,25]. Further examples of combinatorial applications can be found in [4] and [16].
Since we know that for D = R the sparsity of solutions is captured by the notion of the rank of A, we introduce a similar notion with respect to an arbitrary underlying domain D. We define the D-rank of A as

max_{y ∈ D^n} min { ‖x‖_0 : Ax = Ay, x ∈ D^n }.
In this respect, note that the computation of the D-rank is a bi-level optimization problem. There is yet another natural generalization of the notion of rank in our setting. For that, let [n] := {1, …, n}, let [n]_k be the set of all k-element subsets of [n], and for γ ∈ [n]_k let A_γ denote the m × k submatrix of A with columns indexed by γ. Then the D-complexity of A is defined as the minimal k ∈ Z_{≥0} such that there exists a τ ∈ [n]_k for which the equality

{Ax : x ∈ D^n} = {A_τ y : y ∈ D^k}

holds. It is clear that the D-rank is bounded from above by the D-complexity.
As examples, note that when the domain D = R, both the D-rank and the D-complexity coincide with the rank of the matrix A from linear algebra. For D = R_{≥0}, the D-rank is again the regular rank of A, but the D-complexity is in general larger. If the columns of A generate a pointed cone, then the D-complexity is the number of extreme rays of this cone.
In this paper we specialize the above two functions to the two domains D = Z and D = Z_{≥0}, which yields a natural geometric interpretation in terms of lattices and semigroups. First, the matrix A determines the lattice L(A) := {Ax : x ∈ Z^n} generated by the columns of A. Secondly, the matrix A determines the semigroup Sg(A) := {Ax : x ∈ Z^n_{≥0}}. Note that this set consists of all right-hand-side vectors b for which the system Ax = b, x ∈ Z^n_{≥0} of integer-programming constraints on x is feasible. We obtain the following four functions: the Z-rank and Z-complexity of A, denoted ILR(A) and ILC(A), and the Z_{≥0}-rank and Z_{≥0}-complexity of A, denoted ICR(A) and ICC(A). In our results we deal with an integer matrix A ∈ Z^{m×n} and, without loss of generality, A is assumed to have full row rank; see, for example, [30, Section 1.3]. For a general introduction to lattices see [21].

Bounds for ILR(A) and ILC(A)
For stating our results, we need several number-theoretic functions. Given z ∈ Z_{>0}, consider the prime factorization z = p_1^{s_1} ⋯ p_k^{s_k} with pairwise distinct prime factors p_1, …, p_k and their multiplicities s_1, …, s_k ∈ Z_{>0}. The number Σ_{i=1}^k s_i of prime factors counted with multiplicities is denoted by Ω(z). Furthermore, we introduce

Ω_m(z) := Σ_{i=1}^k min{s_i, m}.

That is, by introducing m we set a threshold to account for multiplicities. In the case m = 1 we thus have ω(z) := Ω_1(z) = k, which is the number of prime factors of z, not taking the multiplicities into account. The functions Ω and ω are called the prime Ω-function and the prime ω-function, respectively, in number theory [23]. We call Ω_m the truncated prime Ω-function.
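These functions are easy to evaluate for moderate z; a minimal sketch using trial-division factorization (function names are ours):

```python
def factorize(z):
    """Prime factorization of z > 0 as a dict {p: multiplicity}."""
    f, p = {}, 2
    while p * p <= z:
        while z % p == 0:
            f[p] = f.get(p, 0) + 1
            z //= p
        p += 1
    if z > 1:
        f[z] = f.get(z, 0) + 1
    return f

def big_omega(z):
    """Omega(z): number of prime factors counted with multiplicity."""
    return sum(factorize(z).values())

def small_omega(z):
    """omega(z): number of distinct prime factors."""
    return len(factorize(z))

def truncated_omega(z, m):
    """Omega_m(z): each multiplicity is capped at the threshold m."""
    return sum(min(s, m) for s in factorize(z).values())
```

For example, 360 = 2^3 · 3^2 · 5 gives Ω(360) = 6, ω(360) = 3 and Ω_2(360) = 5.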

Theorem 1 Let A ∈ Z^{m×n} be a matrix of rank m. Then

ILR(A) ≤ ILC(A) ≤ m + Ω_m(|Δ|) for every nonzero m × m minor Δ of A.   (2)
One can easily see that the estimates ω(z) ≤ Ω_m(z) ≤ Ω(z) ≤ log_2(z) hold for every z ∈ Z_{>0}. The estimate using log_2(z) gives a first impression of the size of the bound (2). It turns out, however, that Ω_m(z) is much smaller on average. Results in number theory [23, §22.10] show that the average values (ω(1) + ⋯ + ω(z))/z and (Ω(1) + ⋯ + Ω(z))/z are of order log(log(z)) as z → ∞. In Proposition 1 from Sect. 5, we show that (2) is an optimal bound on both ILR(A) and ILC(A), in the sense that neither can m be replaced by a smaller constant nor can the function Ω_m occurring on the right-hand side be replaced by a smaller function. Furthermore, as a byproduct of our constructive proof of Theorem 1, we obtain the following.

Corollary 1 Let A ∈ Z^{m×n} be a matrix of full row rank and let A_τ be a non-singular m × m sub-matrix of A, where τ ∈ [n]_m. If the system Ax = b, x ∈ Z^n with a right-hand side b ∈ Z^m is feasible, then this system has a solution x that satisfies ‖x‖_0 ≤ m + Ω_m(|det(A_τ)|). For the input A, τ and b in binary encoding, such a solution x can be computed in polynomial time.
We show that, under natural assumptions on A, we can significantly improve this bound. First, we consider matrices A whose columns positively span R m . Theorem 1 can be used in this case to obtain the following result.
Theorem 2 Let A ∈ Z^{m×n} be a matrix whose columns positively span R^m. Then (5) holds. Both the general bound (3) and our bound (5) have a first term depending linearly on m and a second term depending on the m × m minors of A scaled by gcd(A). Thus, taking into account |det(A_τ)| ≤ √(det(A A^T)) and Ω_m(z) ≤ log_2(z), we see that the second term in (5) is not larger than the second term in the bound (3); in fact, the second term in (5) is much smaller "on average". As for (2), we show in Proposition 2 from Sect. 5 that, under the given assumptions on A, the bound (5) is optimal.
In the knapsack case m = 1, the bound (5) strengthens Theorem 1.2 in [3] and, as was already indicated in the IPCO version of this paper [1], confirms a conjecture posed in [3, page 247]. Moreover, as a byproduct of the proof of Theorem 2, we obtain the following algorithmic result.

Corollary 2
Let A ∈ Z^{m×n} be a matrix whose columns positively span R^m and let A_τ be an m × m non-singular sub-matrix of A, where τ ∈ [n]_m. If the feasibility problem Ax = b, x ∈ Z^n_{≥0} of integer programming with a right-hand side b ∈ Z^m has a solution, then it has a solution x satisfying the bound (5). For the input A, τ and b in binary encoding, such a solution x can be computed in polynomial time.
Our next contribution gives an improvement on (3) for the case when the columns of A generate a pointed cone. Given a_1, …, a_n ∈ R^m, we denote by cone(a_1, …, a_n) the convex conic hull of the set {a_1, …, a_n}. Assume that the matrix A = (a_1, …, a_n) ∈ Z^{m×n} with columns a_i satisfies the following conditions:

cone(a_1, …, a_n) is an m-dimensional pointed cone,   (7)
cone(a_1) is an extreme ray of cone(a_1, …, a_n).   (8)
Theorem 3 Let A = (a_1, …, a_n) ∈ Z^{m×n} satisfy (6)-(8). Then the bound (9) holds. In view of (4), the bound (9) improves on (3) by reducing the sum over all I ∈ [n]_m to the sum over those I that satisfy 1 ∈ I. The proof of Theorem 3 will be derived as an extension of the proof of our next result in the setting of the knapsack scenario m = 1. In this setting, A = a is a row vector and the assumption (7) is equivalent to a ∈ Z^{1×n}_{>0} ∪ Z^{1×n}_{<0}. Without loss of generality, one can assume a ∈ Z^{1×n}_{>0}.

Computational complexity
It is well known that the feasibility problem in integer linear programming is NP-complete (see [32, § 18.2]), which means that testing whether the sparsity optimization problem (1) is feasible is hard in the case D = Z^n_{≥0}. But even in cases where testing feasibility is tractable, solving (1) is usually hard due to the hardness of the ℓ0-norm as an objective. For example, computation of (1) is NP-hard for D = R^n (see [26]). In Sect. 1.3 we study the complexity of computing our four rank-like functions ILR(A), ILC(A), ICR(A) and ICC(A). We would like to emphasize that the complexity analysis of these functions is more intricate than the respective analysis of (1).

Theorem 5
Consider the four problems of verifying ILC(A) ≤ k, ILR(A) ≤ k, ICR(A) ≤ k and ICC(A) ≤ k, for given A ∈ Z^{m×n} and k ∈ Z_{>0}. These problems have the following complexity:

Finally, we want to address the case when the number of variables n is fixed. It is easy to see that the optimization problem (1) can be solved in polynomial time for both D = Z^n and D = Z^n_{≥0}: for a fixed n, all 2^n possible choices of the support of the vector x can be enumerated. For each such choice, the existence of a vector x with Ax = b, x ∈ D^n and the prescribed support can be checked in polynomial time: for D = Z^n one needs to solve a Diophantine system, while for D = Z^n_{≥0} one uses polynomial-time solvability of integer linear problems in fixed dimension [32, § 18.4]. Rather similarly, one can also establish polynomial-time solvability of ILC(A) and ICC(A) for fixed n. In contrast to this, since the D-ranks are related to bi-level programming, the study of the computational complexity of ILR(A) and ICR(A) in the case of fixed n requires a more involved algorithmic theory, which has been developed only recently. Using recent results in the algorithmic theory of Presburger arithmetic, we obtain the following.

Proofs of Theorem 1 and Corollary 1
The proof of Theorem 1 relies on the theory of finite Abelian groups (see [15] for a general reference). We write Abelian groups additively. An Abelian group G is a direct sum of finitely many subgroups G_1, …, G_m, written G = G_1 ⊕ ⋯ ⊕ G_m, if every element x ∈ G has a unique representation x = x_1 + ⋯ + x_m with x_i ∈ G_i for each i ∈ [m]. A primary cyclic group is a non-zero finite cyclic group whose order is a power of a prime number. We use G/H to denote the quotient of G modulo its subgroup H.
The fundamental theorem of finite Abelian groups states that every finite Abelian group G has a primary decomposition, which is essentially unique. This means that G is decomposable into a direct sum of primary cyclic subgroups and that this decomposition is unique up to automorphisms of G (see Theorems 3 and 5 in Chapter 5.2 of [15], with further details in 12.1). We denote by κ(G) the number of direct summands in the primary decomposition of G.
For a subset S of a finite Abelian group G, we denote by ⟨S⟩ the subgroup of G generated by S. We call a subset S of G non-redundant if, for every proper subset T of S, the subgroup ⟨T⟩ is properly contained in ⟨S⟩. The following result determines the maximum cardinality of a non-redundant subset of G.

Theorem 7 Let G be a finite Abelian group. Then the maximum cardinality of a non-redundant subset S of G is equal to κ(G).
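For cyclic groups Z_n the theorem can be checked by exhaustive search: the subgroup of Z_n generated by T is dZ_n with d = gcd(T ∪ {n}), and the primary decomposition of Z_n has κ(Z_n) = ω(n) summands. A brute-force sketch for tiny n (function names ours):

```python
from math import gcd
from itertools import combinations

def generated(n, T):
    """Subgroup of Z_n generated by T is dZ_n, d = gcd of T and n."""
    d = n
    for t in T:
        d = gcd(d, t)
    return d

def max_nonredundant(n):
    """Maximum cardinality of a non-redundant subset of Z_n, found by
    exhaustive search.  Since generation is monotone, it suffices to
    test the maximal proper subsets (drop one element at a time)."""
    best = 0
    for k in range(1, n):
        for S in combinations(range(1, n), k):
            d = generated(n, S)
            if all(generated(n, S[:i] + S[i + 1:]) != d for i in range(k)):
                best = max(best, k)
    return best
```

For Z_12 ≅ Z_4 ⊕ Z_3 the search returns 2 = κ(Z_12), e.g. the non-redundant set {3, 4}; for the primary cyclic group Z_8 it returns 1.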
Even though this result is available in the literature (see, for example, [20, Lemma A.6]), it does not seem to be well known, and we have not found any source containing a complete self-contained proof. Thus, we provide a proof of Theorem 7 in the Appendix, relying only on basic facts from group theory. We will also need the following lemmas.

Lemma 1 Let G be a finite Abelian group representable as a direct sum G = G_1 ⊕ ⋯ ⊕ G_m of cyclic groups G_1, …, G_m. Then κ(G) ≤ Ω_m(|G|).

Proof Consider the prime factorization |G| = p_1^{n_1} ⋯ p_s^{n_s}. Then |G_j| = p_1^{n_{1,j}} ⋯ p_s^{n_{s,j}} with 0 ≤ n_{i,j} ≤ n_i and, by the Chinese Remainder Theorem, the cyclic group G_j can be represented as a direct sum of cyclic groups G_{i,j} of the prime-power orders p_i^{n_{i,j}}, i ∈ [s]. This is a decomposition of G into a direct sum of primary cyclic groups and, possibly, some trivial summands G_{i,j} equal to {0}. We count the non-trivial direct summands whose order is a power of p_i, for a given i ∈ [s]. There is at most one such summand for each of the groups G_j, so there are at most m non-trivial summands in the decomposition whose order is a power of p_i. On the other hand, the direct sum of all non-trivial summands whose order is a power of p_i is a group of order p_i^{n_{i,1} + ⋯ + n_{i,m}} = p_i^{n_i}, so that the total number of such summands is not larger than n_i, as every summand contributes a factor of at least p_i to the power p_i^{n_i}. Thus, the total number of non-zero summands in the decomposition of G is at most Σ_{i=1}^s min{m, n_i} = Ω_m(|G|).

We introduce a quotient group G′, and G′ is a direct sum of at most m cyclic groups, as every d_i > 1 determines a non-trivial direct summand. To conclude the proof, it suffices to show that G′ is isomorphic to G.

The following lemma allows us to reduce considerations to the case gcd(A) = 1, without affecting the sparsity.

Lemma 3
Let A ∈ Z^{m×n} have full row rank and let M ∈ Z^{m×m} be a matrix whose columns form a basis of L(A). Then the following holds. The matrix A_τ gives rise to the lattice Λ := L(A_τ) (by Lemmas 1 and 2).
Let us now show that γ can be determined in polynomial time. It is enough to determine the set I, which defines the non-redundant subset S = {φ(a_i) : i ∈ I} of Z^m/Λ. Start with I = {m+1, …, n} and iteratively check whether one of the elements φ(a_i) ∈ Z^m/Λ, where i ∈ I, lies in the group generated by the remaining elements. Suppose j ∈ I and we want to check whether φ(a_j) is in the group generated by all φ(a_i) with i ∈ I∖{j}. Since Λ = L(A_τ), this is equivalent to checking a_j ∈ L(A_{I∖{j} ∪ τ}) and is thus reduced to solving a system of linear Diophantine equations with the left-hand-side matrix A_{I∖{j} ∪ τ} and the right-hand-side vector a_j (such systems can be solved in polynomial time by [32, Corollary 5.3b]). Thus, carrying out the above procedure for every j ∈ I and removing j from I whenever a_j ∈ L(A_{I∖{j} ∪ τ}), we eventually arrive at a set I that determines a non-redundant subset S of Z^m/Λ. This is done by solving at most n − m linear Diophantine systems in total, where the matrix of each system is a sub-matrix of A and the right-hand-side vector is a column of A.
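In the single-row case m = 1, the lattice membership test a_j ∈ L(A_{I∖{j}∪τ}) is just a divisibility check against a gcd, so the pruning procedure above can be sketched directly (an illustrative specialization, with names ours, not the general Diophantine solver):

```python
from math import gcd
from functools import reduce

def prune_redundant(a, tau):
    """One-row (m = 1) version of the pruning procedure: starting from
    I = all indices outside tau, repeatedly drop an index j whenever
    a_j lies in the lattice generated by the remaining entries and
    a_tau, i.e. whenever the gcd of those entries divides a_j."""
    I = [j for j in range(len(a)) if j != tau]
    changed = True
    while changed:
        changed = False
        for j in list(I):
            g = reduce(gcd, [a[i] for i in I if i != j] + [a[tau]])
            if a[j] % g == 0:
                I.remove(j)
                changed = True
    return I
```

For a = (4, 6, 10, 15) with τ = {1}, the entries 6 and 10 are pruned (the remaining entries generate Z), while 15 must stay: L(4, 15) = Z = L(a).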

Proof of Theorem 1 Consider an arbitrary τ ∈ [n]_m for which the matrix A_τ is non-singular, and the respective γ as in Lemma 4. This solution can be computed by solving the Diophantine system with the left-hand-side matrix A_γ and the right-hand-side vector b.

Proofs of Theorem 2 and Corollary 2
Lemma 5 Let A ∈ Z^{m×n} be a matrix whose columns positively span R^m and let A_τ be a non-singular m × m sub-matrix of A. Then there exists a set I ⊆ [n] with L(A) = L(A_I) such that the columns of A_I positively span R^m. For the input A and τ in binary encoding, such an I can be computed in polynomial time.
Proof Consider γ as in Lemma 4. Let a_1, …, a_n be the columns of A. Since the matrix A_τ is non-singular, the m vectors a_i, where i ∈ τ, together with the vector v = −Σ_{i∈τ} a_i positively span R^m. Since all columns of A positively span R^m, the conic version of Carathéodory's theorem implies the existence of a set β ⊆ [n] with |β| ≤ m such that v is in the conic hull of {a_i : i ∈ β}. Consequently, the set {a_i : i ∈ β ∪ τ} and, by this, also the larger set {a_i : i ∈ β ∪ γ} positively spans R^m. Thus, in view of Lemma 4, the structural part of the assertion holds with I = β ∪ γ.
It remains to show the algorithmic part of the assertion. In view of Lemma 4, one can construct γ in polynomial time. To determine I, we need to construct β in polynomial time. Start with β = [n] and iteratively reduce β as follows. Using a polynomial-time algorithm for linear optimization, check whether, after removal of one of the elements from β, the vector v is still in the conic hull of {a_i : i ∈ β}. This procedure takes at most n iterations. By Carathéodory's theorem, after termination the system of vectors a_i with i ∈ β is linearly independent.

Lemma 6
Let A ∈ Z^{m×n} be a matrix whose columns positively span R^m. Then L(A) = Sg(A). If A and b ∈ L(A) are given in binary encoding, then a solution to Ax = b, x ∈ Z^n_{≥0} can be constructed in polynomial time.
Proof Since the columns of A positively span R^m, the feasibility problem Ay = 0, y ∈ Q^n_{≥1} of linear programming has a solution y. One can determine such a solution in polynomial time using a polynomial-time algorithm for linear optimization. The description size of y is polynomial, so one can re-scale y to clear denominators in polynomial time and arrive at a vector y ∈ Z^n_{≥1} satisfying Ay = 0. Now let b ∈ L(A). We first solve the Diophantine system Ax = b, x ∈ Z^n in polynomial time and determine one solution x* of this system. Then the vector x = x* + ‖x*‖_∞ y is a non-negative integer solution of Ax = b. This verifies L(A) = Sg(A) and shows the algorithmic part of the assertion.
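For m = 1 the construction can be made fully explicit without linear programming: if the row a has entries of both signs (so that its entries positively span R), weighting each entry by the total absolute weight of the opposite sign gives a vector y ≥ 1 in the kernel. A sketch under that assumption, with names ours:

```python
def positive_kernel_vector(a):
    """For a row a with both positive and negative entries, build
    y >= 1 with a . y = 0: each index gets the total absolute weight
    of the entries of the opposite sign."""
    P = sum(v for v in a if v > 0)
    N = -sum(v for v in a if v < 0)
    return [N if v > 0 else P for v in a]

def lift_to_nonnegative(a, x):
    """Given any integer solution x of a . x = b, shift it by the
    multiple ||x||_inf of the kernel vector (as in the proof of
    Lemma 6) to obtain a non-negative solution."""
    y = positive_kernel_vector(a)
    k = max(abs(v) for v in x)          # ||x||_inf
    return [xi + k * yi for xi, yi in zip(x, y)]
```

For a = (3, 5, −2), the kernel vector is (2, 2, 8), and the solution (2, −1, 0) of a·x = 1 lifts to the non-negative solution (6, 3, 16).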

Proofs of Theorem 2 and Corollary 2 Choose τ ∈ [n]_m such that A_τ is non-singular and consider I as in Lemma 5. In view of Lemma 6, one has L(A_I) = Sg(A_I).
To show Corollary 2, observe that by Lemma 5, the set I can be constructed in polynomial time. To conclude the proof, it suffices to apply the algorithmic part of Lemma 6 to the sub-matrix A I .

Proofs of Theorems 3 and 4
To motivate Theorem 3 and its proof, we start this section by giving a self-contained proof of Theorem 4, which contains the key ideas of the proof of Theorem 3 while being less technical.
The following lemma is a key to the proof of Theorem 4.
Lemma 7 Let a_1, …, a_t ∈ Z_{>0} with t > 1 + log_2(a_1). Then the system

y_1 a_1 + ⋯ + y_t a_t = 0,  y_1 ∈ Z_{≥0},  y_2, …, y_t ∈ {−1, 0, 1},

in the unknowns y_1, …, y_t, has a solution that is not identically equal to zero.
Proof If X is a convex set whose affine hull has dimension at most k, then we use vol_k(X) to denote the k-dimensional volume of X. Consider the convex set Y ⊆ R^t defined by the 2t strict linear inequalities

−1 < y_1 a_1 + ⋯ + y_t a_t < 1,
−2 < y_i < 2 for all i ∈ {2, …, t}.

Clearly, the set Y is the interior of a hyper-parallelepiped, and it is easy to see that vol_t(Y) = 2^{2t−1}/a_1. The assumption t > 1 + log_2(a_1) implies that the volume of Y is strictly larger than 2^t. Thus, by Minkowski's first theorem [6, Ch. VII, Sect. 3], the set Y contains a non-zero integer vector y = (y_1, …, y_t) ∈ Z^t. Without loss of generality we can assume that y_1 ≥ 0 (if the latter is not true, one can replace y by −y). The vector y is a desired solution from the assertion of the lemma.

Proof of Theorem 4
By Lemma 3, we may assume that gcd(a) = 1. Further, without loss of generality, let a_1 = min{a_1, …, a_n}. Let b ∈ Sg(a). By the definition of ICR(a), the integer Carathéodory rank, we need to show the existence of a solution to ax = b, x ∈ Z^n_{≥0} satisfying ‖x‖_0 ≤ 1 + log_2(a_1). Choose a solution x = (x_1, …, x_n) with the property that the number of indices i ∈ {2, …, n} for which x_i ≠ 0 is minimized. Without loss of generality, we can assume that, for some t ∈ {2, …, n}, one has x_2 > 0, …, x_t > 0, x_{t+1} = ⋯ = x_n = 0. We claim that Lemma 7 implies t ≤ 1 + log_2(a_1). In fact, if the latter were not true, then a solution y ∈ Z^t of the system in Lemma 7 could be extended to a solution y ∈ Z^n by appending zero components. It is clear that some of the components y_2, …, y_t are negative, because a_2 > 0, …, a_t > 0. It then turns out that, for an appropriate choice of k ∈ Z_{≥0}, the vector x′ = (x′_1, …, x′_n) = x + k y satisfies ax′ = b, x′_1 ≥ 0, …, x′_t ≥ 0, x′_{t+1} = ⋯ = x′_n = 0 and x′_i = 0 for at least one i ∈ {2, …, t}. Indeed, one can choose k to be the minimum among all x_i with i ∈ {2, …, t} and y_i = −1.
The existence of a solution with fewer than t − 1 non-zero components x_i, i ∈ {2, …, n}, contradicts the choice of x and yields the assertion.
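The argument above is constructive and can be sketched as a sparsification loop for the one-row case; here the {−1, 0, 1}-combination guaranteed by Lemma 7 is found by brute force rather than via Minkowski's theorem, and the function names and the search are ours:

```python
from itertools import product
from math import log2

def zero_combination(c):
    """Find a non-zero integer vector y with c . y = 0, y[0] >= 0,
    y[1:] in {-1,0,1} and at least one entry equal to -1 (the shape
    guaranteed by Lemma 7 when len(c) > 1 + log2(c[0]))."""
    for ys in product((-1, 0, 1), repeat=len(c) - 1):
        if not any(yi == -1 for yi in ys):
            continue
        s = sum(ci * yi for ci, yi in zip(c[1:], ys))
        if s <= 0 and (-s) % c[0] == 0:
            return [-s // c[0]] + list(ys)
    return None

def sparsify(a, x):
    """Support reduction from the proof of Theorem 4: a has positive
    entries with a[0] minimal, x is a non-negative solution of
    a . x = b.  While more than log2(a[0]) of the entries x[1], ...
    are positive, cancel one of them with a zero combination."""
    x = list(x)
    while True:
        act = [i for i in range(1, len(a)) if x[i] > 0]
        if len(act) <= log2(a[0]):
            return x
        y = zero_combination([a[0]] + [a[i] for i in act])
        if y is None:                      # cannot occur by Lemma 7
            return x
        k = min(x[i] for i, yi in zip(act, y[1:]) if yi == -1)
        x[0] += k * y[0]
        for i, yi in zip(act, y[1:]):
            x[i] += k * yi
```

Each iteration zeroes at least one positive entry with index ≥ 2, so the loop terminates, and the final support size matches the bound 1 + log_2(a_1) of Theorem 4.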
To prove Theorem 3 we need two auxiliary lemmas, one of which generalizes Lemma 7. In what follows, we denote the linear hull of X ⊆ R^m by lin(X).

Lemma 8 Let A = (a_1, …, a_n) ∈ Z^{m×n} satisfy (6)-(8).
To conclude the proof, we need to choose u_m appropriately. We extend u_1, …, u_{m−1} to a basis u_1, …, u_{m−1}, v of L(A). We will fix u_m with an appropriate N ∈ Z_{>0}.
Since lin(F) is a supporting hyperplane of cone(a_1, …, a_n), we can assume, after possibly exchanging the roles of v and −v, that v and cone(a_1, …, a_n) are on the same side of the hyperplane lin(F). Every vector a_i can be expressed as a_i = β_{i,1} u_1 + ⋯ + β_{i,m−1} u_{m−1} + β_i v, where β_{i,j} ∈ Z and β_i ∈ Z_{≥0}. Furthermore, if a_i ∉ F then, in view of (6), β_i > 0 and thus, using this representation, we see that, for N large enough, one has β_{i,j} + N β_i > 0, so that a_i ∈ H for all vectors a_i that do not belong to F. The vectors a_i that belong to F are in H by the choice of u_1, …, u_{m−1}.
It is easy to see that the assumptions of Minkowski's first theorem are fulfilled. Clearly, (17) can be written as (18), and we show that (13) and (18) are equivalent. Now, consider a non-zero lattice vector y = (y_1, …, y_n) in C ∩ (Z × Λ). By the choice of C and Λ, the vector y is a solution of A y = 0 and, furthermore, we have y_2, …, y_n ∈ {−1, 0, 1}. Possibly replacing y with −y, we can ensure that y_1 ≥ 0. Since the equation α_1 y_1 + ⋯ + α_n y_n = 0 (contained in the system under consideration) has positive coefficients and since y_1 ≥ 0, we conclude that at least one of the variables y_2, …, y_n of our solution y is negative. Thus, our solution y satisfies the assertions of the lemma.

Proof of Theorem 3 It is sufficient to show that any feasible problem Ax = b, x ∈ Z^n_{≥0} with the matrix A satisfying assumptions (6)-(8) admits a solution obeying the bound (9). Let V := lin(a_1, …, a_s). Since a_1, …, a_n linearly span R^m, among the vectors a_{s+1}, …, a_n one can choose linearly independent columns that together with a basis of V form a basis of R^m. Without loss of generality, let a_{s+1}, …, a_t be such vectors, that is, one has V ⊕ W = R^m with W := lin(a_{s+1}, …, a_t). In the degenerate case V = R^m, we just fix t = s and W = {0}. We show the claim with (a_1, …, a_t) in place of A = (a_1, …, a_n).
This shows that one of the values y_2, …, y_s is equal to −1. We convert y ∈ Z^t to a vector y ∈ Z^n by appending zero components.
Clearly, x = x* + k y is a solution of Ax = b, and if we choose k to be the minimum among the values x*_i, where i ∈ {2, …, s} and y_i = −1, then x = x* + k y is a solution of Ax = b, x ∈ Z^n_{≥0}, for which the number of indices i ∈ {2, …, n} satisfying x_i ≠ 0 is smaller than for the solution x* of Ax = b, x ∈ Z^n_{≥0}. This contradicts the choice of x* and shows the assertion.

Optimality of the bounds
In this section we prove a series of three propositions showing, respectively, that Theorems 1, 2 and 3 are optimal. For this we introduce the following notation. For an integer z ∈ Z_{>0} with prime factorization z = p_1^{s_1} ⋯ p_k^{s_k}, where the distinct prime factors p_1, …, p_k have the multiplicities s_1, …, s_k ∈ Z_{>0}, we define the set

S(z) := { z / p_i^{s_i} : i ∈ [k] }.

The elements of S(z) are relatively prime, but every non-empty proper subset of S(z) has a common divisor larger than one. The set S(z) has ω(z) elements. If z is a prime number, we have S(z) = {1}.
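The construction and its stated properties are easy to check numerically; a minimal sketch (trial-division factorization, names ours):

```python
from math import gcd
from functools import reduce

def factorize(z):
    """Prime factorization of z > 0 as a dict {p: multiplicity}."""
    f, p = {}, 2
    while p * p <= z:
        while z % p == 0:
            f[p] = f.get(p, 0) + 1
            z //= p
        p += 1
    if z > 1:
        f[z] = f.get(z, 0) + 1
    return f

def witness_set(z):
    """S(z): divide z by each maximal prime power p_i^{s_i} in turn."""
    return [z // p ** s for p, s in sorted(factorize(z).items())]
```

For z = 60 = 2^2 · 3 · 5, one gets S(60) = {15, 20, 12}: the overall gcd is 1, while every element and every pair share a nontrivial divisor.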

Proposition 1
Let m ∈ Z_{>0} and let F : Z_{>0} → Z_{>0} be a function providing the bound ILC(A) ≤ min{ F(|Δ|) : Δ nonzero m × m minor of A } for all n ∈ Z_{≥m} and all matrices A ∈ Z^{m×n} of full row rank. Then F(z) ≥ m + Ω_m(z) holds for every z ∈ Z_{>0}.
Proof Let z ∈ Z_{>0}. We need to show F(z) ≥ m + Ω_m(z). For z = 1, this reduces to showing F(z) ≥ m, which is clear because the matrices A in the formulation of the assertion have rank m. We now consider the case z ≥ 2. We decompose z as z = z_1 ⋯ z_m into m factors z_1, …, z_m ∈ Z_{>0} as follows. Let α_p denote the multiplicity of the prime number p in the prime factorization of z, i.e., z = Π_{p prime} p^{α_p}; the factors z_1, …, z_{m−1} are then defined in terms of these multiplicities, and z_m := z/(z_1 ⋯ z_{m−1}). Let q be a prime such that gcd(z, q) = 1 and consider the sets S(z_1 q), …, S(z_m q). Note that, by construction, z_i ∈ S(z_i q) for every i ∈ [m]. Finally, we fix A to be an appropriate matrix of size m × n built from these sets.

Proposition 3 Let m ∈ Z_{>0} and let F : Z_{>0} → Z_{>0} be a function providing the corresponding bound for all n ∈ Z_{≥m} and all matrices A = (a_1, …, a_n) ∈ Z^{m×n} that satisfy conditions (6)-(8). Then F(z) ≥ m + log_2(z) holds for every z ∈ Z_{>0}.

Results on computational complexity
In this final section, we explore the computational complexity of the functions ILR(A), ILC(A), ICR(A) and ICC(A). We begin with the hardness results of Theorem 5.

Proof of Theorem 5, Parts (i) and (ii)
In the case m = 1 of just one row, we use the notation a = A and denote by a_i the i-th component of a. As before, we use the notation a_τ for τ ⊆ [n]. It turns out that in the case of one row, two of our four functions coincide: ILR(a) = ILC(a). We claim that gcd(a_τ) = 1 holds for some τ ∈ [n]_k. Indeed, if gcd(a_τ) ≥ 2 for all τ ∈ [n]_k, then the number z := 1 + Π_{τ ∈ [n]_k} gcd(a_τ), which is relatively prime to each gcd(a_τ), does not belong to any of the sets gcd(a_τ)Z with τ ∈ [n]_k. This is a contradiction to (19). Thus, ILC(a) ≤ k = ILR(a).
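For one row, ILC(a) is thus the smallest number of entries whose gcd already equals gcd(a); a brute-force sketch (exponential in n, names ours):

```python
from math import gcd
from functools import reduce
from itertools import combinations

def ilc_one_row(a):
    """Minimum cardinality of tau with gcd(a_tau) = gcd(a); this
    equals ILC(a) = ILR(a) in the one-row case."""
    g = reduce(gcd, a)
    for k in range(1, len(a) + 1):
        if any(reduce(gcd, tau) == g for tau in combinations(a, k)):
            return k
```

For a = (6, 10, 15) all three entries are needed, since every pair shares a common factor while gcd(a) = 1.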
The following result shows how to reduce ILC to ICR and ICC.
The first inequality follows directly from the definitions of ICR and ICC. Let us show ICC(a_+) ≤ 1 + k for k := ILC(a). Consider a k-element set τ ⊆ [n] such that gcd(a_τ) = 1. Without loss of generality, let τ = [k], so that gcd(a_1, …, a_k) = 1. Then there exist z_1, …, z_k ∈ Z such that 1 = Σ_{i=1}^k z_i a_i. We now simultaneously show Sg(a_+) = Z and ICC(a_+) ≤ k + 1. One clearly has y_{n+1}(−π) + Σ_{i=1}^k y_i a_i = 0 with y_i := π/a_i for i ∈ [k] and y_{n+1} = k. Consequently, for every b ∈ Z, the equality b = N y_{n+1} a_{n+1} + Σ_{i=1}^k (N y_i + b z_i) a_i holds for arbitrary N ∈ Z_{>0}. If we choose N large enough, all of the coefficients in the above representation become non-negative. This shows that every b ∈ Z belongs to Sg(a_+) and, since we have used k + 1 generators from a_+ to represent b, we have ICC(a_+) ≤ k + 1.
To conclude the proof, it remains to verify k + 1 ≤ ICR(a_+). It is easy to check that 1 is an element of the semigroup Sg(a_+) = Z that cannot be represented using at most k of the n + 1 generators a_1, …, a_n, −π. Indeed, the only negative generator, −π, has to be used. If, apart from this generator, one uses at most k − 1 positive generators then, by the definition of k, the chosen generators have gcd strictly greater than one, which is a contradiction.
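The non-negative representation used in this proof can be verified directly. In the sketch below we take π to be the product of a_1, …, a_k (any common multiple of the a_i makes y_i = π/a_i integral), and all names are ours:

```python
from math import prod

def represent(a, z, b):
    """Reconstruct b in Sg(a_+) with the k+1 generators
    a_1, ..., a_k and -pi, as in the proof:
        b = (N*k)*(-pi) + sum_i (N*y_i + b*z_i) * a_i,
    with y_i = pi/a_i and N large enough for non-negativity.
    Requires sum_i z_i a_i = 1."""
    k = len(a)
    pi = prod(a)                       # a common multiple of the a_i
    y = [pi // ai for ai in a]
    # N > -b*z_i for all i forces every coefficient to be >= 0
    N = max([0] + [-b * zi for zi in z]) + 1
    coeffs = [N * yi + b * zi for yi, zi in zip(y, z)]
    assert all(c >= 0 for c in coeffs)
    return sum(c * ai for c, ai in zip(coeffs, a)) + (N * k) * (-pi)
```

With a = (6, 10, 15) and z = (1, 1, −1) (so that 6 + 10 − 15 = 1), every integer b, positive or negative, is reproduced with non-negative coefficients.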
We will also make use of the hardness of the set-cover problem, the following classical NP-complete problem; see [19, Problem SP5]. The input of the set-cover problem consists of k, t ∈ Z_{>0} and a family S := {S_1, …, S_n} of n sets with S_1 ∪ ⋯ ∪ S_n = [t]. We use mincov(S) to denote the minimal cardinality of a set τ ⊆ [n] such that ∪_{i∈τ} S_i = [t] holds. The set-cover problem asks to decide whether mincov(S) ≤ k holds.

Proof of Theorem 5, Parts (i) and (ii)
Since m = 1, we use the notation a := A. By Proposition 4, it is sufficient to consider only the three decision problems ILC(a) ≤ k, ICC(a) ≤ k and ICR(a) ≤ k.
We assume a ≠ 0. For τ ⊆ [n], one has L(a_τ) = gcd(a_τ)Z. Hence, ILC(a) is the minimum cardinality of a set τ that satisfies gcd(a) = gcd(a_τ). Thus, the validity of ILC(a) ≤ k is certified by a set τ ⊆ [n] with at most k elements for which gcd(a) = gcd(a_τ) holds. Since the gcd is computable in polynomial time, this shows that our decision problem is in NP. In order to show that deciding ICC(a) ≤ k is in NP, we can use a certificate consisting of a set τ with ICC(a_τ) ≤ k and solutions of the respective feasibility problems that have a polynomial description size. To prove hardness of ILC(a) ≤ k, we use a reduction from the set-cover problem.
Since each of the t elements of [t] occurs in some of the sets S_1, …, S_n and since we have n sets in total, it is clear that the size of the input of the set-cover problem is of order at least n + t. The reduction is as follows. We compute the first t prime numbers p_1, …, p_t. To this end, we can use a weaker version of the Prime Number Theorem, established by Chebyshev, which asserts that for t ≥ 2 there exists a universal constant c > 0 such that p_t ≤ c t log_e t; see [23, Theorem 9]. Hence, p_1, …, p_t can be found by running the sieve of Eratosthenes, or some more brute-force algorithm, on the range of integers {1, …, O(t log_e t)}.
We are going to encode elements of {1, . . . , t} via the above prime numbers. Accordingly, we encode the sets S_1, . . . , S_n via integer numbers as follows: with S_j we associate a_j := ∏_{i ∈ [t]\S_j} p_i. This means that a_j is the product of those prime numbers p_i whose index i is not in S_j. As the prime numbers p_1, . . . , p_t have a polynomial bit size in t, the numbers a_1, . . . , a_n can be computed in polynomial time.
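The point of this encoding is that p_i divides a_j exactly when i ∉ S_j, so gcd(a_τ) = 1 holds precisely when the sets indexed by τ cover [t]. A minimal sketch on the small instance used above (the helper name `encode` is ours):

```python
from math import gcd, prod

def encode(sets, primes, t):
    """a_j is the product of the primes p_i with index i NOT in S_j."""
    return [prod(primes[i - 1] for i in range(1, t + 1) if i not in S_j)
            for S_j in sets]

primes = [2, 3, 5, 7]          # p_1, ..., p_4
S = [{1, 2}, {2, 3}, {3, 4}]   # a set-cover instance with t = 4
a = encode(S, primes, 4)       # a = [5*7, 2*7, 2*3] = [35, 14, 6]
```

Here gcd(a_1, a_3) = 1, matching the cover S_1 ∪ S_3 = [4], while gcd(a_1, a_2) = 7, witnessing that element 4 is left uncovered by {S_1, S_2}.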
In view of Proposition 5, the computation of ILC(a) for a ∈ Z^{1×n}_{≥2} satisfying gcd(a) = 1 can be reduced, in polynomial time, to the computation of ICR and ICC in the case of one row, by constructing the vector a⁺ out of a. Thus, the NP-hardness of deciding ICR(a) ≤ k and ICC(a) ≤ k follows from the NP-hardness of deciding ILC(a) ≤ k.
The exact complexity status of the analogous decision problem ICR(a) ≤ k remains unresolved: it is not clear whether this problem is in NP, nor whether it is in co-NP.

Proof of Theorem 5, Parts (iii) and (iv)
We recall that a computational problem is called strongly NP-hard if it is NP-hard with respect to the unary encoding of the coefficients of the input. Applied to our setting, this means that the coefficients of A ∈ Z^{m×n} are given in the unary encoding. A decision problem is called strongly NP-complete if it belongs to NP (with respect to the binary encoding of the coefficients) and is strongly NP-hard.

If some element i ∈ [m] were not contained in any of the sets S ∈ S, then the coefficients on the left-hand side of the i-th equation of the system A_γ x = b would be divisible by p_i, while the right-hand side coefficient is 1, which again contradicts the solvability of the system.

We derive the strong NP-hardness of all four problems by means of Lemma 10, which helps to construct a polynomial-time reduction from the set-cover problem. Consider a family S of subsets of [m] that cover [m]. We want to reduce the verification of mincov(S) ≤ k to the verification of any of the four inequalities in the assertion. Our reduction is the map S ↦ A described in Lemma 10, for which we fix the prime numbers p_1, . . . , p_m, q to be the first m + 1 prime numbers. As in the first part of the proof, these prime numbers can be computed in time polynomial in the size of S, which means that the respective map S ↦ A is computable in polynomial time. Furthermore, the first m + 1 prime numbers are of order O(m ln m), which implies that the unary encoding of A has polynomial size in the size of S. In view of Lemma 10, we obtain the desired hardness assertions, as verifying mincov(S) ≤ k is equivalent to verifying any of the four inequalities in the assertion.

Proof of Theorem 6
Presburger arithmetic is the first-order theory of the integer numbers with addition (but no multiplication) and the usual order ≤. A Presburger statement is a quantified expression of the form

Q_1 x_1 . . . Q_k x_k : Φ(x_1, . . . , x_k),

where Q_1, . . . , Q_k ∈ {∀, ∃} are quantifiers over integer variables x_1, . . . , x_k and Φ(x_1, . . . , x_k) is a Boolean combination of linear inequalities with integer coefficients in the variables x_1, . . . , x_k. In the 1920s, Presburger showed that there is an algorithm, based on elimination of quantifiers, to verify the validity of such statements. But it is also known that deciding general Presburger statements is much harder than deciding NP-complete problems. For example, for statements with a fixed number i of quantifier alternations that start with an existential quantifier the following is known: deciding such statements is complete for the level Σ^EXP_{i−1} of the exponential hierarchy for i ≥ 2 (see [22, Sect. 5]) and complete for the level Σ^P_{i−2} of the polynomial hierarchy when i ≥ 3 and, additionally, the number of variables k and the number of Boolean operations used in Φ are also fixed (see [27]).
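As a small illustration (our example, chosen for concreteness), the following true Presburger statement with one quantifier alternation expresses that every integer is even or odd:

```latex
\forall x \, \exists y : \; (x = 2y) \lor (x = 2y + 1)
```

Here Φ(x, y) is a Boolean combination (a disjunction) of two linear equations with integer coefficients, and the quantifier prefix ∀∃ has exactly one alternation.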

Using the Hermite Normal Form of A with respect to row transformations, we can reduce the general case to the case of A having full row rank. In particular, this means m ≤ n. It is clear that one can express the conditions ILR(A) ≤ k and ICR(A) ≤ k as the Presburger statement

∀ x ∈ D^n ∃ y ∈ D^k : ⋁_{τ ⊆ [n], |τ| ≤ k} (A x = A_τ y),   (20)

with D = Z and D = Z_{≥0}, respectively. Note that, though in our definition of a Presburger statement the quantified variables have values in Z, it is easy to model quantified variables from Z_{≥0} via a slight reformulation: for example, ∀ x ∈ Z_{≥0} : Φ(x) can also be formulated as ∀ x ∈ Z : ((x ≥ 0) ⇒ Φ(x)). When n is fixed, (20) is a so-called short Presburger formula, which means that the number of quantified variables as well as the number of Boolean operations used in the formula are fixed. For our formula, we can assume k ≤ n, because both ILR(A) and ICR(A) are at most n. Thus, the number of quantified variables is at most 2n. The number of disjunctions used is at most 2^n. Each system A x = A_τ y is a conjunction of m equalities, which means that we have used at most m 2^n ≤ n 2^n conjunctions. It is known that short Presburger statements with one quantifier alternation are solvable in polynomial time. This is explicitly stated as Theorem 1.9 in [28], where the authors of [28] refer to the work of Woods [34] and their own work [27]. We note that the proofs from [27,34] rely on the algorithmic theory of generating functions (see [5,7,8,14]). Since the short statement (20) has exactly one quantifier alternation, it can be decided in polynomial time, which concludes the proof.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix
Even though Theorem 7 is available in the specialized literature (see, for example, [20]), it does not seem to be well known, and we have not found any elementary group theory source containing a complete self-contained proof of this result. Thus, we prove Theorem 7 relying only on the basic facts from group theory.
The following lemma shows that by "projecting out" a subset of a non-redundant set, one obtains a smaller non-redundant set.

Lemma 11 Let S be a non-redundant subset of a finite Abelian group G. Let T ⊆ S and consider the canonical homomorphism φ : G → G/⟨T⟩. Then φ(S\T) is a non-redundant subset of G/⟨T⟩ of cardinality |S| − |T|.
Proof For all x, x′ ∈ S\T with x ≠ x′, we have φ(x) ≠ φ(x′). Indeed, if this were not the case, then we would have x − x′ ∈ ⟨T⟩, which implies x ∈ ⟨T ∪ {x′}⟩. Consequently, ⟨S⟩ = ⟨S\{x}⟩, which contradicts the fact that S is non-redundant. It follows that φ is injective on S\T. Hence φ(S\T) has |S\T| = |S| − |T| elements. Let us verify that φ(S\T) is a non-redundant subset of G/⟨T⟩. If φ(S\T) were redundant, then we would have ⟨φ(S\T)⟩ = ⟨φ(U)⟩ for some proper subset U of S\T. This means ⟨S\T⟩ + ⟨T⟩ = ⟨U⟩ + ⟨T⟩. The latter equality can be simplified to ⟨S⟩ = ⟨U ∪ T⟩. Thus, the proper subset U ∪ T of S generates ⟨S⟩, which contradicts the non-redundancy of S.

Lemma 12 Let G = F ⊕ H be a finite Abelian group such that F is cyclic with generator f and |F| · h = 0 for every h ∈ H. Fix h ∈ H and consider the canonical homomorphism ψ : G → G/⟨f + h⟩. Then φ := ψ|_H is an isomorphism from H to G/⟨f + h⟩.

Proof We claim that the group ⟨f + h⟩ is cyclic of order |F|. Note that the elements z(f + h) with z ∈ {0, . . . , |F| − 1} are |F| distinct elements of G, because the elements zf with z ∈ {0, . . . , |F| − 1} are the |F| distinct elements of F. On the other hand, |F| · (f + h) = |F| · f + |F| · h = 0 + 0 = 0, which shows that ⟨f + h⟩ has no other elements.
Since the order of f + h is |F|, the order of G/⟨f + h⟩ coincides with the order of H. Thus, to verify the assertion, it suffices to show that φ is injective. To this end, we check that the kernel of φ is {0}. Consider an arbitrary x ∈ H with φ(x) = 0. This means x ∈ ⟨f + h⟩. Thus, x = z(f + h) holds for some z ∈ Z. In view of G = F ⊕ H, x, h ∈ H and f ∈ F, the latter implies 0 = zf and x = zh. Since f is a generator of the cyclic group ⟨f⟩, the equality 0 = zf implies that z is a multiple of |F|. But then zh = 0, which implies x = 0. This shows that φ has trivial kernel and concludes the proof.
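The situation can be checked numerically in a concrete group. The following sketch (purely illustrative, not part of the proof) takes G = Z_4 ⊕ Z_2 with F = Z_4, f = 1, H = Z_2, h = 1, so that |F| · h = 4 · 1 = 0 in Z_2:

```python
# G = Z_4 x Z_2; F = Z_4 generated by f = 1, H = Z_2 with h = 1
F_ORDER, H_ORDER = 4, 2
f, h = 1, 1

# enumerate the cyclic subgroup <(f, h)> of G
subgroup, x = set(), (0, 0)
while x not in subgroup:
    subgroup.add(x)
    x = ((x[0] + f) % F_ORDER, (x[1] + h) % H_ORDER)

# the cosets of the two elements of H = {(0, 0), (0, 1)} modulo <(f, h)>;
# distinct cosets witness that the restriction of the canonical map
# to H is injective
cosets = [frozenset(((a + s) % F_ORDER, (b + u) % H_ORDER) for (s, u) in subgroup)
          for (a, b) in [(0, 0), (0, 1)]]
```

Indeed, ⟨(1, 1)⟩ = {(0,0), (1,1), (2,0), (3,1)} has order |F| = 4, and the two elements of H land in distinct cosets.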

Proof of Theorem 7
Consider the decomposition of G into a direct sum of primary cyclic groups:

G = ⊕_{i=1}^{m} ⊕_{j=1}^{k_i} G_{i,j},

where k_1, . . . , k_m ∈ N and, for each i ∈ [m] and j ∈ [k_i], the direct summand G_{i,j} is a primary cyclic group of order p_i^{n_{i,j}} with n_{i,j} ∈ N, the primes p_1, . . . , p_m being pairwise distinct. One has κ(G) = Σ_{i=1}^{m} k_i. See Chapter 5 in [15] for details. Some algebra books call this the elementary divisor decomposition of G.
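As an illustration (not part of the proof), κ(G) can be computed for a group given as a direct sum of cyclic groups Z_n by splitting each Z_n into primary cyclic summands via the Chinese Remainder Theorem and counting the summands. The helper names below are ours:

```python
from collections import Counter

def primary_summands(n):
    """Split the cyclic group Z_n into primary cyclic summands (via the CRT)."""
    factors, d = Counter(), 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return sorted(p ** e for p, e in factors.items())

def kappa(cyclic_orders):
    """kappa(G) for G given as a direct sum of cyclic groups Z_n with n > 1."""
    return sum(len(primary_summands(n)) for n in cyclic_orders)
```

For example, Z_12 ⊕ Z_6 decomposes into Z_4 ⊕ Z_3 ⊕ Z_2 ⊕ Z_3, so κ = 4 (two summands for p = 2 and two for p = 3).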
First note that G contains a non-redundant set of cardinality κ(G), which can be constructed by picking a generator of G_{i,j} for each cyclic group G_{i,j} from the decomposition of G. We thus need to show that for an arbitrary non-redundant subset S of G the inequality |S| ≤ κ(G) is fulfilled. We argue by induction on Ω(|G|), the number of prime factors of |G| counted with multiplicities. If Ω(|G|) = 1, then |G| is prime. Consequently, G is a cyclic group of prime order, which means that every non-zero element of G generates the whole group G. We conclude that |S| ≤ 1 = κ(G).
We fix an arbitrary integer N > 1 and assume that the bound on the cardinality of non-redundant sets is true in every finite Abelian group G′ with Ω(|G′|) < N. Let G be a finite Abelian group with Ω(|G|) = N. We verify the bound for the group G.
Using the projection homomorphisms x ↦ x_{i,j} from G to G_{i,j}, each x ∈ G can be uniquely written as

x = Σ_{i=1}^{m} Σ_{j=1}^{k_i} x_{i,j}, with x_{i,j} ∈ G_{i,j}.

Case 1: There exist i ∈ [m] and j ∈ [k_i] such that for every x ∈ S, the group ⟨x_{i,j}⟩ is a proper subgroup of G_{i,j}. Assume, without loss of generality, that ⟨x_{1,1}⟩ is a proper subgroup of G_{1,1} for every x ∈ S. Since G_{1,1} is cyclic of order p_1^{n_{1,1}}, every subgroup properly contained in G_{1,1} is a subgroup of the (unique) cyclic subgroup G̃_{1,1} of G_{1,1} of order p_1^{n_{1,1}−1}. Fix G̃_{i,j} := G_{i,j} for all i ∈ [m] and j ∈ [k_i] with (i, j) ≠ (1, 1). It follows that S is a subset of the group

G̃ = ⊕_{i=1}^{m} ⊕_{j=1}^{k_i} G̃_{i,j},

where Ω(|G̃|) = Ω(|G|) − 1 < N. Thus, by the induction assumption we obtain |S| ≤ κ(G̃). Since κ(G̃) ≤ κ(G), we conclude |S| ≤ κ(G).
Case 2: For all i ∈ [m], j ∈ [k_i], there exists some x ∈ S such that ⟨x_{i,j}⟩ = G_{i,j}. Without loss of generality, we can assume that the numbers n_{i,j} are ordered so that

n_{i,1} ≥ · · · ≥ n_{i,k_i}   (22)

holds for every i ∈ [m]. We represent G as the direct sum G = F ⊕ H, where

F := G_{1,1} ⊕ · · · ⊕ G_{m,1} and H := ⊕_{i=1}^{m} ⊕_{j=2}^{k_i} G_{i,j}.

Since the orders of the cyclic groups G_{1,1}, . . . , G_{m,1} are pairwise relatively prime, the Chinese Remainder Theorem implies that F is a cyclic group of order d := ∏_{i=1}^{m} p_i^{n_{i,1}}. Each x ∈ G can be projected onto F and H. That is, for x ∈ G, we introduce x_F := Σ_{i=1}^{m} x_{i,1} ∈ F and x_H := x − x_F ∈ H. Choose a subset T of S of cardinality at most m by picking, for each i ∈ {1, . . . , m}, an element x of S satisfying ⟨x_{i,1}⟩ = G_{i,1}. Since F is cyclic, the order of ⟨{x_F : x ∈ T}⟩ is the least common multiple of the orders of G_{1,1}, . . . , G_{m,1} and thus is equal to |F|. This means ⟨{x_F : x ∈ T}⟩ = F. We fix t ∈ T such that t_F is a generator of the cyclic group F and consider the canonical homomorphism ψ : G → G/⟨t⟩. In view of (22), the decomposition G = F ⊕ H satisfies the assumptions of Lemma 12. This implies that φ := ψ|_H is an isomorphism from H to G/⟨t⟩.