Optimal partial clique edge covering guided by potential energy minimization

For given integers k, n, r we aim at families of k sub-cliques called blocks, of a clique with n vertices, such that every block has r vertices, and the blocks together cover a maximum number of edges. We demonstrate a combinatorial optimization method that generates such optimal partial clique edge coverings. It takes certain packages of columns (corresponding to vertices) in the incidence matrix of the blocks and considers the number of uncovered edges as an energy term that has to be minimized by transforming these packages. As a proof of concept we completely solve the above maximization problem in the case of k ≤ 4 blocks and obtain optimal coverings for all integers n and r with r/n ≥ 5/9. This generalizes known results for total coverings to partial coverings. The method as such is not restricted to k ≤ 4 blocks, but a challenge for further research (also on total coverings) is to limit the case distinctions when more blocks are involved.


Introduction
Covering designs are a classic subject in extremal combinatorics. Applications include the generation of efficient test cases that cover all (or many) conditions, the design of fault-tolerant systems, and collision avoidance. The general problem is: In a set V with |V| = n, place k subsets of size r called blocks, so as to cover a maximum number of subsets T ⊂ V with |T| = t (where T is said to be covered if T is a subset of a block). Unlike the trivial case t = 1, the case t = 2 is already subtle. It can be phrased as a graph problem: It is known that a clique edge covering always exists if r/n ≥ γ k . A counting argument trivially yields γ k ≥ 1/√k. However, in general one cannot exactly pack the edge sets of several K r into K n , hence the γ k are larger. Precisely known values are γ 2 = 1, γ 3 = 2/3, γ 4 = 3/5, γ 5 = 5/9, γ 6 = 1/2. See, e.g., Theorem 8.21 in Chapter IV of [4]; here we have only adjusted the notation. Minimum k for all n ≤ 32 and r ≤ 16 can be found in [6].
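The counting bound mentioned above can be spelled out in one line. The following is a standard argument written in our notation, not a quotation from [4]: if k blocks K r cover all edges of K n , then the blocks contain at least as many edges as K n , and (r − 1)/(n − 1) ≤ r/n holds whenever r ≤ n.

```latex
\[
k\binom{r}{2} \ge \binom{n}{2}
\;\Longrightarrow\;
1 \le k\,\frac{r(r-1)}{n(n-1)} \le k\,\frac{r^{2}}{n^{2}}
\;\Longrightarrow\;
\frac{r}{n} \ge \frac{1}{\sqrt{k}} .
\]
```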
In the present paper we consider the more general optimal partial clique edge coverings, and we introduce a method for constructing them. For fixed k, it can cope with arbitrarily large n. In [5] we had considered a continuous counterpart of the problem. The new combinatorial approach is able to compute exact edge numbers for the discrete problem; moreover, it gives an intuitive understanding of the structure of optimal coverings: The main idea is to interpret the vertices as columns of the k × n incidence matrix, and the number of uncovered edges as potential energy between pairs of them. Then we transform packages of columns so as to decrease this energy, leading to an optimal solution composed of special packages of minimum energy. Interestingly, the potential energy view of graph problems has recently been proposed in [10], and here the analogy turns out to be directly fruitful for solving a concrete problem. As the idea looks natural and general, it may also apply to the construction of other optimal combinatorial designs.
Recall that optimal total clique edge coverings are known for all n, r with r/n ≥ 4/13. Our work generalizes this type of result to partial coverings, and we manage all instances with k ≤ 4 and r/n ≥ 5/9, as a proof of concept. But the method as such does not stop there. One might be surprised how complicated the case of k = 4 blocks already is; however, even for the special case of total coverings the difficulty increases drastically as k grows.
The matter is also related to other known structures: A K r -decomposition of K n is an (n, r ) clique edge covering whose blocks cover every edge exactly once. More generally, an induced H -decomposition of a graph G consists of induced subgraphs H 1 , . . . , H k of G such that every edge of G is in exactly one H i , and all H i are isomorphic to a fixed graph H . Elegant necessary and sufficient conditions for induced H -decompositions of K n are given in [11]. Various cases of H and general G are studied in [2,3,8].

Further notation and preliminaries
Definition 4 For a given number k of blocks we define:
- The incidence matrix of a family of k blocks in K n is a binary k × n matrix with a row for every block and a column for every vertex. A matrix entry equals 1 if the vertex belongs to the block, and 0 otherwise.
- For any set I ⊆ {1, . . . , k} of row indices, we use the shorthand "a column I" to refer to any column whose set of row indices with matrix entry 1 is exactly I. We also write any column as the set I of the row indices where the matrix entry is 1.
- A row sum is the number of 1s in a row, hence it equals the size of the block represented by that row. A column sum is the number of 1s in a column. The column sum of a column I is denoted |I|.
That is, for convenience we treat a column both as a bit vector and as the set of positions of entries 1 interchangeably. Also, since the order of vertices is arbitrary, we need not distinguish between incidence matrices whose columns are permuted. The covered edges in this example are uv, uy, vw, vy, wx, wy, xy. As we shall see later in Theorem 1, this example is not an optimal covering, since we can cover 8 edges by 3 blocks of size 3. In general, the connection to our problem is given by the following obvious fact: Proposition 1 An edge is covered if and only if the two columns representing its two vertices intersect, i.e., they are columns I, J with I ∩ J ≠ ∅.
Proof By Definition 4, a column I represents a vertex of K n that belongs to exactly those blocks whose indices are in I . Thus, for any two vertices p and q, the following statements are equivalent: the edge pq is covered by some block; some block contains both p and q; some row has entries 1 in the columns of p and q; the columns of p and q (as sets of positions of the 1s) intersect.
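Proposition 1 turns edge counting into a test on pairs of columns. The following minimal Python sketch illustrates this. The function names are ours, and the three blocks are an assumption: they are one family on the vertices u, v, w, x, y that reproduces the seven covered edges listed above. An exhaustive search over all families of three blocks of size 3 in K 5 is included to check the value 8 stated before Proposition 1.

```python
from itertools import combinations, combinations_with_replacement

def columns_of(blocks, vertices):
    """Column of a vertex = set of indices of the blocks containing it."""
    return {v: frozenset(i for i, b in enumerate(blocks, 1) if v in b)
            for v in vertices}

def covered_edges(blocks, vertices):
    """Edges whose two columns intersect (Proposition 1)."""
    col = columns_of(blocks, vertices)
    return [p + q for p, q in combinations(vertices, 2) if col[p] & col[q]]

vertices = "uvwxy"
# Assumed blocks, chosen to reproduce the edge list uv, uy, vw, vy, wx, wy, xy.
blocks = [set("uvy"), set("vwy"), set("wxy")]
edges = covered_edges(blocks, vertices)

# Exhaustive search: best family of k = 3 blocks of size r = 3 in K_5.
triples = [set(t) for t in combinations(vertices, 3)]
best = max(len(covered_edges(list(f), vertices))
           for f in combinations_with_replacement(triples, 3))

print(edges, len(edges), best)
```

The search space is tiny here (10 candidate blocks), so brute force is only a sanity check for small instances and not a substitute for the structural results.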
In [5] we found a property similar to the following one for the continuous counterpart of our problem. But the present lemma does not follow immediately, as there might be "discretization effects".
Lemma 1 For any integers k, n, r there exist k blocks of r vertices that cover a maximum number g of edges of K n and obey the following property: For any two sets A, C of row indices such that A ⊂ C ⊆ {1, . . . , k} and |C| − |A| ≥ 2, the incidence matrix does not contain both a column A and a column C. In particular, no column with only 0s exists if kr > n.
Proof We show that, in any incidence matrix with k rows and with row sums r , we can get rid of the mentioned pairs of columns, by transformations that neither change the row sums nor decrease the number of covered edges.
Assume that (A, C) is any pair of columns as specified above. Consider two rows corresponding to some indices in C \ A. The crossing of the mentioned two columns and rows is a 2 × 2 submatrix with two rows (0, 1). Note that none of the other rows in the pair of columns (A, C) is (1, 0), since A ⊂ C. We replace (A, C) with a new pair of columns (A′, C′), by turning one row (0, 1) into (1, 0); that is, A′ = A ∪ {i} and C′ = C \ {i} for one of the two chosen row indices i. Obviously, the row sums are preserved. Consider any further column B. If B intersects both A and C, then B also intersects both A′ and C′. If B intersects only C but not A, then B still intersects at least one of A′ and C′. From these two statements it follows that the number of covered edges does not decrease. Also note that |A′| and |C′| are strictly between |A| and |C|.
We repeat this step as long as two columns A and C as above exist. Specifically, we always pick a column C with maximum |C|. This decreases the number of columns with maximum column sum, and eventually it decreases the maximum column sum itself. Thus, the process does not run into a cycle and terminates with an incidence matrix satisfying the claimed property.
The last assertion follows since, by the pigeonhole principle, for kr > n some column must have at least two 1s.
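The transformation in the proof of Lemma 1 is easy to run mechanically. The sketch below is our own illustration, with hypothetical function names and a small made-up starting matrix (k = 3, n = 6, r = 3): it repeatedly picks a pair (A, C) with maximum |C|, moves one row index from C to A, and the final checks confirm that the row sums are unchanged and no covered edge is lost.

```python
from itertools import combinations

def covered(cols):
    """Number of covered edges = number of intersecting column pairs."""
    return sum(1 for a, b in combinations(cols, 2) if a & b)

def row_sums(cols, k):
    return [sum(1 for c in cols if i in c) for i in range(1, k + 1)]

def find_violation(cols):
    """Indices (ia, ic) with cols[ia] a proper subset of cols[ic] and size
    difference >= 2, preferring maximum |C|; None if no such pair exists."""
    best = None
    for ia, a in enumerate(cols):
        for ic, c in enumerate(cols):
            if a < c and len(c) - len(a) >= 2:
                if best is None or len(c) > len(cols[best[1]]):
                    best = (ia, ic)
    return best

def normalize(cols):
    """Apply the step of Lemma 1 until no forbidden pair (A, C) remains."""
    cols = [frozenset(c) for c in cols]
    while (hit := find_violation(cols)) is not None:
        ia, ic = hit
        i = min(cols[ic] - cols[ia])      # a row index in C \ A
        cols[ia] = cols[ia] | {i}         # A' = A + {i}
        cols[ic] = cols[ic] - {i}         # C' = C - {i}
    return cols

# Hypothetical start: contains the forbidden pair (empty column, {1, 2, 3}).
start = [frozenset(s) for s in (set(), {1, 2, 3}, {1}, {2, 3}, {1, 2}, {3})]
result = normalize(start)
print(row_sums(start, 3), covered(start), "->", row_sums(result, 3), covered(result))
```

Termination does not depend on the choice of the pair: the sum of squared column sums drops by at least 2 in every step, which is an alternative to the argument via the maximum column sum.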
Henceforth it suffices to consider partial clique edge coverings that satisfy the property in Lemma 1. A first consequence is the structure of optimal coverings with k = 2 blocks K r : Since columns with two 0s or two 1s cannot coexist, the two blocks are either disjoint (if r ≤ n/2) or they together contain all n vertices (if r > n/2).
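For two blocks this yields a closed formula: the blocks overlap in max(0, 2r − n) vertices, and only the edges inside the overlap are counted twice. The following sketch, with function names of our choosing, compares this formula with an exhaustive search over all pairs of blocks for small n.

```python
from itertools import combinations
from math import comb

def two_block_optimum(n, r):
    """Disjoint blocks if r <= n/2, otherwise blocks whose union is all of V."""
    overlap = max(0, 2 * r - n)          # forced number of columns {1, 2}
    return 2 * comb(r, 2) - comb(overlap, 2)

def brute_force(n, r):
    """Maximum number of covered edges over all pairs of r-subsets."""
    subsets = [set(s) for s in combinations(range(n), r)]

    def covered(a, b):
        return sum(1 for p, q in combinations(range(n), 2)
                   if {p, q} <= a or {p, q} <= b)

    return max(covered(a, b) for a in subsets for b in subsets)

checks = [(n, r, two_block_optimum(n, r), brute_force(n, r))
          for n in range(2, 7) for r in range(1, n + 1)]
print(all(formula == exact for _, _, formula, exact in checks))
```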

Outline of the method
Next we introduce a novel concept that will allow us to structurally characterize optimal partial clique edge coverings.

Definition 5
For incidence matrices with k rows we define:
- A packet with c columns is any binary k × c matrix where all k row sums are equal. The density of a packet is the row sum divided by c, or equivalently, the number of 1s divided by kc.
- A partitioning of an incidence matrix divides the multiset of its columns into packets. A partitioning may contain arbitrarily many identical copies of every packet; however, for certain packets we allow only some fixed maximum number of copies. We refer to the latter packets as the remainder.
- The energy of an incidence matrix is the number of uncovered edges, i.e., of pairs of disjoint columns. The energy E(P, Q) between two packets P and Q is the number of pairs of disjoint columns, one being in P and one being in Q.
- By c_1 P_1 + · · · + c_l P_l we denote a submatrix consisting of c_i identical copies of packet P_i, for i = 1, . . . , l, where the sets of column indices of all these c_1 + · · · + c_l packets are pairwise disjoint. The expression d_1 Q_1 + · · · + d_m Q_m is similarly defined, and we assume that both submatrices have the same total number of columns and the same total row sums. The interaction c_1 P_1 + · · · + c_l P_l → d_1 Q_1 + · · · + d_m Q_m replaces the submatrix c_1 P_1 + · · · + c_l P_l with the submatrix d_1 Q_1 + · · · + d_m Q_m. An interaction done within an incidence matrix is valid if it does not increase the energy of the incidence matrix.
The energy within a packet P is obviously E(P, P)/2. (If we take two copies of P, then every disjoint pair is counted twice, moreover, no column is disjoint to itself.) Also note that E(P, Q) = E(Q, P).
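As a sanity check on these definitions, the energies can be computed directly from column sets. The sketch below uses our own function names; the three packets are the ones read off the incidence matrix of Example 2, which follows (k = 3, n = 8, r = 5).

```python
from itertools import combinations

def E(P, Q):
    """Number of disjoint column pairs, one column from P and one from Q."""
    return sum(1 for a in P for b in Q if not (a & b))

def inner(P):
    """Energy within a packet: disjoint pairs of distinct columns of P."""
    return sum(1 for a, b in combinations(P, 2) if not (a & b))

# Packets of Example 2, written as sets of row indices.
clique = [frozenset(s) for s in ({1, 2}, {1, 3}, {2, 3})]
last = [frozenset(s) for s in ({1, 2}, {3})]
packets = [clique, clique, last]

# Total energy = energies within packets + energies between packets.
total = (sum(inner(P) for P in packets)
         + sum(E(P, Q) for P, Q in combinations(packets, 2)))
print(total)
```

For packets without all-zero columns, as here, inner(P) agrees with E(P, P)/2.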

Example 2
The matrix below shows an incidence matrix (with k = 3, n = 8, r = 5, hence with density 5/8) partitioned into three packets two of which are identical. The energy is 0 within and between the first two packets, 1 within the last packet, and 1 between the last packet and each of the first two packets, resulting in the total energy 3.

    1 1 0 | 1 1 0 | 1 0
    1 0 1 | 1 0 1 | 1 0
    0 1 1 | 0 1 1 | 0 1

In the rest of the paper we prove, for k ≤ 4 blocks, the existence of optimal clique edge coverings whose incidence matrices are composed of only two types of packets, subject to remainders. This finally allows us to compute the amounts of these packets (and hence optimal coverings) uniquely from the given sizes n and r, by integer linear equations.
The structure of the existence proofs is as follows. We start from an arbitrary incidence matrix that has row sums r and satisfies the property from Lemma 1, and show that it can be divided into a small number of different simple packets with many symmetries. With these packets we perform interactions reducing the energy, until the matrix is transformed into one consisting of the claimed packets. Every interaction replaces some packets with others, thereby preserving the number n of columns and the row sums r, that is, it changes the blocks and re-assigns some vertices but preserves the sizes of blocks.
The technical contribution is this proof method. We remark that it is not merely local search. Besides reducing the energy among the packets in an interaction, one must also account for the energy between these packets and the rest of the matrix. It is especially the limited number of different packets and their symmetries that will make it rather convenient to compute these energies.
We conclude this section with some simple but useful observations. A sequence of interactions may run into a cycle and never terminate. This cannot happen if they strictly decrease the energy. We give another simple sufficient condition:

Lemma 3 Let Y be some finite set of indices. Consider a set of interactions M_i → M′_i (i ∈ Y) that can be applied to an incidence matrix, each one transforming a submatrix (subset of columns) identical to M_i into a submatrix M′_i. Suppose that every M_i contains some column that does not appear in any M′_j (j ∈ Y). Then any sequence of such interactions is finite. The conclusion also holds if all matrices M_i (i ∈ Y) except one satisfy the above condition.
Proof Trivially, an interaction that consumes some column that is not produced elsewhere can be applied only finitely often. If one M i does not contain such a column, then still all other interactions can be applied only finitely often. But an infinite sequence of a single interaction is not possible either.

Lemma 4 Suppose that an interaction turns a submatrix M into M′, and the rest of the incidence matrix is divided into submatrices P. If E(M, M) ≥ E(M′, M′) and E(M, P) ≥ E(M′, P) for all P, then the interaction is valid.
Proof This follows immediately from the definition of energy and the fact that the packets P are not changed by the interaction.
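The criterion of Lemma 4 can be checked mechanically for a concrete interaction. The sketch below is an illustration under our own assumptions: the function names are ours, and the interaction M → M′ for k = 3 is a hypothetical one (same number of columns and same row sums before and after), not necessarily an interaction used in the proofs of this paper.

```python
from itertools import combinations

def E(P, Q):
    """Number of disjoint column pairs, one column from P and one from Q."""
    return sum(1 for a in P for b in Q if not (a & b))

def energy(cols):
    """Energy of a whole matrix: number of disjoint pairs of columns."""
    return sum(1 for a, b in combinations(cols, 2) if not (a & b))

def valid_by_lemma4(M, M_new, rest):
    """Sufficient condition of Lemma 4; `rest` is a list of submatrices P."""
    return (E(M, M) >= E(M_new, M_new)
            and all(E(M, P) >= E(M_new, P) for P in rest))

fs = frozenset
# Hypothetical interaction (illustration only): row sums (2, 2, 2) on both sides.
M = [fs({1, 2}), fs({1, 2}), fs({3}), fs({3})]
M_new = [fs({1, 2}), fs({3}), fs({1, 3}), fs({2})]
# Rest of the matrix: one packet with three columns and one single column.
rest = [[fs({1, 2}), fs({1, 3}), fs({2, 3})], [fs({1})]]

flat = [c for P in rest for c in P]
ok = valid_by_lemma4(M, M_new, rest)
before, after = energy(M + flat), energy(M_new + flat)
print(ok, before, after)
```

The direct recomputation of the total energy is redundant once the criterion holds; it is included only to show that both views agree on this instance.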

Partial covering of a clique by three blocks
First we demonstrate the principle for k = 3. Since γ 3 = 2/3, three K r can cover all edges of K n if and only if r/n ≥ 2/3. Now we also obtain optimal partial clique edge coverings with three blocks, for any n and r. Since the case r/n ≤ 1/3 is trivial and the case r/n ≥ 2/3 is settled by total coverings, we assume 1/3 < r/n < 2/3. Remember that it suffices to consider incidence matrices that satisfy the property in Lemma 1. If all three entries in some column are 1s, then all other columns have at least two 1s. But then any two columns intersect, thus all edges are covered. This in turn implies r/n ≥ 2/3 (by the known result γ 3 = 2/3), contradicting our assumption r/n < 2/3. Hence only columns with one or two 1s are present, which are at most 6 different columns.
We partition our incidence matrix into packets. First we form cliques as long as possible. That is, we repeatedly take three columns Assume that c ≥ 2 (or c ≥ 2, but this case is symmetric). That is, we have two further columns {1, 2} and two further columns {3}. We take these four columns and apply the following interaction to them: We repeat the above steps (build further cliques and perform interactions) until c < 2 and c < 2. Hence there remains at most one anti-edge or path outside the cliques. Due to the equal row sums again, the numbers of further columns {1}, {2}, {3} are equal, hence we can group them to anticliques. Altogether, we always obtain one of the claimed partitionings.
Knowing the structure of an optimal clique edge covering from Theorem 1, it is now straightforward to compute one, for a given n and r. The "algorithm" for that is described as follows. First we compute the numbers c and a of cliques and anticliques, respectively. Since these packets have 3 columns, Theorem 1 yields the following case distinction and formulas for calculating c and a.
Finally, in either case we simply take c cliques and a anticliques and the respective remainder, and stack them together to an incidence matrix, which is our optimal clique edge cover.
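The stacking step can be sketched for the simplest situation, where n is a multiple of 3 and no remainder is needed. This is an assumption of the sketch, and the general case distinction with remainders is not reproduced here. The clique packet consists of the columns {1, 2}, {1, 3}, {2, 3} and the anticlique of {1}, {2}, {3}, so c and a solve 3c + 3a = n and 2c + a = r. Function names are ours, and a brute-force search for n = 6 serves as a check.

```python
from itertools import combinations, combinations_with_replacement

CLIQUE = [frozenset(s) for s in ({1, 2}, {1, 3}, {2, 3})]   # density 2/3
ANTICLIQUE = [frozenset(s) for s in ({1}, {2}, {3})]        # density 1/3

def build(n, r):
    """Columns of c cliques and a anticliques; assumes 3 | n, n/3 <= r <= 2n/3."""
    assert n % 3 == 0 and n <= 3 * r <= 2 * n
    c, a = r - n // 3, 2 * n // 3 - r    # solves 3c + 3a = n, 2c + a = r
    return c * CLIQUE + a * ANTICLIQUE

def covered(cols):
    """Covered edges = intersecting pairs of columns (Proposition 1)."""
    return sum(1 for x, y in combinations(cols, 2) if x & y)

def brute_force(n, r, k=3):
    """Maximum number of covered edges over all families of k blocks."""
    blocks = [set(b) for b in combinations(range(n), r)]

    def count(family):
        return sum(1 for p, q in combinations(range(n), 2)
                   if any({p, q} <= b for b in family))

    return max(count(f) for f in combinations_with_replacement(blocks, k))

results = {r: (covered(build(6, r)), brute_force(6, r)) for r in (2, 3, 4)}
print(results)
```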
Example 2 (in Sect. 3) actually shows an optimal incidence matrix for n = 8 and r = 5, with an anti-edge and c = 2 cliques, whereas a = 0.
It may be interesting to observe the number of covered edges. For example, for n = 0 mod 3, this number increases by exactly n whenever r is raised by 1. This is shown as follows. One anticlique is turned into a clique, and we have a − 1 other anticliques and c other cliques. From the partitioning we see directly that the number of covered edges increases by 3 + 3(a − 1) + 3c = 3a + 3c = n.
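The increment by n can also be checked numerically. In the sketch below (our own helper names, again assuming n is a multiple of 3 and a partitioning into c cliques and a anticliques without remainder), the number of covered edges is counted packet by packet and compared for consecutive values of r.

```python
from math import comb

def covered(c, a):
    """Covered edges for c cliques and a anticliques (k = 3, no remainder)."""
    return (comb(3 * c, 2)        # any two clique columns intersect
            + 6 * c * a           # a clique column meets 2 of 3 anticlique columns
            + 3 * comb(a, 2))     # equal anticlique columns intersect

def covered_for(n, r):
    """Uses c = r - n/3 and a = 2n/3 - r, valid for 3 | n and n/3 <= r <= 2n/3."""
    return covered(r - n // 3, 2 * n // 3 - r)

increments = {(n, r): covered_for(n, r + 1) - covered_for(n, r)
              for n in (6, 9, 12, 30) for r in range(n // 3, 2 * n // 3)}
print(all(step == n for (n, r), step in increments.items()))
```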

Partial covering of a clique by four blocks
Until now we can compute optimal partial clique edge coverings for all n, r with r /n ≥ 3/5: Case k ≤ 2 was simple. If k = 3 and r /n ≥ 2/3, then all edges of K n can be covered, due to γ 3 = 2/3. If k = 3 and r /n < 2/3, then we use Theorem 1. If k ≥ 4 and r /n ≥ 3/5, then all edges of K n can be covered, due to γ 4 = 3/5. Now we turn to the case k = 4 and r /n < 3/5, which is already intricate and shows the power of the packet approach. We continue on the lines of Theorem 1, now for the range 1/2 < r /n < 3/5. Since γ 5 = 5/9, the following Theorem 2 enables us to compute optimal families of blocks for all n, r with r /n ≥ 5/9. (Namely, for k ≥ 5 blocks it is known that all edges can be covered, and for k ≤ 4 blocks, the maximum number of covered edges is given by our results.) The final construction of an optimal covering is completely analogous to the case k = 3, only with different packets and numbers. The details are therefore omitted. Again, for each of the possible remainders, n and r uniquely determine the amount of cliques and stars, via two integer linear equations.
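The case analysis above can be summarized as a small lookup. The sketch below is only a reading aid with a hypothetical function name and labels of our choosing; it uses the values of γ k quoted in the Introduction and the fact that additional blocks never reduce the number of covered edges.

```python
from fractions import Fraction

# Values of gamma_k quoted in the Introduction.
GAMMA = {2: Fraction(1), 3: Fraction(2, 3), 4: Fraction(3, 5),
         5: Fraction(5, 9), 6: Fraction(1, 2)}

def regime(k, n, r):
    """Which result of the text applies to k blocks of size r in K_n."""
    ratio = Fraction(r, n)
    # More blocks never hurt, so any known gamma_j with j <= k suffices.
    if any(j <= k and ratio >= g for j, g in GAMMA.items()):
        return "total covering"
    if k <= 2:
        return "k <= 2 (simple case)"
    if k == 3:
        return "Theorem 1"
    if k == 4 and ratio > Fraction(1, 2):
        return "Theorem 2"
    return "not treated here"

print(regime(3, 8, 5), "|", regime(4, 9, 5), "|", regime(5, 9, 5))
```

Exact rational arithmetic via Fraction avoids rounding problems at the thresholds such as r/n = 5/9.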
We come to the existence theorem. The basic ideas are the same as in Theorem 1, but the details are much more complex. The reader may first skip some case distinctions and verifications without losing track of the overall structure of the proof. Theorem 2 For 1/2 < r /n < 3/5, there exist four K r that cover a maximum number of edges of K n , of the following form: Their incidence matrix can be partitioned into cliques and stars, and one of these remainders: -at most two cycles and at most one pair, -at most two diagonals.
The mentioned packets are defined below (with the understanding that rows may be permuted simultaneously in all packets).
In the rest of this section we prove Theorem 2. First we define some other packets that will appear only in intermediate steps: Assumption Every column in the incidence matrix has at least two 1s. (Later we must drop this extra assumption and include also columns with one 1.) Again we start from any incidence matrix that satisfies the property in Lemma 1, and we first build cliques as long as possible, from all six different columns with two 1s. After that, at least one of the columns with two 1s is no longer available outside the cliques. Specifically, we can assume (by permuting rows if necessary) that no further column {2, 4} exists.
Next we build cycles as long as possible, from the remaining columns. By definition they do not contain any columns {2, 4} and {1, 3}. After this phase, also some column from the cycle is no longer available outside the packets. Specifically, we can assume (by permuting rows if necessary) that no further column {2, 3} exists. From now on the order of rows remains fixed.
Next we also build stars as long as possible, from the remaining columns. After this grouping of columns into cliques, cycles, and stars, we perform the following interactions, as long as possible and in any order.
This set fulfills the condition of Lemma 3, hence any sequence of these interactions terminates. To show that every interaction M → M′ is valid, we apply Lemma 4, where P is either a packet (clique, cycle, star) or consists of a single column outside the packets. Verifying E(M, P) ≥ E(M′, P) is easy (just slightly tedious) in each case, recalling that P is neither {2, 4} nor {2, 3}.
After termination of the interactions we build further stars as long as possible, from columns that are not yet in packets. Assume that some column {3, 4} still remains outside the packets. Due to the equal row sums, there must exist a column with more 1s in the first two rows than in the last two rows. But neither {1, 2, 3} nor {1, 2, 4} is in the incidence matrix, since otherwise the above interactions would still apply. Thus, for every column {3, 4} there also exists a column {1, 2}, and we can form pairs of them.
At this stage, the only possible columns outside the packets (cliques, cycles, stars, pairs) are columns with three 1s, and {1, 2}, {1, 4}, {1, 3}. If some column {1, 2} or {1, 4} exists, then either this column can be turned into {1, 3} by some of the interactions above, or the partner column with three 1s required for the interaction does not exist. We consider the latter case now.
Suppose that there is some {1, 2} but no {1, 3, 4}. Again, since the row sums are equal, some other columns must have more 1s in the last two rows than in the first two rows. The only possibility for that is the presence of two columns {2, 3, 4}, one further column {1, 3} and one further column {1, 4}. From the aforementioned columns we can build another star.
We argue similarly if some {1, 4} but no {1, 2, 3} exists. It follows that all remaining columns with two 1s outside the packets are {1, 3}, that is, all others have three 1s. Using again the fact that all row sums are equal, we conclude that all remaining columns can finally be grouped to diagonals and hypercliques. Finally we have managed to put all columns in packets.
With these packets we do another set of interactions which are again applied exhaustively and in any order. (See Definition 5 for the notation.)
(1) 2 pair → cycle
(2) clique + hyperclique → 2 star
(3) cycle + 2 diagonal → 2 star
(4) cycle + hyperclique → star + diagonal
(5) pair + diagonal → star
[Table 1: pairwise and inner energies of packets.]
Among all interaction products, only the cycle has a positive energy, and it is only produced in interaction (1). But since 2 + 2 · 1 ≥ 2, interaction (1) does not increase the energy of interacting packets either. It remains to compare the energies, before and after an interaction, between the interacting packets and any other packet in the partitioning. For that, we only need to compare the corresponding multiples of rows in Table 1 component-wise.
For the five listed interactions this just means to confirm the following inequalities.
Consider any column S with exactly one 1. Since r /n > 1/2, some column has three 1s. Due to Lemma 1, every such column is the complement of S. Thus, all columns with one 1 are equal to S. Furthermore, diagonals and hypercliques cannot exist, since they contain different columns with three 1s.
Precisely as before we assign the columns with two or three 1s to packets of these types, as long as possible: clique, cycle, star, pair. Now the small Lemma 2 turns out to be very useful: The interactions we had applied earlier are still valid, since the energy terms of all further columns with one 1 are not changed, due to Lemma 2. By the equal row sums and the absence of diagonals and hypercliques, all remaining columns form corners: In fact, the columns with a single 1 must be equal to {1}: If stars exist, then this claim follows from Lemma 1, and otherwise we can permute the rows.
Finally we erase the corners by further interactions. Their validity is checked as before, using Lemma 4 and Table 2. First we observe that the interaction "corner + pair → cycle" is valid. If no pair exists, then "corner → pair" is valid, too, hence we can produce a pair and do the former interaction. The interactions "2 pair → cycle" and "3 cycle → 2 clique" remain valid. After their exhaustive application, we reach the case "clique, star, cycle(2), pair(1)". This completes the proof.

Conclusions and further research
We have constructed optimal clique edge coverings with k ≤ 4 blocks, by inventing the method of interactions between packets of columns of incidence matrices that guides the search. Only the interaction sequences are laborious, but the final solutions have a nice and simple structure. We conjecture that, likewise, for every fixed k, optimal coverings with k blocks consist of only two types of packets and remainders of constant size. Probably, further ideas that reduce the amount of case distinctions would be needed to attack larger k. The method might also be suited for obtaining approximate solutions.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.