Universal Cycle Packings and Coverings for k-Subsets of an n-Set

A cyclic sequence of elements of [n] is an (n, k)-Ucycle packing (respectively, (n, k)-Ucycle covering) if every k-subset of [n] appears in this sequence at most once (resp. at least once) as a subsequence of consecutive terms. Let $p_{n,k}$ be the length of a longest (n, k)-Ucycle packing and $c_{n,k}$ the length of a shortest (n, k)-Ucycle covering. We show that, for a fixed k, $p_{n,k} = \binom{n}{k} - O(n^{\lfloor k/2 \rfloor})$. Moreover, when k is not fixed, we prove that if $k = k(n) \le n^{\alpha}$, where $0 < \alpha < 1/3$, then $p_{n,k} = \binom{n}{k} - o(\binom{n}{k}^{\beta})$ and $c_{n,k} = \binom{n}{k} + o(\binom{n}{k}^{\beta})$, for some $\beta < 1$. Finally, we show that if k = o(n), then $p_{n,k} = \binom{n}{k}(1 - o(1))$.


Universal Cycles
Universal cycles were introduced by Chung, Diaconis, and Graham [5]. They are closely related to both de Bruijn cycles and Gray codes. A universal cycle is a cyclic sequence which contains, as a subsequence of consecutive terms, a representation of every element of some collection of "combinatorial objects" exactly once. For example, a de Bruijn cycle is a binary cyclic sequence in which (for a fixed k) every binary sequence of length k appears as a subsequence of k consecutive terms exactly once. The problems of existence and construction of universal cycles for many combinatorial objects such as strings, subsets, multisets, permutations, partitions, lattice paths, vector spaces, weak orders, etc. have been considered by many authors (see [3–5, 15, 16, 18, 19, 21, 23, 25]). Some variants of these problems, in which the subsequences representing combinatorial objects are not necessarily contiguous or do not overlap completely, were investigated too (see [1, 6, 8, 9, 13, 14, 22]).
Authors: Zbigniew Lonc (zblonc@mini.pw.edu.pl) and Michał Dębski (michal.debski87@gmail.com), Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland.
In this paper we deal with universal cycles for k-subsets of an n-set, that is, sequences where every k-subset of an n-set appears exactly once as a subsequence of k consecutive terms. More precisely, for positive integers k and n, k ≤ n, a sequence $(a_0, a_1, \ldots, a_{m-1})$ of elements of the set $[n] = \{1, 2, \ldots, n\}$ is a universal cycle if every k-subset of [n] appears in this sequence exactly once as a subsequence $(a_{i+1}, \ldots, a_{i+k})$ of k consecutive terms, where subscripts are taken modulo m. For example, when n = 5 and k = 2, the cyclic sequence (1, 3, 2, 5, 4, 2, 1, 5, 3, 4) is a universal cycle because every 2-subset of the set [5] appears in it exactly once as a subsequence of 2 consecutive terms.
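The definition can be checked mechanically on small cases. The following is a minimal sketch in Python; the function names `window_sets` and `is_universal_cycle` are illustrative and not taken from the paper. It collects the m cyclic windows of k consecutive terms and compares them with the family of all k-subsets of [n].

```python
from itertools import combinations

def window_sets(seq, k):
    """Return the set of terms in each of the len(seq) cyclic windows of k consecutive terms."""
    m = len(seq)
    return [frozenset(seq[(i + j) % m] for j in range(k)) for i in range(m)]

def is_universal_cycle(seq, n, k):
    """True if every k-subset of {1, ..., n} occurs exactly once among the cyclic windows."""
    windows = window_sets(seq, k)
    targets = {frozenset(c) for c in combinations(range(1, n + 1), k)}
    # "Exactly once" holds iff there are as many windows as k-subsets
    # and the windows, viewed as sets, are exactly the k-subsets.
    return len(windows) == len(targets) and set(windows) == targets

example = (1, 3, 2, 5, 4, 2, 1, 5, 3, 4)
print(is_universal_cycle(example, n=5, k=2))  # True
```

A window containing a repeated element has fewer than k distinct terms, so it is never equal to a k-subset and the check fails for such sequences.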
The research on universal cycles for k-subsets of an n-set has been concentrated around the following conjecture of Chung, Diaconis, and Graham [5].

Conjecture 1 Let k ≥ 1.
There is an integer $n_0(k)$ such that for $n \ge n_0(k)$, there exists a universal cycle for k-subsets of [n] if and only if $\binom{n-1}{k-1} \equiv 0 \pmod{k}$.
The condition $\binom{n-1}{k-1} \equiv 0 \pmod{k}$ is an obvious necessary condition for the existence of a universal cycle (see [5]). Its sufficiency for large n is far from being settled.
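The necessary condition is easy to tabulate. The sketch below (the helper name `satisfies_divisibility` is ours) simply evaluates the congruence with Python's `math.comb`. It says nothing about sufficiency, which is the open part of the conjecture.

```python
from math import comb

def satisfies_divisibility(n, k):
    """True if binom(n-1, k-1) is divisible by k (the necessary condition above)."""
    return comb(n - 1, k - 1) % k == 0

# Values of n in 3..12 passing the test for k = 3.
print([n for n in range(3, 13) if satisfies_divisibility(n, 3)])  # [4, 5, 7, 8, 10, 11]
```

For n = 5 and k = 2 the condition holds, in agreement with the universal cycle exhibited earlier, while for n = 4 and k = 2 it fails.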
It is easy to see that Conjecture 1 is true for k = 1, 2. Jackson [19] proved it for k = 3 and constructed universal cycles for 4-subsets of an n-set for all odd n ≥ 9. Hurlbert [16] showed that for k = 3, 4, 6 and n sufficiently large there exist universal cycles for k-subsets of an n-set whenever n and k are relatively prime. It is reported in [21] that Jackson [20] settled Conjecture 1 for k = 4, 5 by constructing, with the aid of a computer, universal cycles for 4-subsets when n ≡ 2 (mod 8) and n ≥ 10, and for 5-subsets when n is not divisible by 5 and n ≥ 8; the former of these results was also proved by Rudoy [27] via an inductive construction. For k ≥ 7, and for k = 6 when n and k are not relatively prime, Conjecture 1 remains open.

Near-Universal Cycles
In view of the apparent difficulty of Conjecture 1, it is natural to consider a relaxed problem. There are two possible relaxations: to look for shortest possible cyclic sequences in which every k-subset of an n-set appears at least once as a subsequence of consecutive terms (the covering variant), and to look for longest possible cyclic sequences in which every k-subset of an n-set appears at most once as a subsequence of consecutive terms (the packing variant). Formally, by an (n, k)-Ucycle packing (resp. (n, k)-Ucycle covering) we mean a sequence $(a_0, a_1, \ldots, a_{m-1})$ of elements of [n] such that every k-subset of [n] appears in this sequence at most (resp. at least) once as a subsequence $(a_{i+1}, \ldots, a_{i+k})$ of k consecutive terms, where the index addition is performed modulo m. We denote by $p_{n,k}$ (resp. $c_{n,k}$) the length of a longest (n, k)-Ucycle packing (resp. a shortest (n, k)-Ucycle covering). Obviously, $p_{n,k} \le \binom{n}{k} \le c_{n,k}$. The problem of constructing a longest (n, k)-Ucycle packing has been considered by Curtis et al. [7] and Stevens et al. [28]. In [7] the authors, for a fixed k, prove the asymptotic formula $p_{n,k} = \binom{n}{k}(1 - o(1))$. It is shown in [28] that $p_{n,n-2} = n$; an immediate consequence of this result is that for k = n − 2 and n ≥ 4 no universal cycle exists.
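The two relaxations can be expressed as predicates on a cyclic sequence. The sketch below uses illustrative names and makes one assumption explicit: for a packing, every window of k consecutive terms is required to consist of k distinct elements of [n], which is what makes the bound $p_{n,k} \le \binom{n}{k}$ meaningful.

```python
from collections import Counter
from itertools import combinations

def window_counts(seq, k):
    """Count how often each set of k cyclically consecutive terms occurs."""
    m = len(seq)
    return Counter(frozenset(seq[(i + j) % m] for j in range(k)) for i in range(m))

def is_ucycle_packing(seq, n, k):
    """Every window is a k-subset of [n] (assumption: no repeated terms) seen at most once."""
    if not all(1 <= a <= n for a in seq):
        return False
    return all(len(w) == k and c == 1 for w, c in window_counts(seq, k).items())

def is_ucycle_covering(seq, n, k):
    """Every k-subset of [n] occurs at least once among the cyclic windows."""
    counts = window_counts(seq, k)
    return all(counts[frozenset(c)] >= 1 for c in combinations(range(1, n + 1), k))

print(is_ucycle_packing((1, 2, 3, 4, 5), n=5, k=3))            # True
print(is_ucycle_covering((1, 2, 3, 4, 1, 3, 2, 4), n=4, k=2))  # True
```

The first example is a packing of length 5 for n = 5 and k = 3, the length given by the formula $p_{n,n-2} = n$ quoted from [28]. The second covers all six 2-subsets of [4] but repeats two of them, so it is a covering and not a packing.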
Lonc et al. [26] considered, in connection with database file organization, the problem of constructing a shortest (non-cyclic) sequence in which every k-subset of an n-set appears as a subsequence of k consecutive terms. Translating one of the results in [26] to the language of the present paper, they proved that, for a fixed k, $c_{n,k} = \binom{n}{k} + O(n^{\lfloor k/2 \rfloor})$. A simple non-constructive proof of a weaker asymptotic result $c_{n,k} = \binom{n}{k}(1 + o(1))$ (where k is fixed) was given by Blackburn [2], who considered this problem in the context of so-called k-radius sequences.
A slightly different but related covering problem has been studied by Hurlbert [17] and Stevens et al. [28]. They consider the problem of existence of so-called t-cover Ucycles, i.e. cyclic sequences of $t\binom{n}{k}$ integers in which every k-subset of [n] appears exactly t times as a subsequence of consecutive terms. Blanca and Godbole [3] deal with another variant of the problem. They represent a k-subset of [n] as a binary n-term sequence with exactly k 1s and define a universal covering to be a binary cyclic sequence in which such a representation of every k-set of [n] appears at least once as a subsequence of n consecutive terms. The authors of [3] construct a binary sequence that contains exactly one representation of each k-set and each (k − 1)-set; it follows that if k = o(n), then there exists a universal covering, in the sense defined in this paragraph, of length $\binom{n}{k}(1 + o(1))$.

Our Contributions
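In this paper we prove new asymptotic formulas for both the packing number $p_{n,k}$ and the covering number $c_{n,k}$. Our main achievement is showing that $p_{n,k}$ and $c_{n,k}$ are asymptotically equal to $\binom{n}{k}$ for a wider range of values of k. Note that previous results apply only in the case when k is a constant and n goes to infinity, whereas we allow k to grow with n. We prove (in Sect. 3) a theorem that gives the following corollary.

Corollary 1 Let α, 0 < α < 1/3, be a fixed real number. If $k = k(n) \le n^{\alpha}$, then, for some β < 1,
$p_{n,k} = \binom{n}{k} - o(\binom{n}{k}^{\beta})$ and $c_{n,k} = \binom{n}{k} + o(\binom{n}{k}^{\beta})$. (1)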
For the packing version we give an asymptotic result for k growing even faster with n, but with a larger error term (the proof is given in Sect. 4).

Theorem 1 If k = o(n), then $p_{n,k} = \binom{n}{k}(1 - o(1))$.
We also improve the error term in the packing version from $O(n^{k-1})$ to $O(n^{\lfloor k/2 \rfloor})$ in the case when k is a constant (the proof is given in Sect. 3).

Theorem 2 For every fixed positive integer k, $p_{n,k} = \binom{n}{k} - O(n^{\lfloor k/2 \rfloor})$ and $c_{n,k} = \binom{n}{k} + O(n^{\lfloor k/2 \rfloor})$.
Note that the important part of Theorem 2 is the formula for $p_{n,k}$, as the formula for $c_{n,k}$ was known before. Nevertheless, it is included in the theorem because it is obtained as a simple byproduct of the proof.
The major (and also novel) part of the proofs is estimating the number of awesome compositions of numbers (the term is explained in Sect. 2). We do not use Schur's theorem, on which the proof in [7] relies and which is the reason why their argument works only for constant k; instead, we apply direct arguments (in Sects. 2 and 3) and the probabilistic method (in Sect. 4).

Good and Awesome Subsets and Compositions
Let $X = \{x_1, x_2, \ldots, x_k\}$, where $x_1 < x_2 < \cdots < x_k$, be a k-subset of [n]. We define the sequence of differences $d(X) = (d_1, d_2, \ldots, d_k)$ by $d_1 = x_1 - x_k + n$ and $d_i = x_i - x_{i-1}$ for i = 2, ..., k. Let us divide a circle of length n with n points into n arcs of length 1 and label these points with 1, 2, ..., n. If we represent the elements of the k-set X as points $x_1, x_2, \ldots, x_k$ on this circle, then the numbers $d_i$ are just the lengths of the arcs of the circle bounded by consecutive points representing the elements of X. Following the terminology used in Curtis et al. [7], we say that a k-subset X of [n] is good if there is a term (called a unique term) in d(X) which is different from all the remaining terms. For example, the 5-subset {2, 4, 6, 8, 9} of the set [12] is good because 1 (and also 5) is a unique term in its sequence of differences (5, 2, 2, 2, 1), and the 5-subset {3, 5, 8, 10, 12} of the set [12] is not good because its sequence of differences (3, 2, 3, 2, 2) has no unique terms. We say that a good k-subset X of [n] is awesome if d(X) has a unique term greater than 1.
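The notions of good and awesome subsets are easy to compute. The sketch below assumes the convention $d_1 = x_1 - x_k + n$ and $d_i = x_i - x_{i-1}$ for i ≥ 2, which is the one that reproduces the two difference sequences quoted in the examples; the function names are ours.

```python
from collections import Counter

def differences(subset, n):
    """Cyclic gaps of a subset of {1, ..., n}; assumed convention: d_1 = x_1 - x_k + n."""
    xs = sorted(subset)
    return tuple([xs[0] - xs[-1] + n] + [xs[i] - xs[i - 1] for i in range(1, len(xs))])

def unique_terms(seq):
    """Values that occur exactly once in seq, in increasing order."""
    return sorted(d for d, c in Counter(seq).items() if c == 1)

def is_good(subset, n):
    """Good: the sequence of differences has at least one unique term."""
    return bool(unique_terms(differences(subset, n)))

def is_awesome(subset, n):
    """Awesome: the sequence of differences has a unique term greater than 1."""
    return any(d > 1 for d in unique_terms(differences(subset, n)))

print(differences({2, 4, 6, 8, 9}, 12), is_good({2, 4, 6, 8, 9}, 12))      # (5, 2, 2, 2, 1) True
print(differences({3, 5, 8, 10, 12}, 12), is_good({3, 5, 8, 10, 12}, 12))  # (3, 2, 3, 2, 2) False
```

A subset that is good but not awesome is {1, 3, 4} in [5]: its differences are (2, 2, 1), so the only unique term is 1.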
In [7] the authors construct an (n, k)-Ucycle packing $(a_0, a_1, \ldots, a_{m-1})$ in which the k-subsets of [n] that appear as subsequences of k consecutive terms are precisely the awesome k-subsets. Thus, the number m of awesome k-subsets of [n] is a lower bound for the packing number $p_{n,k}$. Let $s_{n,k} = \binom{n}{k} - m$ be the number of k-subsets of [n] which are not awesome. Obviously,
$p_{n,k} \ge \binom{n}{k} - s_{n,k}$. (2)
Moreover, we observe that in the linear (non-cyclic) sequence $(a_0, a_1, \ldots, a_{m-1}, a_0, a_1, \ldots, a_{k-2})$ every awesome k-subset of [n] appears as a subsequence of k consecutive terms. We extend this sequence by a concatenation of $s_{n,k}$ k-term sequences which are (arbitrary) permutations of the non-awesome k-subsets of [n]. In the resulting sequence of length $m + k - 1 + k s_{n,k} \le \binom{n}{k} + (k-1)(s_{n,k} + 1)$ every k-subset of [n] appears at least once as a subsequence of k consecutive terms. This brute force construction yields the inequality
$c_{n,k} \le \binom{n}{k} + (k-1)(s_{n,k} + 1)$. (3)
Let $(d_1, d_2, \ldots, d_k)$ be a composition of a positive integer n into k parts, i.e. a sequence of positive integers such that $d_1 + d_2 + \cdots + d_k = n$. We say that a composition $(d_1, d_2, \ldots, d_k)$ is good if it has a part (called a unique part) which is different from all the remaining parts. More precisely, there exists $i \in \{1, 2, \ldots, k\}$ such that $d_i \ne d_j$ for all $j \ne i$. If a good composition has a unique part which is larger than 1, then we call it awesome. We denote by $g_{n,k}$ the number of compositions of n into k parts which are not good.
The next lemma explains the relationship between the numbers of non-awesome k-subsets of [n] and non-awesome compositions of n into k parts. Let $a_{n,k}$ be the number of compositions of n into k parts which are not awesome.

Lemma 1
For every n ≥ k ≥ 1, $s_{n,k} = \frac{n}{k}\, a_{n,k}$.
Proof We will double-count the number of pairs (x, S), where S is a non-awesome k-subset of [n] and x ∈ S. Clearly, this number is equal to $k s_{n,k}$. On the other hand, for a fixed x ∈ [n], consider the function ϕ that assigns to every non-awesome composition $(p_1, p_2, \ldots, p_k)$ of n into k parts the k-subset $\{x, x + p_1, x + p_1 + p_2, \ldots, x + p_1 + p_2 + \cdots + p_{k-1}\}$ of [n], where addition is performed modulo n. One can easily verify that ϕ is a bijection from the set of all non-awesome compositions of n into k parts onto the set of all non-awesome k-subsets of [n] containing x. It follows that the number of pairs equals $n a_{n,k}$, so the lemma holds.
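The identity $k s_{n,k} = n a_{n,k}$ can be confirmed by brute force for small parameters. The sketch below is only a numerical sanity check, not part of the proof; it assumes the difference convention $d_1 = x_1 - x_k + n$, and all names are illustrative.

```python
from collections import Counter
from itertools import combinations

def compositions(n, k):
    """Yield all compositions of n into k positive parts (choose k-1 cut points in 1..n-1)."""
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        yield tuple(bounds[i + 1] - bounds[i] for i in range(k))

def has_unique_part_above_one(parts):
    """Awesome: some value greater than 1 occurs exactly once."""
    return any(c == 1 and d > 1 for d, c in Counter(parts).items())

def differences(subset, n):
    """Cyclic gaps of a subset of {1, ..., n}; assumed convention: d_1 = x_1 - x_k + n."""
    xs = sorted(subset)
    return tuple([xs[0] - xs[-1] + n] + [xs[i] - xs[i - 1] for i in range(1, len(xs))])

def lemma1_holds(n, k):
    """Check k * s_{n,k} == n * a_{n,k} by direct enumeration."""
    s = sum(1 for X in combinations(range(1, n + 1), k)
            if not has_unique_part_above_one(differences(X, n)))
    a = sum(1 for c in compositions(n, k) if not has_unique_part_above_one(c))
    return k * s == n * a

print(all(lemma1_holds(n, k) for n in range(1, 10) for k in range(1, n + 1)))
```

The check is phrased as $k s_{n,k} = n a_{n,k}$ to stay in integer arithmetic.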

Lemma 2 For every n
Proof The set of non-awesome compositions of n into k parts is a union of two disjoint subsets: the set of non-good compositions and the set of good compositions which are non-awesome. The cardinality of the former set is $g_{n,k}$. The cardinality of the latter set is not larger than $k g_{n-1,k-1}$ because every good non-awesome composition of n into k parts can be obtained in a unique way from a non-good composition of n − 1 into k − 1 parts which has no part equal to 1 by inserting a unit part in one of k possible positions. Thus, $a_{n,k} \le g_{n,k} + k g_{n-1,k-1}$ and we are done by Lemma 1.
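The last step of this proof can be written out explicitly. Combining the bound on $a_{n,k}$ with the identity of Lemma 1 gives the chain below. This is a sketch of that step only; the exact form in which the lemma states the resulting bound may differ.

```latex
\begin{align*}
s_{n,k} &= \frac{n}{k}\, a_{n,k} && \text{(Lemma 1)} \\
        &\le \frac{n}{k}\,\bigl(g_{n,k} + k\, g_{n-1,k-1}\bigr) \\
        &= \frac{n}{k}\, g_{n,k} + n\, g_{n-1,k-1}.
\end{align*}
```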

Lemma 3
For n ≥ k ≥ 4, […]
Proof We denote by $G_{n,k}$ the set of all non-good compositions of n into k parts. Let […] To estimate $|A_{n,k}(d)|$, we define $C_i$, i = 2, ..., k, to be the set of members […] We estimate $|B_{n,k}(d)|$ similarly. Let $D_{i,j}$, 2 ≤ i < j ≤ k, be the set of members $(d_1, d_2, \ldots, d_k)$ […] It follows from the definition of $B_{n,k}$ that after removing from $(d_1, d_2, \ldots, d_k) \in D_{i,j}$ the terms $d_1$, $d_i$ and $d_j$ we get a non-good composition of n − 3d into k − 3 parts in which no part is equal to d. Hence, $|D_{i,j}| \le g_{n-3d,k-3}$ and, consequently, […] The assertion follows by the equality (4) and the inequalities (5) and (6).
One can readily verify that, for k = 2, 3, $g_{n,k} = 1$ if and only if k is a divisor of n and $g_{n,k} = 0$ otherwise. Moreover, $g_{n,1} = 0$.
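These small cases can be confirmed by enumeration. The following sketch (with illustrative names) counts non-good compositions directly; it is a check for small n, not an efficient way to compute $g_{n,k}$.

```python
from collections import Counter
from itertools import combinations

def compositions(n, k):
    """Yield all compositions of n into k positive parts."""
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        yield tuple(bounds[i + 1] - bounds[i] for i in range(k))

def is_good(parts):
    """Good: some part value occurs exactly once."""
    return 1 in Counter(parts).values()

def g(n, k):
    """Number of compositions of n into k parts that are not good."""
    return sum(1 for c in compositions(n, k) if not is_good(c))

print([g(n, 2) for n in range(2, 9)])  # [1, 0, 1, 0, 1, 0, 1]
print([g(n, 3) for n in range(3, 10)]) # [1, 0, 0, 1, 0, 0, 1]
```

For k = 4 the count is no longer 0 or 1. For instance, g(8, 4) = 7, coming from (2, 2, 2, 2) and the six arrangements of two 1s and two 3s.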
Proof (i) We proceed by induction on k. Obviously, the lemma holds for k = 1, 2, 3, so let us assume that k ≥ 4. By Lemma 3 and the induction hypothesis we get […]
(ii) We proceed by induction on k again. The lemma holds for k = 1, 2, 3, so let us assume that k ≥ 4. By Lemma 3 and the induction hypothesis, similarly as in the proof of (i), we get […]

Packing and Covering Number Asymptotic Formulas for a Fixed k and for $k \le n^{1/3}$
We are ready now to prove asymptotic bounds on the numbers $p_{n,k}$ and $c_{n,k}$ when k is fixed.
Proof (of Theorem 2) As the theorem obviously holds for k = 1, let us assume that k ≥ 2. By the inequalities (2) and (3), to prove the theorem, it suffices to show that $s_{n,k} \le c\, n^{\lfloor k/2 \rfloor}$, for some constant c = c(k).

By Lemmas 2 and 4(i),
which completes the proof of the theorem.
We shall now apply our results on compositions shown in Sect. 2 to prove asymptotic formulas for $p_{n,k}$ and $c_{n,k}$ in the case when k = k(n) is any function of n such that $k \le n^{1/3}$.

Theorem 3
Let α, 0 < α ≤ 1/3, be a fixed real number. If $k = k(n) \le n^{\alpha}$, then […]
Thus, using the Stirling formula for the factorial approximation one can easily observe that there is a constant d > 0 such that for all positive integers n and k, where n ≥ 2k, […] By Lemmas 2 and 4(ii) and the inequality $k \le n^{\alpha}$, […] For n ≥ 2k, by the inequality k […], true for α ≤ 1/3, and the inequalities (8), $k \le n^{\alpha}$, (7) and β > 1/2, we get […] As n ≥ 2k for n ≥ 3 (because $k \le n^{1/3}$), the theorem follows now from the inequalities (9), (2) and (3).
Proof (of Corollary 1) By Theorem 3, if k → ∞ as n → ∞, then the equalities (1) hold. Since $\lfloor k/2 \rfloor < \beta k$, it follows from Theorem 2 that if k is a fixed positive integer, then the equalities (1) hold too. One can readily verify that these two statements imply that the equalities (1) are true for every function $k = k(n) \le n^{\alpha}$.

A Packing Number Asymptotic Formula for k = o(n)
Let us now pass to a more general case when k = o(n). Using a probabilistic method, we will prove in this case that $p_{n,k} = \binom{n}{k}(1 - o(1))$. To show this result, we will prove first that if k = o(n) and k → ∞ as n → ∞, then almost every composition of n into k parts is good. More precisely, we will show (see Lemma 5(iv)) that if we select a composition of n into k parts uniformly at random, then with high probability its largest part is unique. We will call the random partition model described in the preceding sentence a uniform model and denote it by $\Pi_{n,k}$ because of its similarity to the uniform random graph model. Some related results on the issues concerning the largest parts and multiplicities of parts in random compositions can be found in Knopfmacher and Robbins [24], Hitczenko and Louchard [11], Hitczenko and Savage [12] and the book by Heubach and Mansour [10]. In these results, however, the number of parts is not a parameter, so it seems they are not useful in our considerations.
We can imagine that a positive integer n is represented as an interval of length n which is divided into n segments of length 1 by n − 1 dashes. Note that a composition $(d_1, d_2, \ldots, d_k)$ of n into k parts can be thought of as a selection of k − 1 dashes that divide our interval (into segments of lengths $d_1, d_2, \ldots, d_k$). We will say that a part (of a given composition) starts at position i if the i-th dash is selected (the part $d_1$ starts at position 0). The length of this part is the distance from the i-th dash to the next selected dash (or to the position n if no dash at a position larger than i has been selected).
Let us define the binomial model, denoted by $\Pi_{n,p}$, so that each composition of n into k parts is chosen with probability $p^{k-1}(1 - p)^{n-k}$. This is equivalent to saying that in the "interval" representation described above we pick dashes to form our composition independently, each with probability p. Note that the expected number of parts in $\Pi_{n,p}$ is p(n − 1) + 1, so one may hope that it would be equivalent to $\Pi_{n,k}$ when k ≈ np.
Here is a sketch of our reasoning in the remaining part of this section. We start by proving that, with high probability, the largest part in $\Pi_{n,p}$ is unique. We set p to be slightly smaller than k/n, so, with high probability, the number of parts in $\Pi_{n,p}$ is smaller than k. To obtain a composition with exactly k parts we take a random composition from $\Pi_{n,p}$ and randomly add the missing dashes. If the number of added dashes is small enough, we can show that, with high probability, it does not affect our unique part. This procedure is a $\Pi_{n,k}$ model in disguise, which completes the argument.

Lemma 5 Let p = p(n), 0 < p < 1, be a function of n such that p = o(1) and np → ∞ as n → ∞. We define $\ell_0 = \frac{\ln(np) - \ln\omega}{-\ln(1-p)}$ and $\ell_1$ […]
(i) We need to show that, with high probability, $X_{\ge \ell_1 + 1} = 0$. One can readily verify that $(1 - p)^{\ell_1}$ […] and np → ∞ as n → ∞, so ω → ∞. As Pr[X ≥ 1] ≤ E[X] for any nonnegative integer valued random variable X, we get (i).
(ii) We shall use the second moment method. First we observe that for sufficiently large […] Let $I_i$ be the 0-1 random variable indicating that a part of length at least $\ell_0 + 1$ starts at position i. By the first equality in (10), we have […] (12) As for the second moment, we have […] Note that […] Thus, applying (13), (14) and (11), we get […] Applying (12) and (15), by the second moment method, we have […] because ω → ∞ as n → ∞, which proves (ii).
(iii) For any $d > \ell_0$, let us estimate the probability of the event $C_d$ that a part d occurs at least twice in a composition $\Pi_{n,p}$. We denote by […] Let D be the event that there is a part in $\Pi_{n,p}$ from the interval $(\ell_0, \ell_1]$ which is not unique. Clearly, D ⊆ […] of the d − 1 dashes inside it. The probability that such a bad event will not happen is […]

In the present paper we also proved asymptotic formulas for the numbers $p_{n,k}$ and $c_{n,k}$ when k is not a constant but is a function of n. The results for the packing number are, however, stronger than the results for the covering number. In particular, the covering analog of Theorem 1 remains open.

Problem 2
Is it true that if k = o(n), then $c_{n,k} = \binom{n}{k}(1 + o(1))$?
We believe the answer to Problem 2 is positive.
We know very little about the existence of universal cycles in the case when k is very large. Clearly, the case of k = n − 1 is trivial. Stevens et al. [28] showed that for k = n − 2 universal cycles do not exist and that $p_{n,n-2} = n$. They also considered linear packings instead of cyclic packings of (n − 2)-subsets of an n-set and proved that in this case the corresponding packing number is 3n − 6 (far from the trivial upper bound $\binom{n}{n-2} + n - 3$). Using a method similar to the one described in [26], we are able to show that $c_{n,n-2} \le 2\binom{n}{n-2} - n$, but we do not know whether or not the actual value of $c_{n,n-2}$ is equal to this upper bound. For k = n − s, where s > 2, we do not know any nontrivial general results. Therefore we state the following problem.

Problem 3
For a fixed s > 2, find good asymptotic formulas for $p_{n,n-s}$ and $c_{n,n-s}$.
It is not difficult to see that if k = n − s, where s > 1 is a constant, then almost all k-subsets of [n] are not good. On the other hand, we proved in this paper (see Corollary 2) that for k = o(n) almost all k-subsets of [n] are good. In view of these facts it is interesting to ask what happens when k grows linearly with n. Therefore we state our next problem.

Problem 4
Let c be a constant real number such that 0 < c < 1 and let k ∼ cn. Are almost all k-subsets of [n] good?
Applying an argument similar to the one in the proof of Lemma 1, one can easily prove that the number of non-good k-subsets of [n] is equal to $\frac{n}{k}\, g_{n,k}$. Therefore the question formulated in Problem 4 is equivalent to asking if almost all compositions of n into k parts are good.
Some computer experiments done by P. Rzążewski suggest that for c ≥ 1/2 the answers to both questions are negative.