On Blocky Ranks Of Matrices

A matrix is blocky if it is a “blowup” of a permutation matrix. The blocky rank of a matrix M is the minimum number of blocky matrices that linearly span M. Hambardzumyan, Hatami and Hatami defined blocky rank and showed that it is connected to communication complexity and operator theory. We describe additional connections to circuit complexity and combinatorics, and we prove upper and lower bounds on blocky rank in various contexts.


Introduction
Matrices serve as a model for many objects: linear operators in algebra, communication problems in computational complexity, concept classes in machine learning, and more. There are many ways to measure the complexity of matrices: various notions of rank (the "usual" rank, approximate rank, non-negative rank, sign rank, etc.), of communication complexity (deterministic, randomized, quantum, etc.), notions from learning theory (VC dimension, Littlestone dimension, margin complexity, etc.), and more. We focus on the notion of blocky rank recently defined by Hambardzumyan, Hatami & Hatami (2023).
A standard mechanism for defining a complexity measure has two stages. In the first stage, we define the building blocks of the model (in our case, matrices of blocky rank one). In the second stage, complexity is defined as the minimum number of operations that are needed to generate the target (in our case, sum operations).
2 Page 2 of 18 Avraham & Yehudayoff cc
Definition 1.1. All identity matrices have blocky rank one. The set of matrices of blocky rank one is the smallest set that contains all identity matrices and is closed under three operations: duplicating a row or a column, permuting the rows or columns, and adding a zero row or a zero column. In other words, a matrix has blocky rank one if, up to a permutation of the rows and columns, it consists of blocks of ones (of possibly different sizes) on the "diagonal", padded with some zero rows and columns.
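The characterization in Definition 1.1 is easy to test mechanically. The Python sketch below is ours (the name `is_blocky` is hypothetical); it assumes the input is a 0/1 matrix given as a list of rows and uses the equivalent condition that any two nonzero rows have either identical or disjoint supports.

```python
def is_blocky(m):
    """Return True if the 0/1 matrix m (a list of equal-length rows) is blocky.

    Assumed criterion: any two nonzero rows have equal or disjoint supports.
    Rows sharing a support form one all-ones block, and since distinct
    supports are disjoint, the column sets of different blocks are disjoint.
    By convention, the all-zero matrix is accepted as well.
    """
    supports = {frozenset(j for j, v in enumerate(row) if v) for row in m}
    supports.discard(frozenset())  # zero rows may appear anywhere
    supports = list(supports)
    for a in range(len(supports)):
        for b in range(a + 1, len(supports)):
            if supports[a] & supports[b]:  # overlapping but not equal
                return False
    return True


identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
blowup = [[1, 1, 0, 0],   # rows 0 and 1 duplicate each other
          [1, 1, 0, 0],
          [0, 0, 1, 0],
          [0, 0, 0, 0]]   # a zero row and a zero column
print(is_blocky(identity), is_blocky(blowup), is_blocky([[1, 1], [0, 1]]))
# prints: True True False
```

The last example fails because the supports {1, 2} and {2} of its two rows overlap without being equal.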
Definition 1.2. The blocky rank $\mathrm{blocky}(M)$ of a matrix M is the minimum integer R so that M can be written as a linear combination of R matrices $B_1, \ldots, B_R$, each of blocky rank one. In this work, we always work over the field $\mathbb{R}$.
Motivation to study blocky rank, and its relatives, comes from various areas. In communication complexity, it is related to understanding randomized communication problems, deterministic communication complexity with equality oracle queries, and other communication complexity models (Hambardzumyan et al. (2023) and Pitassi, Shirley & Shraibman (2023)). In operator theory, it is related to idempotents in Schur algebras (see Hambardzumyan et al. (2023) and references therein). In circuit complexity, it is related to depth-two threshold circuits. In combinatorics, it is related to covering problems in graphs. In machine learning, it is related to closure properties of Littlestone classes (Alon, Beimel, Moran & Stemmer (2020)).

Generic matrices.
A typical first question about complexity is "what is the complexity of a random object?" The "obvious" upper bound on the blocky rank of an n × n Boolean matrix is n: every such matrix is the sum of its n rows (each viewed as a matrix with a single nonzero row), and a Boolean matrix with one nonzero row has blocky rank one. The following theorem provides a lower bound for random matrices.
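Concretely, the obvious bound comes from splitting a matrix into its nonzero rows; a matrix whose only nonzero row has support S is a single 1 × |S| block of ones padded with zeros, hence blocky. A minimal Python sketch of this decomposition (helper names are ours); since the parts have disjoint supports, it even partitions the ones of the matrix into at most n blocky matrices.

```python
import random


def single_row_parts(m):
    """Split a 0/1 matrix into matrices that each keep exactly one nonzero row.

    A matrix whose only nonzero row has support S is a single 1 x |S| block
    of ones padded with zeros, so each part is blocky.
    """
    parts = []
    for i, row in enumerate(m):
        if any(row):
            part = [[0] * len(row) for _ in m]
            part[i] = list(row)
            parts.append(part)
    return parts


def entrywise_sum(mats, n_rows, n_cols):
    """Entrywise sum of a list of n_rows x n_cols matrices."""
    total = [[0] * n_cols for _ in range(n_rows)]
    for mat in mats:
        for i in range(n_rows):
            for j in range(n_cols):
                total[i][j] += mat[i][j]
    return total


random.seed(0)
n = 6
m = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
parts = single_row_parts(m)
assert entrywise_sum(parts, n, n) == m  # the parts partition the ones of m
print(len(parts), "parts, at most", n)
```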
Theorem 1.3 is proved in Section 2.1. The lower bound is smaller than the obvious upper bound by a factor of log n, and this factor turns out to be necessary: the blocky rank of every Boolean matrix is much smaller than n. This immediately follows from a result of Pudlák & Rödl (1994). In fact, this holds for a larger complexity measure. The blocky partition number $\mathrm{blocky\text{-}par}(M)$ of M is the minimum number of blocky matrices that sum to M. It trivially holds that $\mathrm{blocky}(M) \le \mathrm{blocky\text{-}par}(M)$.
The two theorems are reminiscent of Shannon's lower bound and Lupanov's upper bound in the context of Boolean circuit complexity. Shannon proved that the circuit complexity of a random n-variate Boolean function is at least $\Omega(2^n/n)$, while the obvious upper bound is larger. Lupanov proved that in fact the lower bound is sharp: every n-variate Boolean function has a Boolean circuit of size at most $O(2^n/n)$.

The theorems above have the following additional combinatorial interpretation. The clique cover number of a graph is the least number of (induced) cliques that are required to cover it. The intersection number of a graph is the least k so that the graph can be represented as an intersection graph over a universe of size k. An intersection graph consists of a set $S_v \subseteq [k]$ for each vertex v, so that every two vertices $u \ne v$ are connected by an edge iff $S_v \cap S_u \ne \emptyset$. Erdős, Goodman & Pósa (1966) showed that the clique cover number is equal to the intersection number. Bollobás, Erdős, Spencer & West (1993) proved that the clique cover number of a uniformly random graph on n vertices is at least $\Omega(n^2/\log^2 n)$ and at most $O(n^2 \log\log n/\log^2 n)$. Part of their motivation was to understand the interval number of random graphs. Frieze & Reed (1995) improved the upper bound to a sharp $O(n^2/\log^2 n)$. Roughly speaking, the connection between their $n^2/\log^2 n$ and our $n/\log n$ is that a typical clique is of size log n, and $n/\log n$ cliques can (sometimes) be glued into a single blocky matrix, so the total number of blocky matrices becomes $\approx \frac{n^2/\log^2 n}{n/\log n} = n/\log n$. It is worth noting that the upper bounds of Bollobás et al. (1993) and Frieze & Reed (1995) hold for random graphs and are false for some graphs, whereas the upper bound above holds for all matrices.
Let us now make the connection more formal. We work with bipartite graphs, because they correspond to (general) Boolean matrices. A blocky graph is a bipartite graph that consists of a disjoint union of full bipartite graphs. Equivalently, the adjacency matrix of a blocky graph has blocky rank one. The blocky cover number of a bipartite graph is the minimum number of (induced) blocky graphs that are required to cover it.
The blocky cover number can be thought of as a variant of the intersection number. An intersection representation of a graph is a map s that assigns to each vertex v a vector $s(v) \in \{0,1\}^k$, so that two vertices $v \ne u$ are connected by an edge iff there is $i \in [k]$ with $s(v)_i = s(u)_i > 0$. The intersection number of a graph is the minimum k for which there is such a representation. We can extend this definition to larger alphabets. An agreement representation consists of a map s that assigns to each vertex v a vector $s(v) \in \{0, 1, \ldots, L\}^k$, so that every two vertices $v \ne u$ are connected by an edge iff there is $i \in [k]$ with $s(v)_i = s(u)_i > 0$. The integer k is called the universe size of the representation (the integer L is not assumed to be bounded).
If G is a bipartite graph, then G has an agreement representation with universe size one iff G is blocky. More generally, the blocky cover number of G is equal to the least universe size of an agreement representation of G. This is analogous to the fact that the clique cover number is equal to the intersection number (Erdős et al. (1966)).
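One direction of this correspondence is constructive. The Python sketch below (our own helper names) turns a cover by k blocky graphs into an agreement representation with universe size k: coordinate i of s(v) records which block of the i-th blocky graph contains v. It assumes that, for a bipartite graph, only pairs with one vertex on each side are tested for agreement, since two vertices on the same side are never adjacent.

```python
def agreement_representation(n_left, n_right, cover):
    """Turn a cover by k blocky graphs into labels s(v) in {0, ..., L}^k.

    cover[i] lists the blocks (left_set, right_set) of the i-th blocky graph;
    blocks of one blocky graph are disjoint on both sides. A vertex gets label
    b + 1 in coordinate i if it lies in block b of cover[i], and 0 otherwise.
    """
    k = len(cover)
    s_left = [[0] * k for _ in range(n_left)]
    s_right = [[0] * k for _ in range(n_right)]
    for i, blocks in enumerate(cover):
        for b, (rows, cols) in enumerate(blocks):
            for u in rows:
                s_left[u][i] = b + 1
            for v in cols:
                s_right[v][i] = b + 1
    return s_left, s_right


def represented_edges(s_left, s_right):
    """Left-right pairs whose labels agree, with a positive value, somewhere."""
    return {(u, v)
            for u, su in enumerate(s_left)
            for v, sv in enumerate(s_right)
            if any(a == c > 0 for a, c in zip(su, sv))}


# Bipartite cocktail party graph on 3 + 3 vertices: edges (u, v) with u != v.
cover = [
    [({0}, {1}), ({1}, {2}), ({2}, {0})],  # first blocky graph
    [({0}, {2}), ({1}, {0}), ({2}, {1})],  # second blocky graph
]
s_left, s_right = agreement_representation(3, 3, cover)
edges = {(u, v) for u in range(3) for v in range(3) if u != v}
assert represented_edges(s_left, s_right) == edges
print(s_left, s_right)
# prints: [[1, 1], [2, 2], [3, 3]] [[3, 2], [1, 3], [2, 1]]
```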
Instead of covering the edges of a graph, we can ask to partition them into structured parts. The blocky partition number of a bipartite graph is the minimum number of pairwise edge-disjoint (induced) blocky graphs that are required to cover it. Theorem 1.4 says that the blocky partition number of every bipartite graph with n vertices is $O(n/\log n)$.

The greater-than matrix.
A more interesting but often more difficult question is understanding the complexity of specific objects (and not of random objects).We move to investigating the blocky rank of specific matrices.
The first matrix we consider is the n × n greater-than matrix $\mathrm{GT}_n$, defined by $\mathrm{GT}_n(x, y) = 1_{x \le y}$, where we think of the rows and columns of $\mathrm{GT}_n$ as integers in [n]. The greater-than matrix is the adjacency matrix of the half graph.
The upper bound is relatively straightforward and was proved long ago in the context of Schur algebras (Kwapień & Pełczyński (1970)). It actually shows that the blocky partition number of the half graph is at most log n + 1; see Claim 4.4 below. In particular, even the monotone blocky rank of $\mathrm{GT}_n$ is at most of order log n (in monotone ranks, only positive coefficients are allowed).
A variant of the blocky rank of the greater-than matrix was studied in the context of closure properties of "threshold classes" in machine learning (Alon et al. (2020); Ghazi et al. (2021)). There are many variants of blocky rank we can study: a monotone version, where the linear combination uses only positive coefficients; an approximate version, where we only need to approximate the target matrix; a signed version, where we only need to get the sign pattern right; and so forth. Here is a variant that is related to closure properties in machine learning. For a tuple $B = (B_1, \ldots, B_R)$ of n × n Boolean matrices and a function $F : \{0,1\}^R \to \{0,1\}$, let F(B) be the n × n matrix obtained by applying F entry-wise: for all i, j,
$$F(B)_{i,j} = F\bigl((B_1)_{i,j}, \ldots, (B_R)_{i,j}\bigr).$$

Definition 1.6. The functional blocky rank $\mathrm{fun\text{-}blocky}(M)$ of a Boolean matrix M is the minimum number R so that there is a tuple $B = (B_1, \ldots, B_R)$ of blocky matrices and a function $F : \{0,1\}^R \to \{0,1\}$ with $M = F(B)$. The lower bound $\mathrm{fun\text{-}blocky}(\mathrm{GT}_n) \ge \Omega(\log\log n)$ is implicit in the work of Alon, Beimel, Moran & Stemmer (2020). The better lower bound $\mathrm{fun\text{-}blocky}(\mathrm{GT}_n) \ge \Omega(\log n/\log\log n)$ is due to Ghazi et al. (2021). These two works consider a more general framework, and their arguments are based on Ramsey theory. Even the stronger lower bound is off by a log log n factor. We remove this factor and obtain a sharp bound.
Theorem 1.7. $\mathrm{fun\text{-}blocky}(\mathrm{GT}_n) \ge \frac{1}{8}\log n$. The lower bound is proved in Section 3.2. This argument, too, is related to covering graphs. The Graham–Pollak theorem states that the edges of the complete graph on n vertices cannot be partitioned into fewer than n − 1 complete bipartite graphs (Graham & Pollak (1971)). Orlin (1977) suggested studying the problem of covering the cocktail party graph (a complete graph minus a perfect matching). Part of his motivation came from computational complexity theory (see Remark 3.7 in his paper). Gregory & Pullman (1982) proved that the clique cover number of the cocktail party graph is Θ(log n).
We consider the following bipartite strengthening of their result.The bipartite cocktail party graph is the full bipartite graph minus a perfect matching.
Theorem 1.8. The blocky cover number of the bipartite cocktail party graph with n vertices on each side is at least $\frac{1}{4}\log n$. This is quantitatively weaker but more general than the lower bound of Gregory and Pullman. The cocktail party graph contains a copy of the bipartite cocktail party graph, and a clique in the cocktail party graph corresponds to a connected blocky graph in that copy. Our lower bound holds also when we are allowed to cover the bipartite cocktail party graph by any blocky graphs (not necessarily connected).
Avishay Tal shared with us the following observation. For every n × n Boolean matrix M, the functional blocky rank of M is at most O(log n). The reason is that, for a fixed i, the matrix B defined by $B_{x,y} = x_i$ (the i-th bit of x) is blocky, as is the matrix defined by $B_{x,y} = y_i$, and with $2\log n$ such matrices we can encode both x and y. This shows that, somewhat unusually, $\mathrm{GT}_n$ is an explicit matrix of essentially maximum functional blocky rank.
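The observation can be checked directly. In the Python sketch below (ours; function names are hypothetical and rows are indexed from 0 rather than from 1), the $2\lceil\log n\rceil$ bit matrices are blocky, and an arbitrary Boolean M is recovered as F(B) for a function F that decodes x and y from their bits and looks up M.

```python
import random


def bit_matrices(n):
    """Return the matrices B[x][y] = (bit i of x) and B[x][y] = (bit i of y).

    Rows and columns are indexed by 0, ..., n-1 (the text uses [n]).
    A matrix whose entries depend only on one bit of x is all-ones on the rows
    where that bit is 1 and zero elsewhere: a single block, hence blocky.
    The same holds with rows and columns swapped.
    """
    bits = max(1, (n - 1).bit_length())  # ceil(log2 n) for n >= 2
    row_bits = [[[(x >> i) & 1 for _ in range(n)] for x in range(n)]
                for i in range(bits)]
    col_bits = [[[(y >> i) & 1 for y in range(n)] for _ in range(n)]
                for i in range(bits)]
    return row_bits + col_bits, bits


def apply_entrywise(f, mats):
    """F(B): apply f to the vector ((B_1)[x][y], ..., (B_R)[x][y]) at each entry."""
    n_rows, n_cols = len(mats[0]), len(mats[0][0])
    return [[f(tuple(b[x][y] for b in mats)) for y in range(n_cols)]
            for x in range(n_rows)]


random.seed(1)
n = 10
M = [[random.randint(0, 1) for _ in range(n)] for _ in range(n)]
mats, bits = bit_matrices(n)


def decode_and_look_up(v):
    """F: read x from the first `bits` coordinates, y from the rest, return M[x][y]."""
    x = sum(b << i for i, b in enumerate(v[:bits]))
    y = sum(b << i for i, b in enumerate(v[bits:]))
    return M[x][y]


assert apply_entrywise(decode_and_look_up, mats) == M
print(len(mats))  # prints 8 = 2 * ceil(log2(10))
```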
1.3. The inner-product matrix. The second matrix we consider is the inner-product matrix: let $\mathrm{IP}_n$ be the $\{0,1\}^n \times \{0,1\}^n$ matrix defined by $\mathrm{IP}_n(x, y) = \sum_i x_i y_i \bmod 2$. This matrix has been studied in various contexts, like circuit complexity, communication complexity, margin complexity and more. We focus here on its connection to circuit complexity, in particular to depth-two threshold circuits. There is a long line of research on this topic; see, e.g., Amano (2020). Hajnal, Maass, Pudlák, Szegedy & Turán (1993) proved a lower bound of roughly $2^{n/3}$ on the size of MAJ • LTF circuits computing the inner product function. Amano (2020) constructed a MAJ • LTF circuit of size $(1.899)^n$ computing the inner product function. The blocky rank perspective allows us to improve the lower bound.
Theorem 1.9. Any MAJ • LTF circuit computing $\mathrm{IP}_n$ has size at least $\Omega(2^{n/2}/n)$. The theorem is proved in Section 4.1. The proof proceeds by bounding the correlation between $\mathrm{IP}_n$ and a threshold gate. An upper bound of $\approx 2^{-n/3}$ on the correlation was proved in Hajnal et al. (1993). We improve the bound to $\approx 2^{-n/2}$, which is essentially sharp.
1.4. Depth-two threshold circuits. Finally, we describe a general connection between blocky rank and circuit complexity, specifically, depth-two threshold circuits. A similar connection was already discovered by Jukna (2006).
Proving strong lower bounds for LTF • LTF circuits is a long-standing open problem. Kane & Williams (2016) proved the best known lower bound for this model: every LTF • LTF circuit computing the n-variate Andreev function must have size $\Omega(n^{3/2})$.
Roychowdhury, Orlitsky, and Siu observed that we do not even know how to prove lower bounds for Σ • LTF circuits, where the upper gate just computes a linear function (with arbitrary coefficients); see Roychowdhury et al. (1994); Williams (2018).We observe that lower bounds on blocky rank yield circuit lower bounds in this model.
The theorem is proved in Section 4.2. It shows that proving strong lower bounds on the blocky rank of explicit Boolean matrices might be difficult but rewarding.
Remark 1.11.A similar theorem holds for the signed version of blocky rank and general LTF • LTF circuits.
The theorem suggests that even proving relatively weak lower bounds (say, polynomial in n) on the blocky rank of an explicit $2^n \times 2^n$ matrix is interesting. The lower bound of Kane & Williams (2016) relies on the anti-concentration phenomenon, which does not seem directly relevant to blocky rank. So even obtaining an $\Omega(n^{5/2})$ lower bound on the blocky rank (which would yield the same circuit lower bound) seems interesting to us.

General matrices
The lower bound follows from a counting argument showing that there are few Boolean matrices with low blocky rank.
Proof. We can choose a basis for V in echelon form. That is, there are $v_1, \ldots, v_k \in V$ and $i_1 < \cdots < i_k$ in [n] so that $(v_j)_{i_j} = 1$ and $(v_j)_i = 0$ for all $i < i_j$. If $\sum_j a_j v_j \in \{0,1\}^n$, then, given $a_1, \ldots, a_j$, there are at most two possible options for $a_{j+1}$: the $i_{j+1}$-th coordinate of the sum equals $a_{j+1}$ plus a quantity determined by $a_1, \ldots, a_j$, and it must lie in {0, 1}. The total number of possibilities for $a_1, \ldots, a_k$ is therefore at most $2^k$.
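As we read this proof, the statement being proved is that a k-dimensional subspace V of $\mathbb{R}^n$ contains at most $2^k$ vectors of $\{0,1\}^n$. This is easy to sanity-check by brute force for small n. The sketch below is ours and does not follow the echelon-form argument: it simply enumerates $\{0,1\}^n$ and tests membership in V with an exact rank computation over the rationals.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank over the rationals, by exact Gaussian elimination."""
    mat = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(mat[0]) if mat else 0):
        pivot = next((i for i in range(r, len(mat)) if mat[i][c] != 0), None)
        if pivot is None:
            continue
        mat[r], mat[pivot] = mat[pivot], mat[r]
        for i in range(len(mat)):
            if i != r and mat[i][c] != 0:
                factor = mat[i][c] / mat[r][c]
                mat[i] = [a - factor * b for a, b in zip(mat[i], mat[r])]
        r += 1
    return r


def boolean_points(basis):
    """All vectors in {0,1}^n lying in the span of `basis`, plus the dimension k."""
    n, k = len(basis[0]), rank(basis)
    points = [v for v in product((0, 1), repeat=n)
              if rank(basis + [list(v)]) == k]  # v in span iff rank unchanged
    return points, k


basis = [[1, -1, 0, 2, 0, 1],
         [0, 1, 1, -1, 1, 0],
         [1, 0, 1, 1, 1, 1]]   # third row = first + second, so k = 2
points, k = boolean_points(basis)
assert len(points) <= 2 ** k
print(k, len(points))
```

With the standard basis vectors $e_1, \ldots, e_k$ the bound $2^k$ is attained, which the test below also checks.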
Proof. By permuting the rows and columns, every blocky matrix can be brought into a block diagonal form. A matrix in block diagonal form can be represented by two sets $\{i_1 < i_2 < \cdots < i_r\}$ and $\{j_1 < j_2 < \cdots < j_r\}$ in [n] so that the first block is of size $i_1 \times j_1$, the second block is of size $(i_2 - i_1) \times (j_2 - j_1)$, and so on. The case when there are zero rows or columns is encoded by $i_r < n$ or $j_r < n$.
There are at most $2^n \cdot 2^n = 2^{2n}$ ways to choose this representation and at most $n! \cdot n! \le \frac{1}{2} n^{2n}$ ways to order the rows and columns.
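For small n, the number of blocky matrices can be compared with the bound $2^{2n}(n!)^2$ by brute force. The Python sketch below is ours; it reuses the row-support criterion for blockiness (an assumption of convenience, with the all-zero matrix counted as blocky) and only confirms the inequality; it does not reproduce the encoding from the proof.

```python
from itertools import product
from math import factorial


def is_blocky(m):
    """Assumed criterion: nonzero rows have equal or disjoint supports (zero matrix allowed)."""
    supports = list({frozenset(j for j, v in enumerate(r) if v) for r in m} - {frozenset()})
    return all(not (supports[a] & supports[b])
               for a in range(len(supports)) for b in range(a + 1, len(supports)))


def count_blocky(n):
    """Brute-force count of n x n blocky 0/1 matrices among all 2^(n*n) candidates."""
    total = 0
    for entries in product((0, 1), repeat=n * n):
        rows = [entries[i * n:(i + 1) * n] for i in range(n)]
        total += is_blocky(rows)
    return total


for n in range(1, 5):
    count = count_blocky(n)
    bound = 2 ** (2 * n) * factorial(n) ** 2  # 2^{2n} representations times n!^2 orderings
    assert count <= bound
    print(n, count, bound)
```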
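Proof (Theorem 1.3). For fixed blocky matrices $B_1, \ldots, B_R$, their span (viewing n × n matrices as vectors in $\mathbb{R}^{n^2}$) has dimension at most R, so it contains at most $2^R$ Boolean matrices. The number of Boolean matrices of blocky rank at most R is therefore at most $\bigl(2^{2n}(n!)^2\bigr)^R \cdot 2^R$.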
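For completeness, here is one way the count can be finished. This is our own sketch, using the two counting facts proved above (at most $2^k$ Boolean vectors in a k-dimensional subspace, and at most $2^{2n}(n!)^2$ blocky n × n matrices) together with the crude estimate $(n!)^2 \le n^{2n}$; logarithms are to base 2, and the constants are not optimized, so they need not match those of Theorem 1.3:

$$
\#\{M \in \{0,1\}^{n \times n} : \mathrm{blocky}(M) \le R\}
\;\le\; \bigl(2^{2n}(n!)^2\bigr)^{R} \cdot 2^{R}
\;\le\; 2^{R(2n + 2n\log n + 1)}
\;\le\; 2^{3nR(1+\log n)} .
$$

If $R \le \frac{n}{6(1+\log n)}$, the right-hand side is at most $2^{n^2/2}$, while there are $2^{n^2}$ Boolean matrices in total, so for a uniformly random M

$$
\Pr\Bigl[\mathrm{blocky}(M) \le \tfrac{n}{6(1+\log n)}\Bigr] \;\le\; 2^{n^2/2 - n^2} \;=\; 2^{-n^2/2},
$$

that is, a uniformly random M has blocky rank $\Omega(n/\log n)$ with overwhelming probability.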

An upper bound for all matrices.
In this sub-section, we explain the proof of Theorem 1.4: for every Boolean n × n matrix M, we have $\mathrm{blocky\text{-}par}(M) \le O(n/\log n)$. The proof is based on the work of Pudlák & Rödl (1994). In that paper, blocky matrices are called "fat matchings". Theorem 11.1 in Pudlák & Rödl (1994) says that if M is a Boolean n × n matrix, then M can be covered by $R \le O(n/\log n)$ blocky matrices $B_1, \ldots, B_R$; that is, $M \le \sum_r B_r$ and $B_r \le M$ for all $r \in [R]$. The proof of Theorem 11.1, however, shows that in fact $M = \sum_r B_r$, so that $\mathrm{blocky\text{-}par}(M) \le R$ and, in particular, $\mathrm{blocky}(M) \le R$.

Blocky ranks of greater-than
3.1. An upper bound on blocky rank. It is known that the blocky rank of $\mathrm{GT}_n$ is at most logarithmic; see, e.g., Kwapień & Pełczyński (1970). We include a proof for completeness.
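Proof. We prove the claim for $\mathrm{GT}_n$ with $n = 2^k$. The proof is by induction on k. For the base case k = 0, we have $\mathrm{blocky}(\mathrm{GT}_1) \le 1$. The matrix $\mathrm{GT}_{2^{k+1}}$ can be written as
$$\mathrm{GT}_{2^{k+1}} = \begin{pmatrix} \mathrm{GT}_{2^k} & J \\ 0 & \mathrm{GT}_{2^k} \end{pmatrix},$$
where J is the all-ones matrix. Let $B_1, \ldots, B_{k+1}$ be the blocky matrices so that $\mathrm{GT}_{2^k} = B_1 + \cdots + B_{k+1}$. Then $\mathrm{GT}_{2^{k+1}}$ is the sum of the k + 1 blocky matrices $\begin{pmatrix} B_i & 0 \\ 0 & B_i \end{pmatrix}$ and the blocky matrix $\begin{pmatrix} 0 & J \\ 0 & 0 \end{pmatrix}$.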
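The induction is short enough to run. The Python sketch below (our naming, rows indexed from 0) builds, for $n = 2^k$, the k + 1 matrices obtained by repeatedly replacing each $B_i$ with $\mathrm{diag}(B_i, B_i)$ and adding the matrix with J in the top-right quadrant, then checks that they are blocky and sum to $\mathrm{GT}_n$. Because all parts are 0/1 and their sum is 0/1, they partition the ones of $\mathrm{GT}_n$, in line with the bound of log n + 1 on the blocky partition number of the half graph.

```python
def gt(n):
    """GT_n with rows/columns indexed 0..n-1: entry (x, y) is 1 iff x <= y."""
    return [[1 if x <= y else 0 for y in range(n)] for x in range(n)]


def is_blocky(m):
    """Assumed criterion: nonzero rows have equal or disjoint supports (0/1 input)."""
    supports = list({frozenset(j for j, v in enumerate(r) if v) for r in m} - {frozenset()})
    return all(not (supports[a] & supports[b])
               for a in range(len(supports)) for b in range(a + 1, len(supports)))


def gt_decomposition(k):
    """k + 1 blocky matrices whose sum is GT_{2^k}, following the induction."""
    if k == 0:
        return [[[1]]]
    half = 2 ** (k - 1)
    parts = []
    for b in gt_decomposition(k - 1):
        # diag(B, B): B in the top-left and in the bottom-right quadrant
        parts.append([row + [0] * half for row in b] + [[0] * half + row for row in b])
    # [[0, J], [0, 0]]: the all-ones top-right quadrant
    parts.append([[0] * half + [1] * half for _ in range(half)]
                 + [[0] * (2 * half) for _ in range(half)])
    return parts


for k in range(6):
    n = 2 ** k
    parts = gt_decomposition(k)
    total = [[sum(p[x][y] for p in parts) for y in range(n)] for x in range(n)]
    assert len(parts) == k + 1
    assert all(is_blocky(p) for p in parts)
    assert total == gt(n)  # nonnegative parts summing to a 0/1 matrix: a partition
print("ok")
```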

A lower bound on functional blocky rank.
In this section we prove Theorem 1.7, stating that $\mathrm{fun\text{-}blocky}(\mathrm{GT}_n) \ge \frac{1}{8}\log n$. Let C be the n × n matrix with zeros on the diagonal and ones elsewhere; it corresponds to the bipartite cocktail party graph. The following lemma provides a lower bound on the blocky cover number of the bipartite cocktail party graph.
Lemma 3.2. If $C = \vee(B)$, where ∨ denotes the OR function and …

Proof (Lemma 3.2). The proof is by induction on n. For n = 1, the claim is trivial. For the inductive step, the ones of the matrix $B_1$ form blocks $S_1 \times T_1, \ldots, S_\ell \times T_\ell$, where $S_1, \ldots, S_\ell$ are pairwise disjoint and $T_1, \ldots, T_\ell$ are pairwise disjoint. Define two random subsets S and T of [n] as follows. Let $A_1, \ldots, A_\ell$ be i.i.d. uniformly distributed in {0, 1}. Let S be the complement of $\bigcup_{a : A_a = 1} S_a$, and let T be the complement of $\bigcup_{a : A_a = 0} T_a$. Let $I = \{i \in [n] : (i,i) \in S \times T\}$. The projection of $B_1$ to S × T is the zero matrix (with probability one). For each i ∈ [n], the probability that (i, i) ∈ S × T is at least one quarter, because $(B_1)_{i,i} = 0$: the index i lies in at most one $S_a$ and in at most one $T_b$, and $a \ne b$. Hence there is a choice of S × T so that $|I| \ge n/4$. Let C' be the matrix C after deleting all rows and columns not in I. The matrix C' is a cocktail party matrix of dimension |I|, and the matrix $B_1$ does not contribute to its representation. The inductive hypothesis completes the proof.
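The probabilistic step can be checked exhaustively on a small instance. The Python sketch below is ours; the instance is a hypothetical blocky matrix $B_1$ with zero diagonal, described by its blocks $S_a \times T_a$. For every labeling $A \in \{0,1\}^\ell$ it confirms that $B_1$ vanishes on S × T, and it checks that the average of |I| over all labelings, which equals its expectation, is at least n/4, so some labeling achieves $|I| \ge n/4$.

```python
from itertools import product


def restriction(blocks, n, labels):
    """Return (S, T, I) for blocks (S_a, T_a) of B_1 and a labeling A in {0,1}^l.

    S drops every S_a with A_a = 1 and T drops every T_a with A_a = 0, so no
    block survives inside S x T; I = S & T collects the i with (i, i) in S x T.
    """
    S = set(range(n)) - {i for (rows, _), a in zip(blocks, labels) if a == 1 for i in rows}
    T = set(range(n)) - {j for (_, cols), a in zip(blocks, labels) if a == 0 for j in cols}
    return S, T, S & T


# Hypothetical instance: a blocky matrix with zero diagonal, given by its blocks.
n = 8
blocks = [({0, 1}, {2, 3}), ({2, 3}, {4, 5, 6}), ({4}, {0, 1}), ({5, 6}, {7})]
assert all(not (rows & cols) for rows, cols in blocks)  # (B_1)_{i,i} = 0

sizes = []
for labels in product((0, 1), repeat=len(blocks)):
    S, T, I = restriction(blocks, n, labels)
    # B_1 vanishes on S x T: no block entry survives the restriction.
    assert not any(i in S and j in T for rows, cols in blocks for i in rows for j in cols)
    sizes.append(len(I))

average = sum(sizes) / len(sizes)
print(average, max(sizes), n / 4)
assert average >= n / 4 and max(sizes) >= n / 4
```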
Proof (Theorem 1.7). Assume that $\mathrm{GT}_n = F(B)$ for $B = (B_1, \ldots, B_R)$, where each $B_r$ is blocky. Assume towards a contradiction that $R < \frac{1}{8}\log n$. For i, j ∈ [n], denote by $B_{i,j} \in \{0,1\}^R$ the vector $\bigl((B_1)_{i,j}, \ldots, (B_R)_{i,j}\bigr)$. By the pigeonhole principle, there is a set $I \subseteq [n]$ of size $m \ge n/2^R$ so that all the m vectors $B_{i,i}$ for i ∈ I are the same. Delete the remaining n − m rows and columns from $\mathrm{GT}_n$ and from B, and focus on the remaining m × m part. Denote by G the obtained copy of $\mathrm{GT}_m$, and, abusing notation, denote by B the obtained tuple of matrices, so that G = F(B). It follows that there is $f \in \{0,1\}^R$ so that $B_{i,i} = f$ for each i ∈ [m]. Order B so that the first k entries of f are ones, and the last R − k are zeros.
Claim 3.3. For every $i \ne j$ in [m], there is r > k so that (exactly) one of $(B_r)_{i,j}$ and $(B_r)_{j,i}$ is one.
Proof. For each r ≤ k, because $B_r$ is blocky and $(B_r)_{i,i} = (B_r)_{j,j} = 1$, we know that $(B_r)_{i,j} = (B_r)_{j,i}$. Because $G_{i,j} \ne G_{j,i}$ and G = F(B), the vectors $B_{i,j}$ and $B_{j,i}$ differ, so the two lists $\bigl((B_{k+1})_{i,j}, \ldots, (B_R)_{i,j}\bigr)$ and $\bigl((B_{k+1})_{j,i}, \ldots, (B_R)_{j,i}\bigr)$ must be different.

Circuit complexity
4.1. MAJ • LTF circuits. In this sub-section, we prove Theorem 1.9: if D ∈ MAJ • LTF and D = $\mathrm{IP}_n$, then $|D| \ge \Omega(2^{n/2}/n)$. We use the blocky rank perspective to prove circuit lower bounds for inner product.
Definition 4.1. The nuclear norm of the matrix M is the minimum of $\sum_i |p_i|$ over all ways of writing $M = \sum_i p_i B_i$, where each $B_i$ is of rank one and $\|B_i\|_\infty \le 1$.

Proof. The claim follows from the well-known fact that the nuclear norm of the identity matrix is at most one (see, e.g., Hambardzumyan et al. (2023)). We include a proof for completeness. It is sufficient to consider n × n identity matrices for the case that n + 1 is prime. Then, for all x, y ∈ [n], …

Consider the following generalization of LTFs. …

Proof. Because duplicating rows and columns does not increase the blocky rank, the blocky rank of M is at most that of $\mathrm{GT}_n$. Now, use Claim 3.1.
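The fact about identity matrices can also be seen through a different, standard argument than the one in the text (which works with n + 1 prime): the identity is the average of the rank-one matrices $ss^{T}$ over all sign vectors $s \in \{-1,1\}^n$, and each $ss^{T}$ has entries in {−1, 1}. The weights $2^{-n}$ sum to one, so, assuming the definition in terms of rank-one matrices with entries bounded by one, this certifies a nuclear norm of at most one. A small Python check (ours):

```python
from itertools import product


def sign_outer_product_sum(n):
    """Sum of s s^T over all s in {-1, +1}^n, and the number of terms."""
    total = [[0] * n for _ in range(n)]
    count = 0
    for s in product((-1, 1), repeat=n):
        count += 1
        for i in range(n):
            for j in range(n):
                total[i][j] += s[i] * s[j]
    return total, count


n = 5
total, count = sign_outer_product_sum(n)
# Off-diagonal terms cancel and diagonal terms equal 1, so total / count = I.
assert total == [[count if i == j else 0 for j in range(n)] for i in range(n)]
print(count)  # prints 32: 2 ** n rank-one terms, each with weight 1 / count
```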
Corollary 4.5. The nuclear norm of an n × n Boolean monotone matrix is at most log(n) + 1.
The property of inner product we rely on is Lindsey's lemma. This lemma appears in many works, including Hajnal et al. (1993). Its proof combines the orthogonality of the rows of $\mathrm{IP}_n$ (viewed as a ±1 matrix) with the Cauchy–Schwarz inequality.
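Both ingredients can be checked numerically for small n. The Python sketch below is ours; it works with the ±1 matrix $H(x,y) = (-1)^{\mathrm{IP}_n(x,y)}$, verifies that its rows are orthogonal, and tests the standard form of Lindsey's lemma, $\bigl|\sum_{x \in S, y \in T} H(x,y)\bigr| \le \sqrt{|S|\,|T|\,2^n}$ for $S, T \subseteq \{0,1\}^n$, on random rectangles (the exact form of Lemma 4.6 in this paper may differ).

```python
import random
from itertools import product


def ip_sign_matrix(n):
    """H[x][y] = (-1)^{IP_n(x, y)} for x, y in {0,1}^n (rows in lexicographic order)."""
    points = list(product((0, 1), repeat=n))
    return [[(-1) ** (sum(a * b for a, b in zip(x, y)) % 2) for y in points]
            for x in points]


n = 4
N = 2 ** n
H = ip_sign_matrix(n)

# Orthogonality: distinct rows have inner product 0, each row has squared norm 2^n.
for x in range(N):
    for z in range(N):
        dot = sum(H[x][y] * H[z][y] for y in range(N))
        assert dot == (N if x == z else 0)

# Lindsey's lemma on random rectangles S x T, compared after squaring.
random.seed(3)
for _ in range(500):
    S = [x for x in range(N) if random.random() < 0.5]
    T = [y for y in range(N) if random.random() < 0.5]
    total = sum(H[x][y] for x in S for y in T)
    assert total * total <= len(S) * len(T) * N  # squared, to stay in integers
print("ok")
```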
We can conclude the following strengthening of the correlation bound from Hajnal et al. (1993).
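Proof. The matrix T is monotone (up to a permutation of the rows and columns). Corollary 4.5 bounds the nuclear norm of T from above; we can write $T = \sum_i p_i B_i$, where each $p_i > 0$, each $B_i$ is of rank one with $\|B_i\|_\infty \le 1$, and $\sum_i p_i \le n + 1$. Lemma 4.6 implies
$$\Bigl|\sum_{x,y \in \{0,1\}^n} (-1)^{\mathrm{IP}_n(x,y)} T(x,y)\Bigr| = \Bigl|\sum_{x,y \in \{0,1\}^n} (-1)^{\mathrm{IP}_n(x,y)} \sum_i p_i B_i(x,y)\Bigr| \le \sum_i p_i \Bigl|\sum_{x,y \in \{0,1\}^n} (-1)^{\mathrm{IP}_n(x,y)} B_i(x,y)\Bigr| \le \cdots$$

Proof (Theorem 1.9). Assume that $\mathrm{IP}_n(x,y) = 1$ if and only if $\sum_{i=1}^{s} w_i T_i(x,y) \ge b$, where each $T_i$ is an LTF and $w_i \in \{-1, 0, 1\}$. It follows that $|b| \le s$, because otherwise the condition on the right-hand side is constant. For all x, y,
$$(-1)^{1+\mathrm{IP}_n(x,y)} \Bigl(1 + 2\bigl(-b + \sum_i w_i T_i(x,y)\bigr)\Bigr) \ge 1.$$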
Summing over all x, y gives
$$\sum_{x,y \in \{0,1\}^n} (-1)^{1+\mathrm{IP}_n(x,y)} \Bigl(1 + 2\bigl(-b + \sum_i w_i T_i(x,y)\bigr)\Bigr) \ge 2^{2n}.$$

4.2. Σ • LTF circuits. In this sub-section, we prove Theorem 1.10: if M is a $\{0,1\}^n \times \{0,1\}^n$ matrix so that $M = \sum_{i=1}^{s} w_i T_i$, where each $w_i \in \mathbb{R}$ and each $T_i$ is an LTF, then $\mathrm{blocky}(M) \le s \cdot (n+1)$. In other words, blocky rank lower bounds imply circuit lower bounds. Indeed,
$$\mathrm{blocky}(M) \le \sum_i \mathrm{blocky}(T_i) \le s \cdot (n+1).$$

2.1. A lower bound for random matrices. In this sub-section we prove Theorem 1.3: if M is a uniformly random n × n Boolean …

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.