Parameterized Complexity of Streaming Diameter and Connectivity Problems

Oostveen, Jelle J.; van Leeuwen, Erik Jan

doi:10.1007/s00453-024-01246-z

Open access
Published: 19 June 2024

Volume 86, pages 2885–2928, (2024)

Algorithmica

Abstract

We initiate the investigation of the parameterized complexity of Diameter and Connectivity in the streaming paradigm. On the positive end, we show that knowing a vertex cover of size k allows for algorithms in the Adjacency List (AL) streaming model whose number of passes is constant and memory is \(\mathcal {O}(\log n)\) for any fixed k. Underlying these algorithms is a method to execute a breadth-first search in \(\mathcal {O}(k)\) passes and \(\mathcal {O}(k \log n)\) bits of memory. On the negative end, we show that many other parameters lead to lower bounds in the AL model, where \(\Omega (n/p)\) bits of memory is needed for any p-pass algorithm even for constant parameter values. In particular, this holds for graphs with a known modulator (deletion set) of constant size to a graph that has no induced subgraph isomorphic to a fixed graph H, for most H. For some cases, we can also show one-pass, \(\Omega (n \log n)\) bits of memory lower bounds. We also prove a much stronger \(\Omega (n^2/p)\) lower bound for Diameter on bipartite graphs. Finally, using the insights we developed into streaming parameterized graph exploration algorithms, we show a new streaming kernelization algorithm for computing a vertex cover of size k. This yields a kernel of 2k vertices (with \(\mathcal {O}(k^2)\) edges) produced as a stream in \(\text {poly}(k)\) passes and only \(\mathcal {O}(k \log n)\) bits of memory.

1 Introduction

Graph algorithms, such as those that compute the diameter of an unweighted graph (Diameter) or determine whether it is connected (Connectivity), often rely on keeping the entire graph in (random access) memory. However, very large networks might not fit in memory. Hence, graph streaming has been proposed as a paradigm where the graph is inspected through a so-called stream, in which its edges appear one by one [2]. To compensate for the assumption of limited memory, multiple passes may be made over the stream and computation time is assumed to be unlimited. The complexity theory question is which problems remain solvable and which problems are hard in such a model, taking into account trade-offs between the amount of memory and the number of passes.

Many graph streaming problems require \(\Omega (n)\) bits of memory [3, 4] for a constant number of passes on n-vertex graphs. Any p-pass algorithm for Connectivity needs \(\Omega (n/p)\) bits of memory [2]. Single pass algorithms for Connectivity or Diameter need \(\Omega (n \log n)\) bits of memory on sparse graphs [5]. A 2-approximation of Diameter requires \(\Omega (n^{3/2})\) bits of memory on weighted graphs [4]. A naive streaming algorithm for Connectivity or Diameter stores the entire graph, using \(\mathcal {O}(m \log n) = \mathcal {O}(n^2 \log n)\) bits and a single pass. For Connectivity, union-find yields a 1-pass, \(\mathcal {O}(n \log n)\) bits of memory, algorithm [6].
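As an illustration of the union-find upper bound mentioned above, the following is a minimal sketch of a one-pass Connectivity test on an edge stream. It assumes that n is known in advance and that vertices are labelled 0 to n-1; the function name `connected_one_pass` is ours and does not come from [6]. The `parent` array holds n identifiers of \(\mathcal {O}(\log n)\) bits each, which is where the \(\mathcal {O}(n \log n)\) bound comes from.

```python
def connected_one_pass(n, edge_stream):
    """One pass over an edge stream; vertices are assumed to be 0..n-1."""
    parent = list(range(n))  # n identifiers, O(n log n) bits in total

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for u, v in edge_stream:  # the single pass
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv   # merge the two components
            components -= 1
    return components == 1
```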

An intriguing aspect of Diameter and Connectivity is that classic algorithms for them rely on breadth-first search (BFS) or depth-first search (DFS). These seem difficult to execute efficiently in a streaming setting. It was a longstanding open problem to compute a DFS tree using o(n) passes and \(o(m \log n)\) bits of memory. This barrier was recently broken [7], through an algorithm that uses \(\mathcal {O}(n/k)\) passes and \(\mathcal {O}(nk \log n)\) bits of memory, for any k. The situation for computing single-source shortest paths seems similar [8], although good approximations exist even on weighted graphs (see e.g. [6, 9]). We do know that DFS algorithms cannot be executed in logarithmic space [10]. In streaming, any BFS algorithm that explores k layers of the BFS tree must use at least k/2 passes or \(\Omega (n^{1+1/k}/{(\log n)}^{1/k})\) space [4]. Hence, much remains unexplored when it comes to graph exploration- and graph distance-related streaming problems such as BFS/DFS, Diameter, and Connectivity. In particular, most lower bounds hold for general graphs. As such, a more fine-grained view of the complexity of these problems has so far been lacking.

In this paper, we seek to obtain this fine-grained view using parameterized complexity [11]. The idea of using parameterized complexity in the streaming setting was first introduced by Fafianie and Kratsch [12] and Chitnis et al. [13]. Many problems are hard in streaming parameterized by their solution size [12,13,14]. Crucially, however, deciding whether a graph has a vertex cover of size k has a one-pass, \(\mathcal {O}(k^2 \log (k)\log (n))\) bits of memory kernelization algorithm by Chitnis et al. [15], and a \(2^k\)-passes, \(\mathcal {O}(k\log n)\) bits of memory direct algorithm by Chitnis and Cormode [14]. Bishnu et al. [16] then showed that knowing a vertex cover of size k is useful in solving other deletion problems using p(k) passes and \(f(k) \log n\) memory, notably H-free deletion; this approach was recently expanded on by Oostveen and van Leeuwen [17]. This leads to the more general question of how knowing a (small) H-free modulator, that is, a set X such that \(G-X\) has no induced subgraph isomorphic to H (note that \(H=P_2\) in Vertex Cover [k]), would affect the complexity of streaming problems and of BFS/DFS, Diameter, and Connectivity in particular. We are not aware of any investigations in this direction.

An important consideration is the streaming model (see [2, 18,19,20] or the survey by McGregor [6]). In the edge arrival (EA) model, each edge of the graph appears once in the stream, and the edges appear in some fixed but arbitrary order. Most aforementioned results use this model. In the vertex arrival (VA) model the edges arrive grouped per vertex, and an edge is revealed at its endpoint that arrives latest. In the adjacency list (AL) model the edges also arrive grouped per vertex, but each edge is present for both its endpoints. This means we see each edge twice and when a vertex arrives we immediately see all its adjacencies (rather than some subset dependent on the arrival order, as in the VA model). This model is quite strong, but as we shall see, unavoidable for our positive results. We do not consider dynamic streaming models in this paper, although they do exist. Note that all these streaming models concern edge streams, so even though we may talk about ‘arrival of a vertex’ this really means ‘arrival of the edges incident to a vertex’, and vertices are not separately present in the graph stream.
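To make the difference between the three models concrete, the following sketch produces an EA, a VA, and an AL stream for the same edge set. The representation of a stream as a list of `(vertex, neighbours)` pairs and the function names are our own illustrative choices; in an actual stream only edges are present, and the pairing with a vertex merely records how the edges are grouped.

```python
def ea_stream(edges):
    """Edge arrival: each edge appears once, in the given (arbitrary) order."""
    return list(edges)


def al_stream(edges, order):
    """Adjacency list: every vertex arrives together with all its neighbours,
    so each edge is seen twice."""
    adj = {v: [] for v in order}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return [(v, sorted(adj[v])) for v in order]


def va_stream(edges, order):
    """Vertex arrival: an edge is revealed at its endpoint that arrives latest,
    so a vertex only shows its neighbours that arrived earlier."""
    position = {v: i for i, v in enumerate(order)}
    return [(v, [u for u in nbrs if position[u] < position[v]])
            for v, nbrs in al_stream(edges, order)]
```

For the path 1, 2, 3 with arrival order 1, 2, 3, vertex 2 arrives with both neighbours in the AL stream but only with vertex 1 in the VA stream.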

Our Contributions The main takeaway from our work is that the vertex cover number likely sits right at the frontier of parameters that are helpful in computing Diameter and Connectivity. As our main positive result, we show the following.

Theorem 1

Given a graph G as an AL stream and a vertex cover of G of size k in memory, Diameter [k] and Connectivity [k] can be solved using \(\mathcal {O}(2^kk)\) passes and \(\mathcal {O}(k \log n)\) bits of space or using one pass and \(\mathcal {O}(2^k + k \log n)\) bits of space.

The crux to our approach is to perform a BFS in an efficient manner, using \(\mathcal {O}(k)\) passes and \(\mathcal {O}(k \log n)\) space. Knowledge of a vertex cover is not a restricting assumption, as one may be computed using similar memory requirements [14, 15]. We will also show how to extend the single-pass result to work without a vertex cover being given, at the cost of increasing the memory use to \(\mathcal {O}(4^k + k \log n)\) bits of space.

As a contrasting result, we will show that in the VA model, even a constant-size vertex cover does not help in computing Diameter and Connectivity. Moreover, the bound on the vertex cover seems necessary, as we can prove that any p-pass algorithm for Diameter requires \(\Omega (n^2/p)\) bits of memory even on bipartite graphs and any p-pass algorithm for Connectivity requires \(\Omega (n/p)\) bits of memory, both in the AL model. This indicates that both the permissive AL model and a low vertex cover number are truly needed.

In some cases, we are also able to prove that a single-pass algorithm requires \(\Omega (n \log n)\) bits of memory.

More broadly, knowledge of being H-free (that is, not having a fixed graph H as an induced subgraph) or having an H-free modulator does not help even in the AL model. Here, \(H\not \subseteq _i G\) denotes that H is not an induced subgraph of G.

Theorem 2

For any fixed graph H with \(H \not \subseteq _i P_4\) and \(H \not = 3P_1, P_3+P_1,P_2+2P_1\), any streaming algorithm for Diameter in the AL model that uses p passes over the stream must use \(\Omega (n/p)\) bits of memory even on the class of H-free graphs.

We note that these results hold for H-free graphs (without the need for a modulator). The case when \(H \subseteq _i P_4\) is straightforward to solve with \(\mathcal {O}(\log n)\) bits of memory, as the diameter is either 1 or 2 (an induced path of length 3 is a \(P_4\)). If the graph has diameter 1, it is a clique. This can be tested in a single pass by counting the number of edges.
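The clique test by edge counting can be sketched as follows. This is only an illustration under the assumptions that n is known, the graph is simple, and the AL stream is given as `(vertex, neighbours)` pairs (a representation we introduce here). Since every edge appears twice in an AL stream, the graph is a clique exactly when the number of adjacency entries equals \(n(n-1)\); the only state is one counter of \(\mathcal {O}(\log n)\) bits.

```python
def is_clique_one_pass(n, al_stream):
    """Single pass over an AL stream; n is assumed to be known."""
    entries = 0  # one counter, O(log n) bits
    for _vertex, neighbours in al_stream:
        entries += len(neighbours)
    # Each of the n(n-1)/2 edges of a clique is seen from both endpoints.
    return entries == n * (n - 1)
```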

Theorem 3

For any fixed graph H with \(H \not = P_2 + sP_1\) for \(s \in \{0,1,2\}\) and \(H \not = sP_1\) for \(s \in \{1,2,3\}\) and any fixed constant \(k \ge 3\), any streaming algorithm for Diameter in the AL model that uses p passes over the stream must use \(\Omega (n/p)\) bits of memory even on the class of graphs G with a given set of vertices X with \(|X|= k\) such that \(G-X\) is H-free. If \(G-X\) must be connected and H-free, then additionally \(H \not = P_3\).

We note that the case when \(H = P_2\) or \(H=P_1\) is covered by Theorem 1. Cobipartite graphs seem to be a bottleneck class. The cases when \(H = 2P_1\) or when \(H = P_3\) and \(G-X\) must be connected lead to a surprising second positive result.

Theorem 4

Given a graph \(G = (V,E)\) as an AL stream and a set \(X\subseteq V\) of size k in memory such that \(G-X\) is a disjoint union of \(\ell \) cliques, Diameter [\(k, \ell \)] and Connectivity [\(k,\ell \)] can be solved using \(\mathcal {O}(2^kk\ell )\) passes and \(\mathcal {O}((k+\ell )\log n)\) bits of space or one pass and \(\mathcal {O}(2^k\ell + (k + \ell ) \log n)\) bits of space.

The approach for this result is similar to that for Theorem 1. Moreover, we show a complementary lower bound in the VA model, even for \(\ell =1\) and constant k.

To summarize our results in words, generalizing Theorem 1 using the perspective of an H-free modulator does not seem to lead to a positive result (Theorem 3). Instead, connectivity of the remaining graph after removing the modulator seems crucial. However, this perspective only helps for Theorem 4, while the problem remains hard for most other H-free modulators and even for the seemingly simple case of a modulator to a path (we will show this in Theorem 24). While Theorem 4 would also hint at the possibility of using a modulator to a few components of small diameter, this also leads to hardness (we show this in Corollary 20).

We emphasize that all instances of Diameter in our hardness reductions are connected graphs. Hence, the hardness of computing Diameter is separated from the hardness of computing Connectivity.

For Connectivity, we also give two broad theorems that knowledge of being H-free or having an H-free modulator does not help even in the AL model.

Theorem 5

For any fixed graph H that is not a linear forest containing only paths of length at most 5, any streaming algorithm for Connectivity in the AL model that uses p passes over the stream must use \(\Omega (n/p)\) bits of memory even on the class of H-free graphs.

Theorem 6

For any fixed graph H that is not a linear forest containing only paths of length at most 1 and any fixed constant \(k\ge 2\), any streaming algorithm for Connectivity in the AL model that uses p passes over the stream must use \(\Omega (n/p)\) bits of memory even on the class of graphs G with a given set of vertices X with \(|X|= k\) such that \(G-X\) is H-free.

Our hardness results for H-free modulators for both Diameter and Connectivity have implications for several standard graph parameters; we explicitly introduce and mention these parameters in the main body of the paper.

As a final result, we use our insights into graph exploration on graphs of bounded vertex cover to show a result on the Vertex Cover problem itself. In particular, a kernel on 2k vertices for Vertex Cover [k] can be obtained as a stream in \(\mathcal {O}(k^3)\) passes in the EA model using only \(\mathcal {O}(k \log n)\) bits of memory. In the AL model, the number of passes is only \(\mathcal {O}(k^2)\). This kernel still may have \(\mathcal {O}(k^2)\) edges, which means that saving it in memory would not improve over the result of Chitnis et al. [15] (which uses \(\mathcal {O}(k^2 \log (k)\log (n))\) bits of memory), up to the \(\log (k)\) factor and hidden constants. Indeed, a better kernel seems unlikely to exist [21]. However, the important point is that storing the (partial) kernel in memory is not needed during its computation. Hence, it may be viewed as a possible first step towards a streaming algorithm for Vertex Cover [k] using \(\mathcal {O}(k \log n)\) bits of memory and \(\text {poly}(k)\) passes, which is an important open problem in the field, see [14]. Our kernel is constructed through a kernel by Buss and Goldsmith [22], and then finding a maximum matching in an auxiliary bipartite graph (following Chen et al. [23]) of bounded size through repeated DFS applications.

Related work There has been substantial work on the complexity of graph-distance and reachability problems in the streaming setting. For example, Guruswami and Onak [24] showed that any p-pass algorithm needs \(n^{1+\Omega (1/p)}/p^{\mathcal {O}(1)}\) memory when given vertices s, t to test if s, t are at distance at most \(2p+2\) in undirected graphs or to test s–t reachability in directed graphs. Further work on directed s–t reachability [25] recently led to a lower bound that any \(o(\sqrt{\log n})\)-pass algorithm needs \(n^{2-o(1)}\) bits of memory [26]. Other recent work considers p-pass algorithms for \(\epsilon \)-property testing of connectivity [27,28,29], including strong memory lower bounds \(n^{1-\mathcal {O}(\epsilon \cdot p)}\) on bounded-degree planar graphs [30]. Further problems in graph streaming are extensively discussed and referenced in these works; see also [31].

In the non-streaming setting, the Diameter problem can be solved in \(\mathcal {O}(nm)\) time by BFS. There is a lower bound of \(n^{2-\epsilon }\) for any \(\epsilon > 0\) under the Strong Exponential Time Hypothesis (SETH) [32]. Parameterizations of Diameter have been studied with parameter vertex cover [33], treewidth [33,34,35], and other parameters [36, 37], leading to a \(2^{\mathcal {O}(k)} n^{1+\epsilon }\) time algorithm on graphs of treewidth k [33]. Running time \(2^{o(k)}n^{2-\epsilon }\) for graphs of treewidth k is not possible under SETH [34]. Subquadratic algorithms are known for various hereditary graph classes; see e.g. [38,39,40,41,42,43,44] and references in [39].

2 Preliminaries

We work on undirected, unweighted graphs. We denote a computational problem A with A [k], where [.] denotes the parameterization. A parameter k is an integer given as additional input. In parameterized complexity the aim is to find algorithms with running time \(f(k)\cdot n^{\mathcal {O}(1)}\), where f is some computable function. This notion was introduced by Downey and Fellows [11], and we refer the reader to [45] for more on parameterized complexity. In our setting, our aim will be to find streaming algorithms with a space complexity of \(\mathcal {O}(f(k) \log n)\) for some computable function f. This space complexity class is dubbed Fixed Parameter Streaming by Chitnis and Cormode [14].

Diameter is to compute \(\max _{s,t\in V}d(s,t)\) where d(s, t) denotes the distance between s and t. Connectivity asks to decide whether or not the graph is connected. A twin class consists of all vertices with the same open neighbourhood. In a graph with vertex cover size k, we have \(\mathcal {O}(2^k)\) twin classes. For two graphs G, H, \(G+H\) denotes their disjoint union. We also use 2G to denote \(G+G\); 3G is \(G+G+G\), etc. \(H\not \subseteq _i G\) denotes that H is not an induced subgraph of G. A linear forest is a disjoint union of paths. A path on a vertices is denoted \(P_a\) and has length \(a-1\).

We employ the following problems in communication complexity: Disjointness (Disj\(_n\)) and Permutation (Perm\(_n\)).

The communication complexity necessary between Alice and Bob to solve Disjointness is well understood, and can be used to prove lower bounds on the memory use of streaming algorithms. This was first done by Henzinger et al. [2].

The values \(\gamma \) and \(\psi \) essentially break j down into the quotient (\(\psi \)) and the remainder (\(\gamma \)) with divisor \(\log n\). In essence, \(\psi \) tells us in which ‘block’ of \(\log n\) bits Bob should look, while \(\gamma \) tells us at which bit in a block Bob should look. Let it be clear that \(\psi \in [n]\) and \(\gamma \in [\log n]\). For computation, one can compute \(\gamma \) as \((j + \log n - \lceil \frac{j}{\log n} \rceil \times \log n)\). The problem of Permutation was created and applied first to the streaming setting by Sun and Woodruff [5]. The following formulations by Bishnu et al. [46] come in very useful.
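The decomposition of j into \(\psi \) and \(\gamma \) can be sketched as follows. The sketch assumes that positions are 1-indexed and that the block length \(\log n\) is an integer, passed here as the hypothetical parameter `block`; the function name is ours.

```python
def split_index(j, block):
    """Split a 1-indexed position j into (psi, gamma) for blocks of size `block`."""
    psi = -(-j // block)             # ceil(j / block): which block to look in
    gamma = j + block - psi * block  # the formula from the text: position in the block
    return psi, gamma
```

For example, with blocks of 3 bits, position 4 is the first bit of the second block, so `split_index(4, 3)` gives `(2, 1)`, and in general \(j = (\psi - 1)\log n + \gamma \).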

Proposition 7

(Rephrasing of item (ii) of [46, Proposition 5.6]) If we can show a reduction from Disj\(_n\) to problem \(\Pi \) in streaming model \(\mathcal {M}\) such that in the reduction, Alice and Bob construct one model-\(\mathcal {M}\) pass for a streaming algorithm for \(\Pi \) by communicating the memory state of the algorithm only a constant number of times to each other, then any streaming algorithm working in the model \(\mathcal {M}\) for \(\Pi \) that uses p passes requires \(\Omega (n/p)\) bits of memory, for any \(p \in \mathbb {N}\) [47,48,49].

Proposition 8

(Rephrasing of item (iii) of [46, Proposition 5.6]) If we can show a reduction from Perm\(_n\) to a problem \(\Pi \) in the streaming model \(\mathcal {M}\) such that in the reduction, Alice and Bob construct one model-\(\mathcal {M}\) pass for a streaming algorithm for \(\Pi \) by communicating the memory state of the algorithm only a constant number of times to each other, then any streaming algorithm working in the model \(\mathcal {M}\) for \(\Pi \) that uses 1 pass requires \(\Omega (n\log n)\) bits of memory.

If we can show a reduction from either Disjointness or Permutation, we call a problem ‘hard’, as it does not admit algorithms using only poly-logarithmic memory.

Any upper bound for the EA model holds for all models, and an upper bound for the VA model also holds for the AL model. On the other hand, a lower bound in the AL model holds for all models, and a lower bound for the VA model also holds for the EA model.

3 Upper Bounds for Diameter

We give an overview of our upper bound results for Diameter in Table 1. The memory-efficient results rely on executing a BFS on the graph, which is made possible by both the parameter and the use of the AL model. The one-pass results rely on the possibility to save the entire graph in a bounded fashion. Our upper bounds assume the deletion set related to the parameter is given, that is, it is in memory.

Table 1 Overview of the algorithms and their complexity for Diameter and Connectivity

Lemma 9

In a graph with vertex cover size k, any simple path has length at most 2k.

Proof

Let G be a graph with vertex cover size k. Consider some simple path P in the graph. Any vertex in the independent set of G (i.e. not in the vertex cover) that is on the path P only has neighbours in the vertex cover. Hence, for each vertex in the independent set the path visits, we also visit a vertex in the vertex cover. As the vertex cover has size k, any simple path can visit at most \(2k + 1\) vertices, as then all vertices in the vertex cover have been visited. \(\square \)

Lemma 9 is useful in that the diameter of such a graph can be at most 2k if the graph is connected. Our algorithm will simulate a BFS for 2k rounds, deciding on the distance of a vertex to all other vertices.

Lemma 10

Given a graph G as an AL stream with a vertex cover X of size k, we can compute the distance from a vertex v to all others using \(\mathcal {O}(k)\) passes and \(\mathcal {O}(k \log n)\) bits of memory.

Proof

We simulate a BFS originating at v for at most 2k rounds on our graph, using a pass for each round. Contrary to a normal BFS, we only remember whether we visited the vertices in X and their distances, to reduce memory complexity.

For every vertex \(w\in X\), we save its tentative distance d(w) from v; if this is not yet decided, this field has value \(\infty \). Our claim will be that after round i, the value of d(w) for vertices w within distance i from v is correct. We initialize the distance of v as \(d(v) = 0\) (we store d(v) regardless of whether \(v\in X\)).

Say we are in round \(i\ge 1\). We execute a pass over the stream. Say we view a vertex \(w \in X \cup \{v\}\) in the stream with its adjacencies. If w has a distance of d(w), we update the neighbours u of w in X to have distance \(d(u) = \min (d(u), d(w)+1)\). If instead we view a vertex \(w \notin X \cup \{v\}\) in the stream, we do the following. Locally save all the neighbours and look at their distances, and let z be the neighbour with minimum d(z) value. For every \(u\in N(w)\) we update the distance as \(d(u) = \min (d(z)+2, d(u))\). This simulates the distance of a path passing through w (note that this may not be the shortest path, but this may be resolved by other vertices). Executing this procedure for every vertex of G takes a single pass, as by the AL model we see all the adjacencies of a vertex when it arrives in the stream. This completes the procedure for round i.

Notice that we use only \(\mathcal {O}(k \log n)\) bits of memory during the procedure, and that the total number of passes is indeed \(\mathcal {O}(k)\), as we execute 2k rounds using one pass each.

For the correctness, let us first argue the claim that after round i, the value of d(w) for vertices \(w\in X\) within distance i from v is correct. We proceed by induction; the base case \(i = 0\) is clearly correct. Now consider some vertex w at distance i from v. Consider a shortest path from v to w. Look at the last vertex on the path before visiting w. If this vertex is in X, then by induction, this vertex has a correct distance after round \(i-1\), and so, in round i this vertex will update the distance of w to be i. If this vertex is not in X, then it has a neighbour with distance \(i-2\), which is correct after round \(i-2\) by induction, and so, the vertex not in X will update (or will already have updated) the distance of w to be i by the end of round i.

The correctness of the algorithm now follows from the claim, together with Lemma 9, and the fact that we can now output all distances using a single pass by either outputting the value of the field d(w) for a vertex \(w\in X\), or by looking at all neighbours of a vertex \(w \notin X\) and outputting the smallest value \(+1\). \(\square \)
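The procedure of Lemma 10 can be sketched as follows. This is an illustration under our own assumptions rather than a definitive implementation: the AL stream is modelled as a list of `(vertex, neighbours)` pairs that is replayed once per pass, and `bfs_distances` is a name we introduce. Only the dictionary `d`, with one entry per vertex of X plus the source, persists between vertices; the returned dictionary stands in for the values that the final pass would output one by one.

```python
import math


def bfs_distances(stream, cover, source):
    """Simulated BFS from `source`; `cover` is a vertex cover X of size k."""
    k = len(cover)
    d = {x: math.inf for x in cover}  # tentative distances, only for X ...
    d[source] = 0                     # ... and for the source itself
    for _round in range(2 * k):       # one pass per round
        for w, nbrs in stream:
            if w in d:                # w is in X, or w is the source
                if d[w] < math.inf:
                    for u in nbrs:
                        if u in d:
                            d[u] = min(d[u], d[w] + 1)
            else:                     # w is outside X: all its neighbours are in X
                best = min((d[u] for u in nbrs), default=math.inf)
                for u in nbrs:        # a path through w costs two more steps
                    d[u] = min(d[u], best + 2)
    # Final pass: vertices outside X are one step beyond their closest neighbour.
    dist = {}
    for w, nbrs in stream:
        if w in d:
            dist[w] = d[w]
        else:
            dist[w] = min((d[u] for u in nbrs), default=math.inf) + 1
    return dist
```

On the path 0, 1, 2, 3, 4 with vertex cover {1, 3} and source 0, this reports the distances 0, 1, 2, 3, 4; unreachable vertices keep the value \(\infty \).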

Related is a lower bound result by Feigenbaum et al. [4], which says that any BFS procedure that explores k layers of the BFS tree must use at least k/2 passes or super-linear memory. This indicates that memory- and pass-efficient implementations of BFS, as in Lemma 10, are hard to come by.

We can now use Lemma 10 to construct an algorithm for finding the diameter of a graph parameterized by vertex cover, essentially by executing Lemma 10 for every twin class, which considers all options for vertices in the graph.

Theorem 11

Given a graph G as an AL stream with vertex cover X of size k, we can solve Diameter [k] in \(\mathcal {O}(2^kk)\) passes and \(\mathcal {O}(k \log n)\) bits of memory.

Proof

We enumerate all the twin classes of the neighbourhood in X of vertices (which we can do with k bits), and for each such class, we find in one pass whether there is a vertex realizing this class. If so, we call the algorithm of Lemma 10 with this vertex as v. Instead of outputting all distances, we are only interested in the largest distance found (which may be \(+\,\infty \)). We also call Lemma 10 for every \(x\in X\) with x as the vertex v. We keep track of the largest distance found over all calls to Lemma 10, and output this value as the diameter.

The correctness follows from the correctness of Lemma 10, together with the fact that considering each twin class of the neighbourhood in X combined with all vertices in X actually considers all possible vertices that may occur in G, and so we also consider one of the vertices of the diametric pair in one of these iterations. \(\square \)
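The enumeration in this proof can be sketched as follows. The sketch takes the BFS of Lemma 10 as a black box: `farthest_from` is a hypothetical callable that returns the largest distance from a given start vertex, and the stream is again modelled as a list of `(vertex, neighbours)` pairs. Subsets of X are enumerated with `itertools.combinations` instead of a k-bit counter, which visits the same \(2^k\) candidate neighbourhoods.

```python
from itertools import combinations


def diameter_by_twin_classes(stream, cover, farthest_from):
    """Try every vertex of X and one representative per twin class as a start."""
    cover = sorted(cover)
    cover_set = set(cover)
    best = max((farthest_from(x) for x in cover), default=0)
    for size in range(len(cover) + 1):
        for neighbourhood in combinations(cover, size):
            target = set(neighbourhood)
            # One pass: look for a vertex outside X with exactly this neighbourhood.
            rep = next((w for w, nbrs in stream
                        if w not in cover_set and set(nbrs) == target), None)
            if rep is not None:
                best = max(best, farthest_from(rep))
    return best
```

Only the current candidate neighbourhood, one representative, and the running maximum are kept between calls, in line with the \(\mathcal {O}(k \log n)\) memory bound.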

We show an alternative one-pass algorithm, by saving the graph as a representation by its twin classes, thereby completing the proof of Theorem 1.

Theorem 12

Given a graph G as an AL stream, we can solve Diameter [k] in one pass and \(\mathcal {O}(4^k + k \log n)\) bits of memory, or correctly report that a vertex cover of size k does not exist. When a vertex cover of size k is given, the memory use is \(\mathcal {O}(2^k + k \log n)\).

Proof

In our pass, we greedily construct a vertex cover of size 2k by maintaining a maximal matching. If at any point the matching exceeds 2k vertices, we report that no vertex cover of size k exists. We can characterize vertices not in the vertex cover by their adjacencies towards the vertex cover, i.e. the binary string of at most 2k bits with a 1 if the vertex is adjacent. We call this binary string the characterization of a vertex. In the pass, we also keep track of the adjacency matrix of edges within the vertex cover, and the characterization of vertices not in the vertex cover. When we see a vertex, there are two cases. If it has a neighbour that is not in the vertex cover, we add it to the vertex cover (adding that edge to the matching), and we update the edges within the vertex cover. Otherwise, the vertex has only neighbours in the vertex cover, which means we can save its characterization. Any edge in the vertex cover will be registered, as when the second of its two vertices is added to the vertex cover, we will register the presence of this edge. Any edge with one endpoint v not in the vertex cover will be registered when saving the characterization of v, which, at that point in the stream, can only have neighbours in the vertex cover (otherwise it would have been added too).

There are only \(\mathcal {O}(2^{2k})\) different characterizations of adjacencies to the vertex cover of size 2k, and hence, for each we can save one or two bits whether there is a vertex with this neighbourhood and whether there is more than one. The procedure above can decide such properties locally using \(\mathcal {O}(k \log n)\) bits. The adjacency matrix of the vertex cover takes \(\mathcal {O}(k^2)\) bits. So, we can save all this information and the vertex cover itself using \(\mathcal {O}(4^k + k \log n)\) bits. When a vertex cover is given, there are only \(\mathcal {O}(2^{k})\) different characterizations and so we use only \(\mathcal {O}(2^k + k \log n)\) bits.

Next we argue that this information is enough to decide on Diameter. We can use a simple enumeration technique to find the diameter of the graph. To do this, for every pair, we find the distance between them, and keep track of the largest distance found. For a given pair of vertices (given by their adjacencies towards the vertex cover, or a vertex in the vertex cover itself), we can decide on the distance between them using a procedure similar to Lemma 10 but internally instead of making actual passes over the stream. \(\square \)
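For the case where the vertex cover is given, the one-pass approach can be sketched as follows. This is a simplified illustration under several assumptions of ours: the stream is a list of `(vertex, neighbours)` pairs, every vertex of the cover has at least one edge, and the final distance computation is an ordinary BFS on the compressed graph rather than the Lemma 10 style procedure used in the proof. The sketch does not try to match the exact bit bounds.

```python
import math
from collections import deque


def diameter_one_pass(stream, cover):
    """One pass with a given vertex cover; twin classes are stored, not vertices."""
    cover = set(cover)
    cover_adj = {x: set() for x in cover}  # edges inside the cover
    classes = {}                           # characterization -> count, capped at 2
    for w, nbrs in stream:                 # the single pass
        if w in cover:
            cover_adj[w] |= {u for u in nbrs if u in cover}
        else:
            key = frozenset(nbrs)
            classes[key] = min(2, classes.get(key, 0) + 1)
    # Compressed graph: the cover plus at most two representatives per class.
    adj = {("x", x): {("x", u) for u in cover_adj[x]} for x in cover}
    for key, count in classes.items():
        for copy in range(count):
            node = ("c", key, copy)
            adj[node] = {("x", u) for u in key}
            for u in key:
                adj[("x", u)].add(node)
    best = 0
    for start in adj:                      # enumerate all start vertices internally
        dist = {start: 0}
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b in adj[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        if len(dist) < len(adj):
            return math.inf                # some vertex is unreachable
        best = max(best, max(dist.values()))
    return best
```

Two representatives per class suffice because two twins with a non-empty neighbourhood are at distance 2 from each other, and further twins add no new distances.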

We have seen all the elements of Theorem 1.

Proof of Theorem 1

This follows immediately from Theorems 11 and 12. \(\square \)

Next, we show that the idea of simulating a BFS extends to another similar setting, where instead of a bounded vertex cover we have a bounded deletion distance to \(\ell \) cliques. The good thing about cliques is that we need not search in them: the distances in a clique are known if we know the smallest distance to some vertex in the clique. However, we need to save the smallest distance to each clique to propagate distances in the network, as different vertices in a single clique can have many different adjacencies to the deletion set. This is the reason we require a bounded number of cliques.

Lemma 13

In a graph G with deletion distance k to \(\ell \) cliques, any shortest path between two vertices is of length at most \(3k + 1\), if it exists.

Proof

Let G be a graph with deletion distance k to \(\ell \) cliques. Consider some shortest path between two vertices v, w. Any vertex on this path that is not v or w and lies in one of the \(\ell \) cliques must have a vertex of the deletion set as one of its neighbours on the path. If the path contains more than one edge from a single clique, it is not a shortest path. Hence, the path has length at most \(3k + 1\). \(\square \)

Lemma 13 indicates that we can use a similar approach in simulating a BFS of bounded depth to find distances.

Lemma 14

Given a graph G as an AL stream with deletion distance k to \(\ell \) cliques, with the given deletion set X, we can decide on the distance from one vertex v to all others using \(\mathcal {O}(k)\) passes and \(\mathcal {O}((k+\ell ) \log n)\) bits of memory.

Proof

Similar to Lemma 10, we simulate a BFS originating at v of at most \(3k+1\) rounds, using a pass for each round. We only remember distances for vertices in the deletion set, and the smallest distance in each clique. This way the memory complexity remains small.

The setup of the algorithm is as follows. For every vertex in the deletion set X, and for every clique, we save its distance from v; if this is not yet decided, this field has value \(+\infty \). For a clique, this value means the smallest distance from v to some vertex of the clique. Let us denote d(w) as the value of this field for a vertex \(w\in X\) or a clique with which we associate a (non-existent) vertex w. Our claim is that after round i, the fields d(w) for w within distance i from v are correct. We initialize \(d(v) = 0\), and if it is contained in a clique, then set \(d(w) = 0\) for the associated vertex w.

Let us describe the workings of a round, say round \(i\ge 1\). We make a pass over the stream, and for each vertex we do the following. If we see a vertex \(w\in X\) with distance d(w), we update the distances of its neighbours \(u\in N(w)\) as \(\min (d(u), d(w)+1)\). If we see a vertex c contained in some clique with associated vertex w, we look at its neighbours in \(N(c) \cap X\). If a neighbour u of c has \(d(u) + 1 \le d(w)\), we update d(w), as c realizes this distance in the clique. Therefore, we update the neighbours \(u \in N(c) \cap X\) with \(d(u) = \min (d(u), d(w)+1)\). Otherwise, d(w) is not realized by c, but by another vertex of the clique, and so, we can update the neighbours u of c in X with \(d(u) = \min (d(u), d(w) + 2)\). This concludes what we do in a round.

Notice that we can always identify which clique a vertex not in X belongs to, as the AL stream provides all its neighbours.

The correctness of this algorithm quickly follows from the correctness of Lemma 10 combined with how we handle cliques here, and Lemma 13. Let it also be clear that we use \(\mathcal {O}(k)\) passes and use \(\mathcal {O}((k+\ell )\log n)\) bits of memory. \(\square \)
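The round structure described in this proof can be illustrated with a small Python sketch. It is only a sketch under stated assumptions, not the authors' implementation: the names `streamed_distances`, `field` and `own_distance` are hypothetical, the stream is modelled as a list that is replayed once per round, the clique index of each vertex outside X is assumed to be available through `clique_of`, and the case where the source v itself lies in a clique is handled explicitly by giving it distance 0. Only one distance field per vertex of X and one per clique is kept between passes.

```python
import math

def streamed_distances(stream, X, clique_of, v, rounds):
    """Distances from v, keeping one field per vertex of X and one per clique.

    stream    -- list of (vertex, neighbours) pairs; one iteration is one AL pass
    X         -- the deletion set
    clique_of -- clique index of every vertex outside X (assumed known here)
    """
    d = {w: math.inf for w in X}
    d.update({("clique", q): math.inf for q in set(clique_of.values())})

    def field(u):
        # Vertices of X have their own field, clique vertices share one.
        return u if u in X else ("clique", clique_of[u])

    def own_distance(c, nbrs):
        # Assumption: a clique vertex is the source itself, one step from X,
        # or one step from the closest vertex of its own clique.
        via_x = min((d[u] + 1 for u in nbrs if u in X), default=math.inf)
        return 0 if c == v else min(via_x, d[field(c)] + 1)

    d[field(v)] = 0
    for _ in range(rounds):                      # one pass per BFS round
        for w, nbrs in stream:
            if w in X:
                dw = d[w]
            else:
                dw = own_distance(w, nbrs)
                d[field(w)] = min(d[field(w)], dw)
            for u in nbrs:
                if u in X or w in X:             # edges inside a clique carry no update
                    d[field(u)] = min(d[field(u)], dw + 1)
    # One extra pass turns the stored fields into a distance for every vertex.
    return {w: d[w] if w in X else own_distance(w, nbrs) for w, nbrs in stream}

# Toy instance: X = {x1, x2}, cliques {p, q, r} and {s, t}.
X = {"x1", "x2"}
clique_of = {"p": 0, "q": 0, "r": 0, "s": 1, "t": 1}
stream = [("x1", ["p", "s"]), ("x2", ["t"]), ("p", ["q", "r", "x1"]),
          ("q", ["p", "r"]), ("r", ["p", "q"]), ("s", ["t", "x1"]),
          ("t", ["s", "x2"])]
print(streamed_distances(stream, X, clique_of, "q", rounds=3 * len(X) + 1))
```

The sketch runs the \(3k+1\) rounds stated in the proof; on the toy instance the fields already stabilize after the first round.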

Using Lemma 14, we can decide the diameter of the graph by invoking it once for every start vertex, up to equivalence.

Theorem 15

Given a graph G as an AL stream with deletion distance k to \(\ell \) cliques, with the given deletion set X, we can solve Diameter [\(k,\ell \)] using \(\mathcal {O}(2^k\ell k)\) passes and \(\mathcal {O}((k+\ell ) \log n)\) bits of memory.

Proof

We can enumerate all the twin classes of neighbourhoods in X (with k bits), and for each class use a pass to find at most \(\ell \) vertices which realize this neighbourhood in X (at most one from each clique, two vertices from the same clique with the same neighbourhood in X are equivalent). We call the algorithm of Lemma 14 for each of these vertices as v, and also for each of the vertices of X as v. We keep track of the largest distance found, and output it as the diameter after all calls. This results in an algorithm using \(\mathcal {O}(2^k\ell \cdot k)\) passes and \(\mathcal {O}((k+ \ell ) \log n)\) bits of memory.

The correctness follows from the correctness of Lemma 14, together with the fact that we consider all vertices as a start node, up to equivalence. This is because we can characterize each vertex by its adjacencies towards X together with the clique it is contained in, which identifies the vertex up to equivalence on the closed neighbourhood. \(\square \)

The performance of the algorithm in Theorem 15 is distinctly worse than that of Theorem 11; however, it does allow for more flexibility in the input in some specific cases. Note in particular that the number of passes is exponential in k, but only linear in \(\ell \), so the algorithm is well suited to a graph that is very close to a number of big cliques.

For this setting, there is also a one-pass but high-memory approach.

Theorem 16

Given a graph G as an AL stream with deletion distance k to \(\ell \) cliques, with the given deletion set X, we can solve Diameter [\(k,\ell \)] using one pass and \(\mathcal {O}(2^k\ell + (k + \ell ) \log n)\) bits of memory.

Proof

The approach here is similar to that of Theorem 12. In our pass, we save the deletion set X and its internal edges. Next to this, every vertex not in X can be characterized by its adjacencies towards X, together with what clique it is in. Therefore, each vertex can be characterized by \(k + \log \ell \) bits, and for each option, we save whether we have 0, 1, or more of this vertex (this takes two bits). We need \(\mathcal {O}(\ell \log n)\) bits to be able to identify which clique a vertex is in, as we can save a representative for each clique and identify a vertex by its adjacency to one of the representatives. We can find this information in our pass because it is an AL stream, and this takes \(\mathcal {O}(2 \cdot 2^{k + \log \ell } + (k + \ell ) \log n) = \mathcal {O}(2^k\ell + (k + \ell ) \log n)\) bits of memory.

To solve Diameter, we now only need to execute a procedure like that in Lemma 14 for every pair of vertices, deciding on the distance between them. The theorem follows. \(\square \)
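The bookkeeping of the single pass can be sketched as follows. This is a hedged illustration rather than the paper's procedure: the function name `signature_counts` is hypothetical, the clique index of each vertex is assumed to be given by `clique_of` (the proof identifies it through stored representatives), and the saturating counter with values 0, 1, 2 stands in for the two bits kept per option.

```python
def signature_counts(stream, X, clique_of):
    """One pass: count, per (clique, neighbourhood in X), whether 0, 1 or more
    vertices realise that signature (the counter saturates at 2)."""
    order = sorted(X)                      # fixed order of the k deletion-set vertices
    counts = {}
    for w, nbrs in stream:
        if w in X:
            continue                       # X and its internal edges are stored separately
        mask = tuple(u in nbrs for u in order)
        sig = (clique_of[w], mask)         # k bits plus a clique index
        counts[sig] = min(2, counts.get(sig, 0) + 1)
    return counts

# Toy instance: p and r are twins with respect to X inside clique 0.
X = {"x1", "x2"}
clique_of = {"p": 0, "q": 0, "r": 0, "s": 1, "t": 1}
stream = [("x1", ["p", "r", "s"]), ("x2", ["t"]), ("p", ["q", "r", "x1"]),
          ("q", ["p", "r"]), ("r", ["p", "q", "x1"]), ("s", ["t", "x1"]),
          ("t", ["s", "x2"])]
counts = signature_counts(stream, X, clique_of)
```

Signatures that never occur are simply absent from the dictionary, which corresponds to the value 0.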

We have now seen all the elements of Theorem 4. We are not aware of any algorithms to compute the parameter distance to \(\ell \) cliques.

Proof of Theorem 4

This follows immediately from Theorems 15 and 16. \(\square \)

4 Lower Bounds for Diameter

We work with reductions from Disj\(_n\): we construct graphs in which Alice controls some of the edges and Bob controls others, depending on their respective inputs to the Disj\(_n\) problem, while the remaining parts of the graph are fixed. The aim is to create a gap in the diameter of the graph, that is, the answer to Disj\(_n\) is YES if and only if the diameter is above or below a certain value. The lower bound then follows from Proposition 7. Here n may be the number of vertices in the graph construction, but may also be different (possibly forming a different lower bound). Our lower bounds hold for connected graphs. An overview of all hardness results for Diameter is given in Table 2.

Table 2 An overview of the lower bounds for Diameter, with the parameter (k) on the left. These results hold for connected graphs. \((\mathcal {M}, m, p)\)-hard means that any algorithm using p passes in model \(\mathcal {M}\) (or weaker) requires \(\Omega (m)\) bits of memory. FVS stands for Feedback Vertex Set number, FEN for Feedback Edge Set number


We start by proving simple lower bounds for the VA model when our problem is parameterized by the vertex cover number, and when our problem is parameterized by the distance to \(\ell \) cliques. This shows that we actually need the AL model to achieve the upper bounds in Sect. 3. The constructions are illustrated in Fig. 1 and Fig. 2. We use the convention that a-vertices (b-vertices) and their incident edges are controlled by Alice (Bob).

Theorem 17

Any streaming algorithm for Diameter on graphs of vertex cover number at least 3 in the VA model that uses p passes over the stream requires \(\Omega (n/p)\) bits of memory.

Proof

Let x, y be the input to \(\textsc {Disj}_n\) of Alice and Bob, respectively. Assume we have a streaming algorithm for Diameter in the VA model. We construct a graph as illustrated in Fig. 1. First Alice reveals to the stream n vertices \(v_1, \ldots , v_n\) with no edges, then she reveals the vertex c, which is connected to all the n vertices. We associate the n vertices with the indices of x and y, that is, vertex \(v_i\) corresponds to index i. Alice reveals a vertex a, and for each index i she reveals an edge between a and \(v_i\) when the entry at index i in x is a 1. After this, she passes the memory state of the algorithm to Bob. Bob now reveals a vertex b and, similar to Alice, reveals an edge between b and \(v_i\) when the entry at index i in y is a 1. This completes the construction of the graph, and thus the stream. Note that this is a VA stream that Alice and Bob can construct without knowing the input of the other. The graph is always connected because if either Alice or Bob has an all-zeroes input, the problem of \(\textsc {Disj}_n\) is trivially solvable (so \(\textsc {Disj}_n\) is equally hard ignoring this case).^{Footnote 2}

We now claim that the diameter of this graph is at most 3 when the answer to \(\textsc {Disj}_n\) is NO, and otherwise the diameter is 4.

Let us assume that the answer to \(\textsc {Disj}_n\) is NO, that is, there is an index i such that \(x_i = y_i = 1\). Then clearly, the distances between a, b and c are all 2, by viewing the paths using \(v_i\) as an intermediate vertex. Hence the diameter is at most 3, which can be formed by some path from e.g. a to \(v_i\) to c to some \(v_j\) that is non-adjacent to a.

Now assume the answer to \(\textsc {Disj}_n\) is YES, that is, there is no index i such that \(x_i = y_i = 1\). Now consider the distance between a and b. To get from a to b, we need to go from a to some \(v_i\) (which is non-adjacent to b by the assumption), then go to c and to another \(v_j\) which is adjacent to b (but not a), to go to b. This path has length 4, and must exist by the non-all-zeroes input assumption, and forms a shortest path from a to b in this graph. So the diameter of the graph is 4.

In conclusion, we constructed a connected graph of size \(n+3\) with a vertex cover number of 3 (taking \(\{a,b,c\}\) suffices) that can be given as a VA stream to a streaming algorithm for Diameter to solve the \(\textsc {Disj}_n\) problem. The theorem follows from Proposition 7. \(\square \)
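As a sanity check on small inputs, the following Python sketch builds the graph of this proof and computes its diameter by plain breadth-first search. It only illustrates the claimed gap and plays no role in the lower bound itself; the names `add_edge`, `diameter` and `simple_va`, as well as the vertex labels, are assumptions made for this example.

```python
from collections import deque

def add_edge(adj, u, v):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def diameter(adj):
    """Exact diameter of a connected graph via a BFS from every vertex."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best

def simple_va(x, y):
    """Graph of Theorem 17 for 0/1 lists x and y that are not all-zeroes."""
    adj = {}
    for i in range(len(x)):
        add_edge(adj, "c", ("v", i))        # c is adjacent to every v_i
        if x[i]:
            add_edge(adj, "a", ("v", i))    # Alice's edges
        if y[i]:
            add_edge(adj, "b", ("v", i))    # Bob's edges
    return adj

disjoint = simple_va([1, 0, 1], [0, 1, 0])      # answer YES
intersecting = simple_va([1, 1, 0], [0, 1, 1])  # answer NO (index 2 is shared)
print(diameter(disjoint), diameter(intersecting))
```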

Theorem 18

Any streaming algorithm for Diameter on graphs of distance 2 to \(\ell = 1\) clique in the VA model that uses p passes over the stream requires \(\Omega (n/p)\) bits of memory.

Proof

Let x, y be the input to \(\textsc {Disj}_n\) of Alice and Bob, respectively. Assume we have a streaming algorithm for Diameter in the VA model. We construct a graph as illustrated in Fig. 2. Start with a clique on \(n+2\) vertices, \(v_0, \ldots , v_{n+1}\). Let a, b be two vertices not in the clique, and add the edges \((a,v_0)\) and \((b,v_{n+1})\). Then, for any i, Alice adds the edge \((a,v_i)\) when \(x_i=1\) and Bob adds the edge \((b,v_i)\) when \(y_i = 1\). This completes the construction. Alice and Bob construct the VA stream as follows. First Alice reveals \(v_0, \ldots , v_{n+1}\) and then reveals a (for which she knows what edges should be present). Then Alice passes the memory of the algorithm to Bob, who reveals b, which completes the stream. Notice that the graph is always connected by the fixed edges \((a,v_0)\) and \((b,v_{n+1})\).

We now claim that the diameter of this graph is 2 when the answer to \(\textsc {Disj}_n\) is NO, and otherwise the diameter is at least 3.

Let us assume that the answer to \(\textsc {Disj}_n\) is NO, that is, there is an index i such that \(x_i = y_i = 1\). Notice how the distance between a and b is now 2, because both are connected to \(v_i\). The distance between any other pair of vertices is also at most 2, because all vertices except a and b form a clique.

Now assume the answer to \(\textsc {Disj}_n\) is YES, that is, there is no index i such that \(x_i = y_i = 1\). The shortest path between a and b in this instance must use some edge in the clique, as these vertices do not have a common neighbour. Hence, the distance between a and b is at least 3.

In conclusion, we constructed a connected graph of size \(n+4\) with a distance 2 to \(\ell = 1\) clique (taking \(\{a,b\}\) suffices) that can be given as a VA stream to a streaming algorithm for Diameter to solve the \(\textsc {Disj}_n\) problem. The theorem follows from Proposition 7. \(\square \)
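The gap between diameter 2 and diameter at least 3 can be checked on small inputs with the Python sketch below. It is an illustration under assumed names (`add_edge`, `diameter`, `clique_plus_two`) and assumed vertex labels, not part of the proof.

```python
from collections import deque

def add_edge(adj, u, v):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def diameter(adj):
    """Exact diameter of a connected graph via a BFS from every vertex."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best

def clique_plus_two(x, y):
    """Graph of Theorem 18: a clique v_0..v_{n+1} plus the vertices a and b."""
    n = len(x)
    adj = {}
    for i in range(n + 2):
        for j in range(i + 1, n + 2):
            add_edge(adj, ("v", i), ("v", j))
    add_edge(adj, "a", ("v", 0))            # fixed edges keep the graph connected
    add_edge(adj, "b", ("v", n + 1))
    for i in range(1, n + 1):
        if x[i - 1]:
            add_edge(adj, "a", ("v", i))
        if y[i - 1]:
            add_edge(adj, "b", ("v", i))
    return adj

disjoint = clique_plus_two([1, 0, 1], [0, 1, 0])      # answer YES
intersecting = clique_plus_two([1, 1, 0], [0, 1, 1])  # answer NO
print(diameter(disjoint), diameter(intersecting))
```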

The lower bounds in Figs. 1 and 2 do not work for the AL model because there are vertices that may be adjacent to both a and b, so neither Alice nor Bob can produce the adjacency list of such a vertex alone. For the ‘Simple VA’ construction, we can ‘fix’ this by extending these vertices to edges but this is destructive to the small vertex cover number of the construction, see Fig. 3. It should be clear that AL reductions require care: the set of edges incident to a vertex has to be fully determined when Alice or Bob wants to reveal it.

Theorem 19

Any streaming algorithm for Diameter that works on the ‘Simple AL’ construction in the AL model using p passes over the stream requires \(\Omega (n/p)\) bits of memory.

Proof

Let x, y be the input to \(\textsc {Disj}_n\) of Alice and Bob, respectively. We construct a graph as illustrated in Fig. 3. The graph consists of \(2n + 3\) vertices. It contains a matching M on 2n vertices. We make a vertex c adjacent to all vertices of M. Next to this, we have vertices a and b, whose adjacencies towards one side of M depend on the input of Alice and Bob, respectively. An edge between a and the endpoint of the ith edge on the ‘left’ side of M is present when Alice has a 1 on index i in x. An edge between b and the endpoint of the ith edge on the ‘right’ side of M is present when Bob has a 1 on index i in y. Assuming we have an algorithm that can work on this graph, to construct an AL stream for the algorithm, Alice can first reveal a, c, and all vertices on the left side of M, then pass the memory of the algorithm to Bob, who reveals b and all the vertices on the right side of M. This completes one pass of the stream. The graph is always connected because if either Alice or Bob has an all-zeroes input, the problem of \(\textsc {Disj}_n\) is trivially solvable (so \(\textsc {Disj}_n\) is equally hard ignoring this case).

We now claim that the diameter of the ‘Simple AL’ graph is at most 3 when the answer to \(\textsc {Disj}_n\) is NO, and otherwise the diameter is at least 4.

Let us assume that the answer to \(\textsc {Disj}_n\) is NO, that is, there is an index i such that \(x_i = y_i = 1\). Notice that the distance from c to any vertex is at most 2. The distance from a to b is 3 by taking the ith matching edge. The distance from a to any other vertex is at most 3 by going through c, and the same holds for b. So the diameter is at most 3.

Now assume the answer to \(\textsc {Disj}_n\) is YES, that is, there is no index i such that \(x_i = y_i = 1\). Consider the distance between a and b. As there is no index where both have a 1, the shortest path from a to b must use c as an intermediate vertex. But then this path has length at least 4. So the diameter is at least 4.

We conclude that any algorithm that can solve the Diameter problem on the graph construction ‘Simple AL’ in the AL model in p passes, must use \(\Omega (n/p)\) bits of memory by Proposition 7. \(\square \)
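A small Python sketch can confirm the gap of the ‘Simple AL’ construction on concrete inputs. The names `add_edge`, `diameter` and `simple_al` and the labels for the left and right endpoints of M are assumptions of this illustration.

```python
from collections import deque

def add_edge(adj, u, v):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def diameter(adj):
    """Exact diameter of a connected graph via a BFS from every vertex."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best

def simple_al(x, y):
    """'Simple AL' graph for 0/1 lists x and y that are not all-zeroes."""
    adj = {}
    for i in range(len(x)):
        left, right = ("l", i), ("r", i)
        add_edge(adj, left, right)          # i-th edge of the matching M
        add_edge(adj, "c", left)
        add_edge(adj, "c", right)
        if x[i]:
            add_edge(adj, "a", left)        # Alice only touches the left side
        if y[i]:
            add_edge(adj, "b", right)       # Bob only touches the right side
    return adj

disjoint = simple_al([1, 0, 1], [0, 1, 0])      # answer YES
intersecting = simple_al([1, 1, 0], [0, 1, 1])  # answer NO
print(diameter(disjoint), diameter(intersecting))
```

In this sketch every vertex of M has an adjacency list that depends on at most one player's input, which is the property the AL model requires.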

The following follows from Theorem 19 by observing some properties of the ‘Simple AL’ construction in Fig. 3.

Corollary 20

Any streaming algorithm for Diameter in the AL model that uses p passes over the stream must use \(\Omega (n/p)\) bits of memory, even on graphs for which the algorithm is given a

1.
Deletion Set to Matching of size at least 3,
2.
Deletion Set to \(\ell \) components of diameter x of size at least 2, \(x\ge 2\),
3.
Dominating Set of size at least 3,
4.
Deletion Set to a depth \(\ell \) tree of size at least 3, \(\ell \ge 2\).

Proof

The corollary follows from Theorem 19, together with observing that the construction of ‘Simple AL’ is

1.
a matching when removing \(\{a, b, c\}\),
2.
one component of diameter 2 when removing \(\{a,b\}\),
3.
dominated by the set of vertices \(\{a,b,c\}\),
4.
a tree of depth 2 when we add a vertex d adjacent to the left part of M,^{Footnote 3} and remove \(\{a, b, c\}\).

\(\square \)

Next, we construct a lower bound for a special case, when the input graph is a tree, see Fig. 4.

Theorem 21

Any streaming algorithm for Diameter that works on the ‘Windmill’ construction in the AL model using p passes over the stream requires \(\Omega (n/p)\) bits of memory.

Proof

Let x, y be the input to \(\textsc {Disj}_n\) of Alice and Bob, respectively. We construct a graph as illustrated in Fig. 4. The graph consists of \(5n + 6\) vertices. We start with a path P on 6 vertices, and call the vertex at one of its ends the center. To this center, we will ‘glue’ n gadgets, which may vary depending on the input of Alice and Bob, so we associate an index i with each gadget. Each gadget adds 5 vertices to the graph. Let us describe one such gadget. Consider two triplets of vertices \(a_{i,1},a_{i,2},a_{i,3}\) (Alice) and \(b_{i,1},b_{i,2},b_{i,3}\) (Bob), and connect \(a_{i,3}\) to \(b_{i,1}\) with an edge. For Alice, if the entry at index i is a 0, she inserts the edges \((a_{i,1},a_{i,2})\) and \((a_{i,1},a_{i,3})\), and if it is a 1, she inserts \((a_{i,1},a_{i,2})\) and \((a_{i,2},a_{i,3})\). Bob does the same for his triplet. We ‘glue’ this gadget into the graph by identifying \(a_{i,1}\) with the center vertex of the graph. Assuming we have an algorithm that works on this graph, to construct an AL stream containing this graph, Alice first reveals the path P, and all her own vertices \(a_{i,2}, a_{i,3}\) for all i, with all the incident edges. Then she passes the memory of the algorithm to Bob, who reveals the vertices \(b_{i,1}, b_{i,2}, b_{i,3}\) for all i, with all the incident edges. This completes one pass of the stream. Notice that the graph is connected.

We now claim that the diameter of the ‘Windmill’ graph is at least 10 when the answer to \(\textsc {Disj}_n\) is NO, and otherwise the diameter is at most 9.

Let us assume that the answer to \(\textsc {Disj}_n\) is NO, that is, there is an index i such that \(x_i = y_i = 1\). Then, the distance from the end of the path P to \(b_{i,3}\) is exactly 10, and this is the only simple path between these vertices, so it is the shortest path. Hence, the diameter is at least 10.

Now assume the answer to \(\textsc {Disj}_n\) is YES, that is, there is no index i such that \(x_i = y_i = 1\). Then, the distance from the center vertex to any other vertex \(a_{i,j}\) or \(b_{i,j}\) is at most 4, as at least one of the triplets for each index forms a tree-like shape and not a path. Therefore, the diameter is at most 9, formed by the shortest path from the end of the path P to some \(b_{i,3}\).

We conclude that any algorithm that can solve the Diameter problem on the graph construction ‘Windmill’ in the AL model in p passes, must use \(\Omega (n/p)\) bits of memory by Proposition 7. \(\square \)
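The following Python sketch builds the ‘Windmill’ tree for small inputs and checks the gap between 9 and 10. The function names `add_edge`, `diameter` and `windmill` and the vertex labels are assumptions made for this illustration; the far end of P is at distance 5 from the center, and a gadget reaches depth 5 exactly when both of its triplets are paths.

```python
from collections import deque

def add_edge(adj, u, v):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def diameter(adj):
    """Exact diameter of a connected graph via a BFS from every vertex."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best

def windmill(x, y):
    """'Windmill' tree; ("p", 0) is the center and plays the role of every a_{i,1}."""
    adj = {}
    for i in range(5):                       # path P on 6 vertices
        add_edge(adj, ("p", i), ("p", i + 1))
    center = ("p", 0)
    for i in range(len(x)):
        a2, a3 = ("a", i, 2), ("a", i, 3)
        b1, b2, b3 = ("b", i, 1), ("b", i, 2), ("b", i, 3)
        add_edge(adj, center, a2)
        add_edge(adj, a2 if x[i] else center, a3)   # path if x_i = 1, star otherwise
        add_edge(adj, a3, b1)                       # fixed edge joining the triplets
        add_edge(adj, b1, b2)
        add_edge(adj, b2 if y[i] else b1, b3)       # path if y_i = 1, star otherwise
    return adj

disjoint = windmill([1, 0], [0, 1])       # answer YES
intersecting = windmill([1, 1], [0, 1])   # answer NO (index 2 is shared)
print(diameter(disjoint), diameter(intersecting))
```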

The following follows from Theorem 21 by observing properties of the ‘Windmill’ construction.

Corollary 22

Any streaming algorithm for Diameter in the AL model that uses p passes over the stream must use \(\Omega (n/p)\) bits of memory, even on graphs for which the algorithm is given

1.
that the input is a bounded depth tree,
2.
that the Maximum Degree is a constant of at least 3.

Proof

The corollary follows from Theorem 21, together with observing that ‘Windmill’ is

1.
a bounded depth tree,
2.
a lower bound that still works when we convert the center vertex into a binary tree of depth \(c = \mathcal {O}(\log n)\) and extend the path to size \(5 + c\) accordingly (this makes the diameter distinction to be \(9 + 2c\) or \(10 + 2c\)).

\(\square \)

Next we look at the case when the input graph is close to a path.

Theorem 23

Any streaming algorithm for Diameter that works on the ‘Diamond’ construction in the AL model using p passes over the stream requires \(\Omega (n/p)\) bits of memory.

Proof

Let x, y be the input to \(\textsc {Disj}_n\) of Alice and Bob, respectively. We construct a graph on \(10n+3\) vertices, as illustrated in Fig. 5. We start with two vertices a and b, connected by an edge. We create \(n + 1\) vertices \(c_0, \ldots , c_n\), each connected to both a and b with an edge. We create a gadget for index i between \(c_{i-1}\) and \(c_i\) for every \(1\le i \le n\). For index i, we insert a path \(P_i = p_{i,1}, \ldots , p_{i,9}\) on 9 vertices with \(p_{i,1}\) connected to \(c_{i-1}\) and \(p_{i,9}\) to \(c_i\) with an edge. The edges \((a,p_{i,2})\) and \((a,p_{i,6})\) are present if and only if \(x_i = 0\) for Alice, and the edges \((b, p_{i,4})\) and \((b,p_{i,8})\) are present if and only if \(y_i = 0\) for Bob. Assuming we have an algorithm that works on this construction, to construct an AL stream containing this graph, Alice first reveals all vertices except b and \(p_{i,4}\) and \(p_{i,8}\) for all i (for these vertices, the edges are either fixed or decided by the input of Alice), then passes the memory state of the algorithm to Bob, who reveals exactly b and \(p_{i,4}\) and \(p_{i,8}\) for all i. This completes one pass of the stream. Notice that the graph is connected.

We now claim that the diameter of the ‘Diamond’ graph is at least 8 when the answer to \(\textsc {Disj}_n\) is NO, and otherwise the diameter is at most 7.

Let us assume that the answer to \(\textsc {Disj}_n\) is NO, that is, there is an index i such that \(x_i = y_i = 1\). Then, consider \(p_{i,5}\). The distance from this vertex to a or b is exactly 6. So, the distance from \(p_{i,5}\) to some other \(p_{j,5}\) for \(j\ne i\) is at least 8.

Now assume the answer to \(\textsc {Disj}_n\) is YES, that is, there is no index i such that \(x_i = y_i = 1\). Then for any \(1\le i \le n\) and \(1\le j\le 9\) the distance from \(p_{i,j}\) to one of either a or b is at most 3. So every vertex in the graph has distance at most 3 to either a or b. But then, as a and b are connected with an edge, the diameter must be at most 7.

We conclude that any algorithm that can solve the Diameter problem on the graph construction ‘Diamond’ in the AL model in p passes, must use \(\Omega (n/p)\) bits of memory by Proposition 7. \(\square \)
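The ‘Diamond’ gap can likewise be checked on small inputs. The Python sketch below is only an illustration: the names `add_edge`, `diameter` and `diamond` and the vertex labels are assumptions, and the example uses \(n = 2\) so that a second gadget exists for the NO case.

```python
from collections import deque

def add_edge(adj, u, v):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def diameter(adj):
    """Exact diameter of a connected graph via a BFS from every vertex."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best

def diamond(x, y):
    """'Diamond' graph on 10n + 3 vertices."""
    n = len(x)
    adj = {}
    add_edge(adj, "a", "b")
    for i in range(n + 1):
        add_edge(adj, "a", ("c", i))
        add_edge(adj, "b", ("c", i))
    for i in range(1, n + 1):
        path = [("p", i, j) for j in range(1, 10)]   # p_{i,1}, ..., p_{i,9}
        for u, v in zip(path, path[1:]):
            add_edge(adj, u, v)
        add_edge(adj, ("c", i - 1), path[0])
        add_edge(adj, path[8], ("c", i))
        if x[i - 1] == 0:                            # Alice's shortcuts
            add_edge(adj, "a", path[1])              # p_{i,2}
            add_edge(adj, "a", path[5])              # p_{i,6}
        if y[i - 1] == 0:                            # Bob's shortcuts
            add_edge(adj, "b", path[3])              # p_{i,4}
            add_edge(adj, "b", path[7])              # p_{i,8}
    return adj

disjoint = diamond([1, 0], [0, 1])       # answer YES
intersecting = diamond([1, 0], [1, 1])   # answer NO (index 1 is shared)
print(diameter(disjoint), diameter(intersecting))
```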

The following follows from Theorem 23 by observing properties of the ‘Diamond’ construction.

Corollary 24

Any streaming algorithm for Diameter in the AL model that uses p passes over the stream must use \(\Omega (n/p)\) bits of memory, even on graphs for which the algorithm is given a Deletion Set to a path of size at least 2.

Proof

The corollary follows from Theorem 23, together with observing that ‘Diamond’ is a path when we remove \(\{a,b\}\). \(\square \)

Next we show that the Diameter problem in the AL model is hard on split graphs.

Theorem 25

Any streaming algorithm for Diameter that works on split graphs in the AL model using p passes over the stream requires \(\Omega (n/p)\) bits of memory.

Proof

Let x, y be the input to \(\textsc {Disj}_n\) of Alice and Bob, respectively. We construct a graph on \(4n+2\) vertices, as illustrated in Fig. 6. The split graph we construct has a clique on \(2n+2\) vertices, namely the vertices \(a_0, \ldots , a_n\), \(b_0, \ldots , b_n\). The independent set consists of the vertices \(a_1',\ldots , a_n'\), \(b_1',\ldots , b_n'\). The following edges are present regardless of the input: we connect \(a_0\) to all \(b_i'\), \(1\le i \le n\), and similarly \(b_0\) to all \(a_i'\), \(1\le i \le n\). We also connect each \(a_i\) to all \(b_j'\) where \(j\ne i\) for \(1 \le i \le n\). Similarly, we connect each \(b_i\) to all \(a_j'\) where \(j\ne i\) for \(1 \le i \le n\). The other edges are input dependent. For each index i, the edge \((a_i, a_i')\) is inserted when \(x_i = 1\) and otherwise the edge \((a_0, a_i')\) is inserted. Similarly, for each i, the edge \((b_i, b_i')\) is inserted when \(y_i = 1\) and otherwise the edge \((b_0, b_i')\) is inserted. This completes the construction. Note that it is a split graph as the vertices \(a_0, \ldots , a_n\), \(b_0, \ldots , b_n\) form a clique and \(a_1',\ldots , a_n'\), \(b_1',\ldots , b_n'\) form an independent set. Assuming we have an algorithm that works on split graphs, to construct an AL stream containing this graph, Alice reveals all the a-vertices. Then she passes the memory of the algorithm to Bob, who reveals all the b-vertices. This completes one pass of the stream. Notice that Alice and Bob do not require information on the input of the other, as only input-independent edges connect a-vertices to b-vertices.

We claim that the diameter of this graph is at most 2 if the answer to \(\textsc {Disj}_n\) is YES and otherwise the diameter is at least 3.

Let us assume the answer to \(\textsc {Disj}_n\) is NO, that is, there is an index i such that \(x_i = y_i = 1\). Consider the distance between \(a_i'\) and \(b_i'\) in this instance. Notice that because \(x_i = y_i = 1\), there is no vertex in the clique connected to both vertices. As both vertices are connected to some vertex in the clique, the distance between them must be 3.

Now assume the answer to \(\textsc {Disj}_n\) is YES, that is, there is no index i such that \(x_i = y_i = 1\). We show that the diameter is at most 2. The distance between vertices in the clique is at most 1. The distance from any \(a_i'\) or \(b_i'\) to any vertex in the clique is at most 2, because each \(a_i'\) or \(b_i'\) is always connected to at least one vertex in the clique. The distance from any \(a_i'\) to another \(a_j'\) is 2 because of \(b_0\); similarly, the distance from any \(b_i'\) to another \(b_j'\) is 2 because of \(a_0\). Let \(1 \le i,j \le n\) be two (possibly the same) indices, and consider the distance between \(a_i'\) and \(b_j'\). If either \(x_i = 0\) or \(y_j = 0\) then the distance is 2 because of \(a_0\) or \(b_0\). Otherwise, both have a 1 at the corresponding index. But then we know that \(i \ne j\), and so \(a_i', a_i, b_j'\) is a path in the graph of length 2. Hence, the diameter of the graph is at most 2.

We conclude that any algorithm that can solve the Diameter problem on split graphs in the AL model in p passes, must use \(\Omega (n/p)\) bits of memory by Proposition 7. \(\square \)
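The split-graph construction can be checked on small inputs with the Python sketch below, which builds the graph and computes its diameter by breadth-first search. The names `add_edge`, `diameter` and `split_graph` and the labels `"a'"` and `"b'"` for the independent-set vertices are assumptions of this illustration.

```python
from collections import deque

def add_edge(adj, u, v):
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def diameter(adj):
    """Exact diameter of a connected graph via a BFS from every vertex."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best

def split_graph(x, y):
    """Split graph of Theorem 25 on 4n + 2 vertices."""
    n = len(x)
    adj = {}
    clique = [("a", i) for i in range(n + 1)] + [("b", i) for i in range(n + 1)]
    for i, u in enumerate(clique):
        for v in clique[i + 1:]:
            add_edge(adj, u, v)
    for i in range(1, n + 1):
        add_edge(adj, ("a", 0), ("b'", i))           # input-independent edges
        add_edge(adj, ("b", 0), ("a'", i))
        for j in range(1, n + 1):
            if j != i:
                add_edge(adj, ("a", i), ("b'", j))
                add_edge(adj, ("b", i), ("a'", j))
        # Input-dependent edges: a_i' hangs off a_i if x_i = 1, else off a_0.
        add_edge(adj, ("a", i if x[i - 1] else 0), ("a'", i))
        add_edge(adj, ("b", i if y[i - 1] else 0), ("b'", i))
    return adj

disjoint = split_graph([1, 0, 1], [0, 1, 0])      # answer YES
intersecting = split_graph([1, 1, 0], [0, 1, 1])  # answer NO (index 2 is shared)
print(diameter(disjoint), diameter(intersecting))
```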

We can now prove Theorems 2 and 3. Intuitively, if H contains a cycle or a vertex of degree 3, a modification of ‘Windmill’ is H-free; if H is a linear forest, a modification of ‘Split’ is (almost) H-free.

Proof of Theorem 2

If H contains a cycle as a subgraph, then the result follows from Theorem 21. Hence, we may assume that H does not contain a cycle as a subgraph, and thus is a forest.

If H contains a tree with a vertex of degree at least three, then the result follows from a slight modification of Theorem 21. Start from the version of the construction where each vertex has degree at most 3 (per Corollary 22) and let c denote the center vertex. Note that the diametral pair in the construction is the other end \(v\not =c\) of the path P (recall that c is one of its ends) and another leaf of the tree. Hence, we can add edges (which shorten distances) as long as this property is preserved. Consider the tree to be rooted at v. Make the two children of c (which are not on P by the choice of the root) adjacent, and recurse down the tree, consistently making children adjacent if there are two. The resulting graph has no \(K_{1,3}\) as an induced subgraph, and thus is H-free. Hence, we may assume that H also does not contain a vertex of degree at least three, and thus is a linear forest.

We now reduce the open cases to just \(H=4P_1\) and \(H=P_4+P_1\) and later show hardness for those cases. If H contains a \(2P_2\) as an induced subgraph, then the result follows from Theorem 25, as split graphs are \(2P_2\)-free. Hence, H does not contain \(2P_2\) as an induced subgraph. In particular, we may assume that all paths in H are of length at most 3. If H is the union of a \(P_4\) and either at least two other paths or another path of length at least 2, then it contains a \(4P_1\). In the other cases when H is the union of a \(P_4\) and other paths, it is \(P_4+P_2\) (which contains a \(2P_2\), and thus was already excluded) or \(P_4+P_1\). If H is the union of a \(P_3\) and at least two other paths, it contains a \(4P_1\). If H is the union of a \(P_3\) and another path of length at least 1, then it contains a \(2P_2\) and thus was already excluded. The case when H is \(P_3+P_1\) has been excluded by assumption. Hence, we may assume that H contains only paths of length at most 1. If H contains a \(P_2\), then it cannot contain another \(P_2\), as \(2P_2\) would be an induced subgraph, nor can it be \(P_2+P_1\) which is an induced subgraph of \(P_4\), nor can it be \(P_2+sP_1\) for \(s \ge 2\) which is \(P_2+2P_1\) (excluded by assumption) or contains a \(4P_1\). If H does not contain a \(P_2\), then it is \(sP_1\) for some s. However, \(P_1\) and \(2P_1\) are induced subgraphs of \(P_4\), \(3P_1\) is excluded by assumption, and \(sP_1\) for \(s \ge 4\) contains \(4P_1\) as an induced subgraph. Hence, the open cases have been successfully reduced.

To tackle the remaining cases, \(H=4P_1\) or \(H=P_4+P_1\), we modify the construction of Theorem 25. Let (C, I) be the split partition implied by the construction. In that construction, it can be readily seen that the vertices \(a'_1,\ldots ,a'_n\) can be turned into a clique \(A'\) and the vertices \(b'_1,\ldots ,b'_n\) can be turned into a clique \(B'\) without affecting the correctness of the reduction. Observe that the resulting graph is \(4P_1\)-free, as it is a union of three cliques. To see that it is also \(P_4+P_1\)-free, note that in any induced subgraph isomorphic to \(P_4+P_1\), the \(P_4\) must contain two consecutive vertices in \(A'\), say \(a'_i\) and \(a'_j\) for some \(i \not = j\), and two consecutive vertices \(c,c'\) in C (the case when it contains two consecutive vertices in \(B'\) and C is symmetric). Note that \(c,c' \not \in \{b_0,b_1,\ldots ,b_n\}\) or \(c'\) (the end of the \(P_4\)) would be adjacent to \(a'_i\) or \(a'_j\). Moreover, \(c,c' \not \in \{a_0,a_1,\ldots ,a_n\}\), as they would jointly cover \(B'\), leaving no room for the \(P_1\). The theorem follows. \(\square \)

Proof of Theorem 3

We start with the case when \(G-X\) has to be a connected graph. By Theorem 2, only the cases when \(H \subseteq _i P_4\) and \(H= 3P_1\), \(H =P_3+P_1\), \(H=P_2+2P_1\) still need to be proven.

If H is a \(P_4\), then the result follows from Theorem 19, because in that construction for some X of size 2, \(G-X\) is a union of triangles where the triangles have a single common vertex c. Hence, we may assume H has only paths of length at most 2. If H is \(P_3+P_1\), then the result follows again from Theorem 19, because in that construction the vertex c is dominating, yet must be in any induced \(P_3\). We also note that by assumption, \(H \not = P_3\). Hence, we may assume H only has paths of length at most 1. These cases are all resolved by assumption.

In case \(G-X\) does not have to be connected, the only relevant case is when \(H = P_3\). In that case, the result follows from Theorem 19, because in that construction for some X of size 3, \(G-X\) is a matching. \(\square \)

We can also prove a quadratic bound for general graphs; see Fig. 7 for the construction.

Theorem 26

Any streaming algorithm for Diameter on general (dense) graphs in the AL model using p passes over the stream requires \(\Omega (n^2/p)\) bits of memory.

Proof

Let \(N = n^2\). Let x, y be the input to \(\textsc {Disj}_N\) of Alice and Bob, respectively. We construct a graph on \(7n + 2\) vertices, but with \(\mathcal {O}(n^2)\) edges, as illustrated in Fig. 7. The input of Alice and Bob will control the edges in a bipartite graph each. Let \(a_1, \ldots , a_n\) and \(a_1', \ldots , a_n'\) be the two sides of the bipartite graph A for Alice. Alice now views her input x as an adjacency matrix for the \(n^2\) potential edges in A, but inverted, so an edge is present if and only if the corresponding entry is a 0. We also add a (universal) vertex \(u_A\) which we connect to all vertices in A. For Bob, do the same with vertices \(b_1,\ldots ,b_n\) and \(b_1',\ldots ,b_n'\) forming a bipartite graph B. Bob also views his input y as an adjacency matrix (in exactly the same order as Alice!) for the \(n^2\) potential edges in B, but inverted, so an edge is present if and only if the corresponding entry is a 0. We also add a vertex \(u_B\) which we connect to all vertices in B. To complete the construction, we create a set S of n vertices \(s_1,\ldots ,s_n\) and we connect \(s_i\) to \(a_i\) and \(b_i\) for each \(1\le i\le n\). Then, we also create a set T of 2n vertices \(t_1,\ldots ,t_n\) and \(t_1',\ldots ,t_n'\), where we connect \(t_i'\) with \(t_i\) and \(a_i'\) and \(b_i'\). This completes the construction. Given an algorithm which works on such a graph, Alice and Bob can construct an AL stream by having Alice first reveal all vertices in S, A, T and \(u_A\) with their incident edges, then passing the memory state to Bob, who reveals all vertices in B and \(u_B\) with their incident edges. This completes one pass of the stream. Notice that the graph is connected.

We now claim that the diameter of this construction is at least 5 when the answer to \(\textsc {Disj}_N\) is NO, and otherwise the diameter is at most 4.

Let us assume that the answer to \(\textsc {Disj}_N\) is NO, that is, there is an index \(i'\) such that \(x_{i'} = y_{i'} = 1\). Let i, j be the pair of indices such that the edges \((a_i,a_j')\) and \((b_i,b_j')\) are decided by \(x_{i'}\) and \(y_{i'}\) respectively. In this case, both these edges are not present in the graph. Hence, the shortest path from \(s_i\) to \(t_j\) must use either \(u_A\) or \(u_B\), or use at least 3 edges in A or B (because A and B are bipartite graphs). Hence, the distance from \(s_i\) to \(t_j\) must be at least 5.

Now assume the answer to \(\textsc {Disj}_N\) is YES, that is, there is no index \(i'\) such that \(x_{i'} = y_{i'} = 1\). Then, for every pair \(1\le i,j\le n\) there exists a path from \(s_i\) to \(t_j\) of length 4, because at least one of the edges \((a_i,a_j')\), \((b_i,b_j')\) is present in the graph. One can check that all other distances are at most 4 as well.^{Footnote 4}

We conclude that any algorithm that can solve the Diameter problem on general (dense) graphs in the AL model in p passes, must use \(\Omega (N/p) = \Omega (n^2/p)\) bits of memory by Proposition 7. \(\square \)

Splitting up \(u_A\) and \(u_B\) into two vertices each, and making the tails from \(t'_i\) to \(t_i\) at least three edges longer for each i makes the lower bound work for bipartite graphs.

Corollary 27

Any streaming algorithm for Diameter on bipartite graphs in the AL model using p passes over the stream requires \(\Omega (n^2/p)\) bits of memory.

Proof

The proof follows from adjusting the construction in Theorem 26. If we split up \(u_A\) and \(u_B\) into two vertices \(u_A, u_A'\) and \(u_B,u_B'\) each, and have each of them connect to only one side of A, B respectively, the graph construction forms a bipartite graph. To make the diameter distinction still work, we have to extend the paths in T such that the distance from \(t_i\) to \(t_i'\) is at least 4 (this makes the diameter always be formed by an \(s_i, t_j\)-path and not some other path). \(\square \)

4.1 Permutation Lower Bounds

In this section, we extend the list of our lower bounds by showing some reductions from \(\textsc {Perm}_n\), which prove lower bounds for 1-pass algorithms, showing that they must use \(\Omega (n\log n)\) bits. In particular, we show that there are constructions similar to the ‘Windmill’ and ‘Diamond’ constructions from the previous section that work for the Permutation problem.

Theorem 28

Any streaming algorithm for Diameter that works on the ‘Windmill-Perm’ construction in the AL model using 1 pass over the stream requires \(\Omega (n \log n)\) bits of memory.

Proof

Let \(\pi , j\) be the input to \(\textsc {Perm}_n\), and let \(\gamma \) and \(\psi \) be the values associated with j (see the definition of \(\textsc {Perm}_n\)). We create a graph construction called ‘Windmill-Perm’ on \(7n + 8\) vertices, see Fig. 8. Start with a path \(t = t_1, \ldots , t_8\) on 8 vertices. For each index i, let \(u_{i,1}, u_{i,2}, u_{i,3}\) and \(v_{i,1}, v_{i,2}, v_{i,3}\) be two triplets. If i is the \(\psi \)-th index, Bob inserts the edges \((u_{i,1},u_{i,2})\) and \((u_{i,2},u_{i,3})\), otherwise, Bob inserts the edges \((u_{i,1},u_{i,2})\) and \((u_{i,1},u_{i,3})\). For the other triplet, if the \(\gamma \)-th bit of i is a 1, Bob inserts the edges \((v_{i,1},v_{i,2})\) and \((v_{i,2},v_{i,3})\), and otherwise Bob inserts the edges \((v_{i,1},v_{i,2})\) and \((v_{i,1},v_{i,3})\). Now add two vertices \(u_{i,4}\) and \(v_{i,4}\) and the edges \((u_{i,3}, u_{i,4})\) and \((v_{i,4}, v_{i,1})\). Alice then inserts edges depending on her permutation \(\pi \). For each i, Alice inserts the edge \((u_{i,4}, v_{\pi (i),4})\), i.e. the first triplet of index i gets connected to the second triplet of index \(\pi (i)\). We complete the construction by ‘glueing’ the vertices \(u_{i,1}\) onto \(t_1\), that is, \(t_1\) and all \(u_{i,1}\) are one and the same vertex. Notice that this is a connected graph because a permutation is a bijective function. Given a streaming algorithm that works on such a graph, Alice and Bob can construct an AL stream corresponding to this graph by having Alice first reveal the vertices \(u_{i,4},v_{i,4}\) for each i (and their edges, which Alice knows), and then passing the memory state of the algorithm to Bob who reveals the rest of the vertices and their edges. This completes one pass of the stream.

We now claim that the diameter of this graph is at least 14 when the answer to \(\textsc {Perm}_n\) is YES, and otherwise the diameter is at most 13.

Assume the answer to \(\textsc {Perm}_n\) is YES, that is, the \(\gamma \)-th bit of the image of the \(\psi \)-th index under \(\pi \) is 1. The triplet \(u_{\psi ,1}, u_{\psi ,2}, u_{\psi ,3}\) must have the edges \((u_{\psi ,1},u_{\psi ,2})\) and \((u_{\psi ,2},u_{\psi ,3})\) by construction. Then the edge \((u_{\psi ,4}, v_{\pi (\psi ),4})\) is present because of Alice, and this leads to a triplet with the edges \((v_{\pi (\psi ),1},v_{\pi (\psi ),2})\) and \((v_{\pi (\psi ),2},v_{\pi (\psi ),3})\) by the assumption that this is a YES instance. Therefore, the distance from \(v_{\pi (\psi ),3}\) to \(t_8\) is 14, and so the diameter is at least 14.

Now assume that the answer to \(\textsc {Perm}_n\) is NO, that is, the \(\gamma \)-th bit of the image of the \(\psi \)-th index under \(\pi \) is not 1. As the \(\psi \)-th index is the only one whose u-triplet forms a path hanging from \(t_1\) (rather than a star centred at \(t_1\)), and by the assumption the v-triplet it is connected to does not have the shape of a path, the distance from all vertices (except \(t_8\)) to \(t_1\) is at most 6. Hence, the diameter is at most 13 because \(t_8\) lies at distance 7 from \(t_1\).

We conclude that any 1-pass algorithm in the AL model that can solve the Diameter problem on the ‘Windmill-Perm’ construction, must use \(\Omega (n \log n)\) bits of memory by Proposition 8. \(\square \)
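The diameter gap in this proof can be checked mechanically on small instances. The following is a minimal sketch, not part of the original argument: the function names, the 0-based indexing of indices and bits, and passing \(\psi \) and \(\gamma \) directly (rather than deriving them from j) are all assumptions made for illustration.

```python
from collections import deque


def windmill_perm(pi, psi, gamma):
    """Adjacency sets of 'Windmill-Perm' for a permutation pi of 0..n-1.

    Assumption: indices and bit positions are 0-based, and psi and gamma
    are handed over directly instead of being decoded from Bob's input j.
    """
    adj = {}

    def add(p, q):
        adj.setdefault(p, set()).add(q)
        adj.setdefault(q, set()).add(p)

    for k in range(1, 8):                      # the tail t_1, ..., t_8
        add(("t", k), ("t", k + 1))

    def u(i, k):                               # every u_{i,1} is glued onto t_1
        return ("t", 1) if k == 1 else ("u", i, k)

    def v(i, k):
        return ("v", i, k)

    for i in range(len(pi)):
        add(u(i, 1), u(i, 2))                  # Bob: u-triplet, a path only for psi
        add(u(i, 2) if i == psi else u(i, 1), u(i, 3))
        add(v(i, 1), v(i, 2))                  # Bob: v-triplet, a path iff bit is 1
        add(v(i, 2) if (i >> gamma) & 1 else v(i, 1), v(i, 3))
        add(u(i, 3), u(i, 4))
        add(v(i, 4), v(i, 1))
        add(u(i, 4), v(pi[i], 4))              # Alice: link i to pi(i)
    return adj


def diameter(adj):
    """Exact diameter of a connected graph by a BFS from every vertex."""
    best = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            p = queue.popleft()
            for q in adj[p]:
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
        assert len(dist) == len(adj), "graph is not connected"
        best = max(best, max(dist.values()))
    return best
```

For instance, with the permutation `[2, 0, 3, 1]` and `psi = 0`, the image is 2 (binary 10), so `gamma = 1` gives a YES instance and `gamma = 0` a NO instance.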

Corollary 29

Any streaming algorithm for Diameter in the AL model that uses 1 pass over the stream must use \(\Omega (n \log n)\) bits of memory, even on graphs for which the algorithm is given

1. that the input is a bounded depth tree,
2. that the Maximum Degree is a constant of at least 3.

Proof

The corollary follows from Theorem 28 together with observing that ‘Windmill-Perm’ is

1. a tree of constant depth,
2. a lower bound construction that still works if we extend \(t_1\) to a binary tree and extend the tail t to consist of \(8 + \log n\) vertices (the diameter distinction is then between \(13 + 2\log n\) and \(14 + 2\log n\)).

\(\square \)

Next we show a similar adaptation of the ‘Diamond’ construction.

Theorem 30

Any streaming algorithm for Diameter that works on the ‘Diamond-Perm’ construction in the AL model using 1 pass over the stream requires \(\Omega (n \log n)\) bits of memory.

Proof

Let \(\pi , j\) be the input to \(\textsc {Perm}_n\), and let \(\gamma \) and \(\psi \) be the values associated with j (see the definition of \(\textsc {Perm}_n\)). We create a graph construction called ‘Diamond-Perm’ on \(14n + 3\) vertices, see Fig. 9. The construction starts off with two vertices \(b,b'\) connected with an edge. We then add \(n+1\) vertices \(c_0,\ldots ,c_n\) each of which we connect to both b and \(b'\). For each index i we do the following. We create four (disjoint) paths \((u_{i,1}, u_{i,2}, u_{i,3})\), \((u_{i,4}, u_{i,5}, u_{i,6})\), \((v_{i,1}, v_{i,2}, v_{i,3})\), and \((v_{i,4}, v_{i,5}, v_{i,6})\). If i is not the \(\psi \)-th index, Bob inserts the edges \((b',u_{i,2})\) and \((b',u_{i,5})\). If the \(\gamma \)-th bit of i is a 0, Bob inserts the edges \((b,v_{i,2})\) and \((b,v_{i,5})\). The rest of the edges depend on the permutation of Alice. For each i, Alice creates the vertex \(a_i\) and inserts the edges \((c_{i-1}, u_{i,1})\), \((u_{i,3}, v_{\pi (i),1})\), \((v_{\pi (i),3}, a_i)\), \((a_i, u_{i,4})\), \((u_{i,6}, v_{\pi (i),4})\), and \((v_{\pi (i),6}, c_{i})\). Notice that, ignoring \(b,b'\), this graph forms a path, as a permutation is a bijective function. This completes the construction. Given a streaming algorithm that works on this construction, Alice and Bob can produce an AL stream by having Alice reveal the vertices \(c_0, \ldots , c_n\), and \(a_i, u_{i,1}, u_{i,3}, v_{i,1}, v_{i,3}, u_{i,4}, u_{i,6}, v_{i,4}, v_{i,6}\) for every \(1 \le i \le n\). Then she passes the memory state of the algorithm to Bob who reveals the rest of the vertices including \(b,b'\). This completes one pass of the stream.

We now claim that the diameter of this graph is at least 10 when the answer to \(\textsc {Perm}_n\) is YES, and otherwise the diameter is at most 9.

Assume the answer to \(\textsc {Perm}_n\) is YES, that is, the \(\gamma \)-th bit of the image of the \(\psi \)-th index under \(\pi \) is 1. Consider \(a_\psi \). The edges \((b',u_{\psi ,2})\) and \((b',u_{\psi ,5})\) are not present by construction. Also, the edges \((b,v_{\pi (\psi ),2})\) and \((b,v_{\pi (\psi ),5})\) are not present by construction. Therefore, the distance from \(a_\psi \) to b and \(b'\) is 8. Then, the diameter is at least 10 because there are always vertices at distance at least 2 from both b and \(b'\).
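To see where the value 8 comes from, it may help to spell out one of the two shortest routes from \(a_\psi \) to b in the YES case, where no edge to b or \(b'\) shortcuts the segment containing \(a_\psi \):

\[ a_\psi \;-\; v_{\pi (\psi ),3} \;-\; v_{\pi (\psi ),2} \;-\; v_{\pi (\psi ),1} \;-\; u_{\psi ,3} \;-\; u_{\psi ,2} \;-\; u_{\psi ,1} \;-\; c_{\psi -1} \;-\; b. \]

This walk uses 7 edges to reach \(c_{\psi -1}\) and one more to reach b (or, equally, \(b'\)). The route that leaves \(a_\psi \) through \(u_{\psi ,4}\) is symmetric and has the same length.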

Now assume that the answer to \(\textsc {Perm}_n\) is NO, that is, the \(\gamma \)-th bit of the image of the \(\psi \)-th index under \(\pi \) is not 1. Then, for every \(1\le i\le n\), at least one of the edge pairs \((b',u_{i,2}),(b',u_{i,5})\) and \((b,v_{\pi (i),2}), (b,v_{\pi (i),5})\) must be present between \(c_{i-1}\) and \(c_i\), because the edges \((b',u_{i,2}),(b',u_{i,5})\) are only absent for the \(\psi \)-th index, and the \(\gamma \)-th bit of \(\pi (\psi )\) is not a 1. Hence, the distance from any vertex to one of b or \(b'\) is at most 4. As b and \(b'\) are connected with an edge, the diameter is at most 9.

We conclude that any 1-pass algorithm in the AL model that can solve the Diameter problem on the ‘Diamond-Perm’ construction, must use \(\Omega (n \log n)\) bits of memory by Proposition 8. \(\square \)

Corollary 31

Any streaming algorithm for Diameter in the AL model that uses 1 pass over the stream must use \(\Omega (n \log n)\) bits of memory, even on graphs for which the algorithm is given a Deletion Set to a path of size at least 2.

Proof

The corollary follows from Theorem 30, together with observing that ‘Diamond-Perm’ is a path when we remove \(\{b, b'\}\). \(\square \)

5 Connectivity

In this section, we show results for Connectivity. Connectivity is an easier problem than Diameter, that is, solving Diameter solves Connectivity as well, but not the other way around. Hence, lower bounds in this section also imply lower bounds for Diameter (in non-connected graphs). In general graphs, a single pass, \(\mathcal {O}(n \log n)\) bits of memory algorithm exists by maintaining connected components in a Disjoint Set data structure [6], which is optimal in general graphs [5]. The interesting part about Connectivity is that some graph classes admit fairly trivial algorithms by a counting argument. For example, if the input is a forest, we can decide on Connectivity by counting the number of edges, which is a 1-pass, \(\mathcal {O}(\log n)\) bits of memory, algorithm. An overview of the results in this section is given in Table 3.
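The counting argument for forests can be sketched as follows. This is only an illustration under stated assumptions: the stream is modelled as an iterable of `(vertex, neighbours)` pairs in which every vertex appears exactly once (isolated vertices with an empty list), and the function name is hypothetical.

```python
def forest_is_connected(al_stream):
    """One pass over an AL stream of a forest, keeping two counters.

    A forest on n vertices is connected exactly when it has n - 1 edges.
    In an AL stream every edge is listed once per endpoint, so we compare
    the number of listed adjacencies with 2 * (n - 1).
    """
    n = 0
    adjacencies = 0
    for _vertex, neighbours in al_stream:
        n += 1
        for _ in neighbours:
            adjacencies += 1
    return n > 0 and adjacencies == 2 * (n - 1)
```

Only the two counters are stored, which matches the \(\mathcal {O}(\log n)\) bits mentioned above. The check is meaningless if the promise that the input is a forest does not hold.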

Table 3 Overview of the results for Connectivity. All hardness results listed here are through reductions from Disjointness. \((\mathcal {M}, m, p)\)-hard means that any algorithm using p passes in model \(\mathcal {M}\) (or weaker) requires \(\Omega (m)\) bits of memory. \((\mathcal {M}, m, p)\)-str. means that there is an algorithm that uses p passes in model \(\mathcal {M}\) (or stronger) using \(\mathcal {O}(m)\) bits of memory. FVS stands for Feedback Vertex Set number, FEN for Feedback Edge Set number. We state most upper bounds only as observations


The following upper bounds follow from applications of the Disjoint Set data structure.

Observation 32

Given a graph G as an AL stream with vertex cover number k, we can solve Connectivity [k] in 1 pass and \(\mathcal {O}(k \log n)\) bits of memory.

Proof

When the vertex cover is known, we can keep track of a Disjoint Set data structure on the k vertices of the vertex cover. Seeing any vertex that connects two or more vertices of the vertex cover in the stream translates directly to taking the union of the corresponding sets in the data structure. If at the end of the stream the data structure contains only one set and we have not seen a degree-0 vertex, the graph is connected.

When only the size of the vertex cover is given, and not the cover itself, we can greedily maintain an approximate vertex cover of size at most 2k by maintaining a maximal matching, while executing the above procedure on this set. \(\square \)
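The first part of this proof (the vertex cover is known) can be sketched as below. The stream model (an iterable of `(vertex, neighbours)` pairs that also lists isolated vertices), the function name, and the assumption that the graph has at least two vertices are choices made for this illustration only.

```python
def connected_with_vertex_cover(al_stream, cover):
    """One-pass connectivity test, given a vertex cover of the streamed graph.

    Only a union-find structure over the cover vertices is stored.
    Assumes the graph has at least two vertices.
    """
    parent = {c: c for c in cover}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]      # path halving
            p = parent[p]
        return p

    def union(p, q):
        root_p, root_q = find(p), find(q)
        if root_p != root_q:
            parent[root_p] = root_q

    for vertex, neighbours in al_stream:
        # A cover vertex is merged with its cover neighbours; any other
        # vertex merges all of its (necessarily cover) neighbours.
        anchor = vertex if vertex in parent else None
        degree = 0
        for w in neighbours:
            degree += 1
            if w in parent:
                if anchor is None:
                    anchor = w
                else:
                    union(anchor, w)
        if degree == 0:
            return False                       # an isolated vertex
    return len({find(c) for c in cover}) == 1
```

Besides the k parent pointers, the sketch keeps only `anchor` and `degree` per streamed vertex, in line with the \(\mathcal {O}(k \log n)\) bound of the observation.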

Observation 33

Given a graph G as an AL stream with a deletion set X of size k to \(\ell \) cliques, we can solve Connectivity \([k, \ell ]\) in 1 pass and \(\mathcal {O}((k + \ell ) \log n)\) bits of memory.

Proof

We use a Disjoint Set data structure on all vertices in X and a representative vertex for each clique, say the lowest numbered vertex of that clique. The space used by the data structure is \(\mathcal {O}((k+\ell )\log n)\) bits. For a vertex in X we only process the edges to other vertices in X in the data structure. For a vertex in a clique (that is, \(\notin X\)) we register its edges to vertices in X and its lowest-numbered neighbour \(\notin X\). This takes at most \((k+1)\log n\) bits, and is enough to take unions in the data structure corresponding to the connections seen by this vertex.

At the end of the stream, if the data structure contains only one set, the graph is connected. \(\square \)
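A possible rendering of this procedure is sketched below. It assumes that vertices are integers (so that ‘lowest numbered’ is well defined), that the stream is an iterable of `(vertex, neighbours)` pairs, and that X is passed as a set; the function name is hypothetical.

```python
def connected_with_clique_deletion_set(al_stream, deletion_set):
    """One-pass connectivity test when G - X is a disjoint union of cliques.

    The union-find structure holds the vertices of X plus one
    representative (the lowest-numbered vertex) per clique.
    """
    parent = {x: x for x in deletion_set}

    def find(p):
        parent.setdefault(p, p)                # registers new representatives
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(p, q):
        root_p, root_q = find(p), find(q)
        if root_p != root_q:
            parent[root_p] = root_q

    for vertex, neighbours in al_stream:
        if vertex in deletion_set:
            for w in neighbours:
                if w in deletion_set:          # edges inside X only
                    union(vertex, w)
        else:
            representative = vertex            # lowest vertex of this clique
            x_neighbours = []                  # at most k entries
            for w in neighbours:
                if w in deletion_set:
                    x_neighbours.append(w)
                elif w < representative:
                    representative = w
            find(representative)
            for w in x_neighbours:
                union(representative, w)
    return len({find(p) for p in list(parent)}) == 1
```

Every vertex of a clique computes the same representative, because its neighbours outside X are exactly the other vertices of its clique, so the structure never holds more than \(k + \ell \) elements.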

Next is a simple lower bound for the AL model.

Theorem 34

Any streaming algorithm for Connectivity that works on the ‘Simple AL-Conn’ construction in the AL model using p passes over the stream requires \(\Omega (n/p)\) bits of memory.

Proof

Let x, y be the input to \(\textsc {Disj}_n\) of Alice and Bob, respectively. We construct a graph as illustrated in Fig. 10. Let M be a matching on \(n+1\) edges, and associate each edge with an index \(1\le i \le n+1\). Now we add two vertices a, b of Alice and Bob respectively, and connect a and b to the \((n+1)\)-th edge (a to its left endpoint and b to its right endpoint). The edge from a to the i-th edge in M is present if and only if \(x_i = 0\), for all \(1 \le i \le n\). The same happens for b, where the edge from b to the i-th edge in M is present if and only if \(y_i = 0\), for all \(1 \le i \le n\). This completes the construction, which has \(2n+4\) vertices. Given a streaming algorithm that works on a family including this construction, Alice and Bob construct the AL stream as follows. First, Alice reveals a and the vertices on the left of M. Then she passes the memory state of the algorithm to Bob who reveals b and the vertices on the right of M, which completes one pass of the stream.

We now claim that the graph is connected if and only if the answer to \(\textsc {Disj}_n\) is YES.

Let us assume that the answer to \(\textsc {Disj}_n\) is NO, that is, there is an index i such that \(x_i = y_i = 1\). Then clearly the i-th edge is not connected to the rest of the graph (which includes a and b).

Now assume the answer to \(\textsc {Disj}_n\) is YES, that is, there is no index i such that \(x_i = y_i = 1\). Notice that there is always a path from a to b via the \((n+1)\)-th edge of M. Furthermore, by the assumption, for each index \(1\le i \le n\) either Alice or Bob (or both) has a 0, which means the i-th edge is connected to either a or b. Hence, the graph is connected.

We conclude that any algorithm that can solve the Connectivity problem on the graph construction ‘Simple AL-Conn’ in the AL model in p passes, must use \(\Omega (n/p)\) bits of memory by Proposition 7. \(\square \)
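The equivalence used in this proof is easy to test on small inputs. The sketch below builds ‘Simple AL-Conn’ under the assumption that a attaches to the left endpoints and b to the right endpoints of the matching edges (as the stream order suggests); vertex labels, 0-based indices, and function names are illustrative only.

```python
def simple_al_conn(x, y):
    """Adjacency sets of 'Simple AL-Conn' for bit lists x and y of length n."""
    n = len(x)
    adj = {}

    def add(p, q):
        adj.setdefault(p, set()).add(q)
        adj.setdefault(q, set()).add(p)

    for i in range(n + 1):                     # the matching M on n + 1 edges
        add(("left", i), ("right", i))
    add("a", ("left", n))                      # index n is the (n+1)-th edge
    add("b", ("right", n))
    for i in range(n):
        if x[i] == 0:
            add("a", ("left", i))              # Alice's edge
        if y[i] == 0:
            add("b", ("right", i))             # Bob's edge
    return adj


def is_connected(adj):
    """Depth-first search from an arbitrary vertex."""
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        p = stack.pop()
        for q in adj[p]:
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return len(seen) == len(adj)
```

With `x = [1, 0, 1]`, the input `y = [0, 1, 0]` shares no 1-position with x and yields a connected graph, whereas `y = [0, 0, 1]` leaves the last indexed matching edge detached.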

The following follows from Theorem 34 by observing properties of the ‘Simple AL-Conn’ construction.

Corollary 35

Any streaming algorithm for Connectivity in the AL model that uses p passes over the stream must use \(\Omega (n/p)\) bits of memory, even on graphs for which the algorithm is given a

Feedback Vertex Set of size at least 1,
Deletion Set to Matching of size at least 2,
Dominating Set of size at least 2.

Proof

The corollary follows from Theorem 34, together with observing that the construction of ‘Simple AL-Conn’ is

a forest when removing \(\{a\}\),
a matching when removing \(\{a, b\}\),
dominated by the set of vertices \(\{a,b\}\).

\(\square \)

An interesting lower bound concerns a special case: graphs of maximum degree 2. We mentioned that for a forest we have a simple counting algorithm for Connectivity, so the hardness must be for some graph which consists of one or more cycles. Although Theorem 34 implies Connectivity is hard for graphs with a Feedback Vertex Set of constant size, we now show that the problem remains hard even when restricted to graphs of maximum degree 2; see Fig. 11 for an illustration of the construction. We note that this reduction is similar to the problem tackled by Verbin and Yu [27] and Assadi et al. [29], but our result is slightly stronger in this setting, as it concerns a distinction between 1 or 2 disjoint cycles.

Theorem 36

Any streaming algorithm for Connectivity that works on graphs of maximum degree 2 in the AL model using p passes over the stream requires \(\Omega (n/p)\) bits of memory.

Proof

Let x, y be the input to \(\textsc {Disj}_n\) of Alice and Bob, respectively. We create a construction as shown in Fig. 11. We create a graph on \(8n + 4\) vertices. Associate 8 vertices \(a_{i,1}, \ldots , a_{i,4}\) and \(b_{i,1}, \ldots , b_{i,4}\) with each index i. Let us call the remaining four vertices \(a_0, b_0, a_{n+1}, b_{n+1}\), and insert the edges \((a_0,b_0)\) and \((a_{n+1}, b_{n+1})\). Then for each index i, we do the following. Insert the edges \((a_{i,2}, b_{i,2})\), \((a_{i,3},b_{i,3})\), \((a_{i,1}, a_{i-1,4})\), \((b_{i,1}, b_{i-1,4})\), \((a_{i,4}, a_{i+1,1})\), \((b_{i,4}, b_{i+1,1})\), where \(a_{i-1,4}, b_{i-1,4}\) are replaced with \(a_0,b_0\) when \(i=1\), and \(a_{i+1,1}, b_{i+1,1}\) are replaced with \(a_{n+1}, b_{n+1}\) when \(i=n\). These are all the fixed edges. For each i, Alice also inserts \((a_{i,1},a_{i,3})\) and \((a_{i,2}, a_{i,4})\) when \(x_i = 0\) or inserts \((a_{i,1},a_{i,2})\) and \((a_{i,3}, a_{i,4})\) when \(x_i = 1\). Bob inserts \((b_{i,1}, b_{i,4})\) and \((b_{i,2}, b_{i,3})\) when \(y_i = 0\), or inserts \((b_{i,1}, b_{i,2})\) and \((b_{i,3}, b_{i,4})\) when \(y_i = 1\). This completes the construction. Given an algorithm that works on a family including this construction, Alice and Bob construct an AL stream as follows. First, Alice reveals the vertices \(a_0, a_{n+1}\) and \(a_{i,k}\) for all \(1\le i\le n, 1\le k\le 4\), then passes the memory state to Bob who reveals the vertices \(b_0, b_{n+1}\) and \(b_{i,k}\) for all \(1\le i\le n, 1\le k\le 4\). This completes one pass of the stream. Notice that Alice and Bob do not need information about the input of the other to do this, as there are only fixed edges between a- and b-vertices. Also notice that this graph always consists of (a disjoint union of) one or more cycles regardless of the input to \(\textsc {Disj}_n\), as every vertex in the graph has degree 2.

We now claim that the graph is connected if and only if the answer to \(\textsc {Disj}_n\) is YES.

Let us assume that the answer to \(\textsc {Disj}_n\) is NO, that is, there is an index i such that \(x_i = y_i = 1\). It is easy to see that there is no path between \(a_{i,1}\) and either \(a_{i,4}\) or \(b_{i,4}\), and similarly, there is no path between \(b_{i,1}\) and either \(a_{i,4}\) or \(b_{i,4}\). Hence the graph is not connected.

Now assume the answer to \(\textsc {Disj}_n\) is YES, that is, there is no index i such that \(x_i = y_i = 1\). We will construct a simple path from \(a_0\) to either \(a_{n+1}\) or \(b_{n+1}\). If this succeeds, then the graph must be a single cycle, as we can continue to the other of \(a_{n+1}\) or \(b_{n+1}\) and walk the other way to \(b_0\), never crossing the first path because every vertex has degree 2. These two paths together with the edges \((a_0, b_0)\), \((a_{n+1}, b_{n+1})\) form a single cycle. Starting at \(a_0\), we can view a path going ‘right’, crossing each index i step by step. At an \(a_{i,1}\), there are only two possible cases: either we walk through \(a_{i,2}, a_{i,3}, b_{i,2}, b_{i,3}\) in some order and end in \(a_{i,4}\), or we have a path to \(b_{i,4}\) (using only vertices of index i). In both cases, we can advance to the next i. At a \(b_{i,1}\), there are also only two cases: either there is an edge to \(b_{i,4}\) or there is a path through \(b_{i,2}, a_{i,2}\) to \(a_{i,4}\). In both cases, we can advance to the next i. Hence, we can find a path walking through each i advancing to the next, which must mean we end up in either \(a_{n+1}\) or \(b_{n+1}\), and we are done.

We conclude that any algorithm that can solve the Connectivity problem on the graph construction ‘Cycles’ in the AL model in p passes, must use \(\Omega (n/p)\) bits of memory by Proposition 7. \(\square \)
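The case analysis above can also be checked by building the ‘Cycles’ construction explicitly. The following sketch does so; the labels, the convention that x and y are Python lists whose entry `i - 1` holds \(x_i\) and \(y_i\), and the function names are assumptions for illustration.

```python
def cycles_graph(x, y):
    """Adjacency sets of the 'Cycles' construction on 8n + 4 vertices."""
    n = len(x)
    adj = {}

    def add(p, q):
        adj.setdefault(p, set()).add(q)
        adj.setdefault(q, set()).add(p)

    def a(i, k):                               # a_0 and a_{n+1} are end caps
        return ("a", i) if i in (0, n + 1) else ("a", i, k)

    def b(i, k):
        return ("b", i) if i in (0, n + 1) else ("b", i, k)

    add(a(0, 4), b(0, 4))
    add(a(n + 1, 1), b(n + 1, 1))
    for i in range(1, n + 1):
        add(a(i, 2), b(i, 2))                  # fixed edges
        add(a(i, 3), b(i, 3))
        add(a(i, 1), a(i - 1, 4))
        add(b(i, 1), b(i - 1, 4))
        add(a(i, 4), a(i + 1, 1))
        add(b(i, 4), b(i + 1, 1))
        if x[i - 1] == 0:                      # Alice's edges
            add(a(i, 1), a(i, 3))
            add(a(i, 2), a(i, 4))
        else:
            add(a(i, 1), a(i, 2))
            add(a(i, 3), a(i, 4))
        if y[i - 1] == 0:                      # Bob's edges
            add(b(i, 1), b(i, 4))
            add(b(i, 2), b(i, 3))
        else:
            add(b(i, 1), b(i, 2))
            add(b(i, 3), b(i, 4))
    return adj


def is_connected(adj):
    """Depth-first search from an arbitrary vertex."""
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        p = stack.pop()
        for q in adj[p]:
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return len(seen) == len(adj)
```

Since every vertex has degree 2 regardless of the input, the graph is connected exactly when it is a single cycle, so `is_connected` suffices to distinguish the two cases.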

We note that we can make the result of Theorem 36 (and 39) hold for bipartite graphs of maximum degree 2 by subdividing every edge, making the graph odd cycle-free, and thus bipartite.

The proofs of Theorems 5 and 6 follow.

Proof of Theorem 5

If H contains a cycle of length not equal to 6 as a subgraph, then the result follows from Theorem 34, because that construction is \(C_\ell \)-free for any \(\ell \not = 6\). By subdividing the middle (matching) edges, the construction can be made \(C_\ell \)-free for any fixed \(\ell > 2\). Hence, we may assume that H does not contain a cycle and thus is a forest. If H contains a vertex of degree at least 3, then the result follows from Theorem 36, because that construction has maximum degree 2. Hence, we may assume that H is a linear forest. If H contains a \(P_7\), then the result follows from a slight adaptation of the construction of Theorem 34. By making the vertices a and b of that construction adjacent, the resulting graph cannot have a \(P_7\) as an induced subgraph, while not affecting the correctness of the construction. \(\square \)

Proof of Theorem 6

This is an immediate corollary of Theorem 34. In that construction, after removing vertices a and b, the remainder is a disjoint union of \(P_2\)’s. \(\square \)

Interval and split graphs are hard in the VA model, see Figs. 12 and 13.

Theorem 37

Any streaming algorithm for Connectivity that works on interval graphs in the VA model using p passes over the stream requires \(\Omega (n/p)\) bits of memory.

Proof

Let x, y be the input to \(\textsc {Disj}_n\) of Alice and Bob, respectively. We create a construction as shown in Fig. 12. We create an interval graph on 4n vertices. For each i, we create the vertices \(u_i, v_i, a_i, b_i\). We insert the edges \((u_i, a_i)\), \((v_i, b_i)\), and \((v_{i-1},u_i)\) (for \(i=1\) we do not insert this last edge). Alice inserts the edge \((a_i, v_i)\) if and only if \(x_i = 0\). Bob inserts the edges \((b_i, u_i)\) and \((b_i, a_i)\) if and only if \(y_i = 0\). This completes the construction. Notice that this is an interval graph, as illustrated by the interval representation of index i in Fig. 12. Given an algorithm that works on a family including this construction, Alice and Bob construct a VA stream as follows. First, Alice reveals all vertices \(u_i, a_i, v_i\) (and the edges between them) for each i. Then she passes the memory of the algorithm to Bob who reveals each \(b_i\). This completes one pass of the stream. Notice that Alice does not need to know the input of Bob for \(\textsc {Disj}_n\), and neither does Bob have to know the input of Alice, as it is a VA stream.

We now claim that the graph is connected if and only if the answer to \(\textsc {Disj}_n\) is YES.

Let us assume that the answer to \(\textsc {Disj}_n\) is NO, that is, there is an index i such that \(x_i = y_i = 1\). Then clearly, there is no path between \(u_i\) and \(v_i\), and so the graph is not connected.

Now assume the answer to \(\textsc {Disj}_n\) is YES, that is, there is no index i such that \(x_i = y_i = 1\). Now we claim that there is a path from \(u_1\) to \(v_{n}\) using every \(u_i\) and \(v_i\), and hence the graph is connected. Indeed, if there is such a path then the graph is connected, as each \(a_i\) and \(b_i\) are always adjacent to \(u_i\) and \(v_i\), respectively. There is such a path because, for each \(1 \le i \le n\) at least one of the edges \((u_i, b_i)\) or \((a_i, v_i)\) is present, creating a path between \(u_i\) and \(v_i\). Combining these paths for each i gives us the path we were looking for, as the edges \((v_{i-1},u_i)\) exist for each \(2\le i\le n\).

We conclude that any algorithm that can solve the Connectivity problem on the graph construction ‘Interval’ in the VA model in p passes, must use \(\Omega (n/p)\) bits of memory by Proposition 7. \(\square \)

Theorem 38

Any streaming algorithm for Connectivity that works on split graphs in the VA model using p passes over the stream requires \(\Omega (n/p)\) bits of memory.

Proof

Let x, y be the input to \(\textsc {Disj}_n\) of Alice and Bob, respectively. We create a construction as shown in Fig. 13. We create a split graph on \(n + 2\) vertices. Let \(v_1,\ldots , v_n\) be n vertices in the independent set. Let a, b be two vertices that form the clique. Alice inserts the edges \((a,v_i)\) when \(x_i = 0\), and Bob inserts the edges \((b,v_i)\) when \(y_i = 0\), for each \(1\le i \le n\). This completes the construction. Given an algorithm that works on the construction, Alice and Bob construct a VA stream as follows. First, Alice reveals \(v_1,\ldots , v_n\) (without edges at this point) and then reveals a. She then passes the memory of the algorithm to Bob, who reveals b, which completes one pass of the stream.

It can be easily seen that there is an isolated vertex if and only if there is an index i such that \(x_i = y_i = 1\). Split graphs are connected if and only if there is no isolated vertex.

We conclude that any algorithm that can solve the Connectivity problem on split graphs in the VA model in p passes, must use \(\Omega (n/p)\) bits of memory by Proposition 7. \(\square \)

For split graphs, in any model, Connectivity admits a one-pass, \(\mathcal {O}(n)\) bits of memory algorithm by checking whether there is a vertex of degree 0 (and so also for any p a p-pass algorithm using \(\mathcal {O}(n/p)\) bits by splitting up the work in p parts).^{Footnote 5} If there can be no isolated vertices, then a split graph is always connected.
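A minimal sketch of this one-pass check, written for a stream of edges, is given below. It assumes that the vertices are numbered \(0, \ldots , n-1\), that n is known in advance and at least 2, and that the input is promised to be a split graph; the function name is hypothetical.

```python
def split_graph_is_connected(n, edge_stream):
    """One pass with n bits: a split graph on at least two vertices is
    connected exactly when no vertex is isolated."""
    touched = bytearray((n + 7) // 8)          # one bit per vertex
    for p, q in edge_stream:
        for w in (p, q):
            touched[w >> 3] |= 1 << (w & 7)
    return all((touched[w >> 3] >> (w & 7)) & 1 for w in range(n))
```

On the construction of Theorem 38 with three indexed vertices 0, 1, 2 and clique vertices 3 (for a) and 4 (for b), the edge list `[(3, 4), (3, 1), (4, 0), (4, 2)]` is reported as connected, while dropping the edge `(4, 2)` isolates vertex 2.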

5.1 Permutation Lower Bound

As an additional result, we give another reduction for graphs of maximum degree 2. Sun and Woodruff [5] have already shown a Permutation lower bound for Connectivity in general graphs. We show that the ‘Cycles’ construction can be extended to lower bounds using the \(\textsc {Perm}_n\) problem, showing hardness for graphs of maximum degree 2 (Fig. 14).

Theorem 39

Any streaming algorithm for Connectivity that works on graphs of maximum degree 2 in the AL model using 1 pass over the stream requires \(\Omega (n \log n)\) bits of memory.

Proof

Essentially, we adapt the construction of Theorem 36 to work for \(\textsc {Perm}_n\). Let \(\pi , j\) be the input to \(\textsc {Perm}_n\), and let \(\gamma \) and \(\psi \) be the values associated with j (see the definition of \(\textsc {Perm}_n\)). We construct a graph on \(16n+4\) vertices, consisting of one or more cycles, see Fig. 14. For each index i, we create the vertices \(a_{i,1}, a_{i,2}, a_{i,3}, a_{i,4}, a_{i,1}', a_{i,2}', a_{i,3}', a_{i,4}'\) and \(b_{i,1}, b_{i,2}, b_{i,3}, b_{i,4}, b_{i,1}', b_{i,2}', b_{i,3}', b_{i,4}'\), where we connect each a-vertex with its corresponding b-vertex (i.e. \((a_{i,2}', b_{i,2}')\) is an edge). Next to this, we also create 4 vertices \(a_0, a_0', a_{n+1}, a_{n+1}'\), where \((a_0, a_0')\) and \((a_{n+1}, a_{n+1}')\) are edges. The rest of the edges are dependent on the input to \(\textsc {Perm}_n\). For an index i, if it is the \(\psi \)-th index, Bob inserts the edges \((b_{i,1}, b_{i,2})\) and \((b_{i,3}, b_{i,4})\). If it is not the \(\psi \)-th index, Bob inserts the edges \((b_{i,1}, b_{i,3})\) and \((b_{i,2}, b_{i,4})\). Also, for an index i, if its \(\gamma \)-th bit is a 1, Bob inserts the edges \((b_{i,1}', b_{i,2}')\) and \((b_{i,3}', b_{i,4}')\). If the \(\gamma \)-th bit is a 0, Bob inserts the edges \((b_{i,1}', b_{i,4}')\) and \((b_{i,2}', b_{i,3}')\). Alice links ‘the top’ of i with ‘the bottom’ of \(\pi (i)\). For each index i, Alice inserts the following edges, \((a_{i-1,4}, a_{i,1})\) (or \((a_0, a_{i,1})\) when \(i=1\)), \((a_{i,4}, a_{i+1,1})\) (or \((a_{i,4}, a_{n+1})\) when \(i=n\)), \((a_{i,2}, a_{\pi (i), 2}')\), \((a_{i,3}, a_{\pi (i), 3}')\), \((a_{\pi (i-1),4}', a_{\pi (i),1}')\) (or \((a_0', a_{\pi (i),1}')\) when \(\pi (i)=1\)), \((a_{\pi (i),4}', a_{\pi (i+1),1}')\) (or \((a_{\pi (i),4}', a_{n+1}')\) when \(\pi (i)=n\)).^{Footnote 6} This concludes the construction. The graph consists of one or more cycles because every vertex has degree 2.
Given an algorithm that works on a family including this construction, Alice and Bob construct an AL stream as follows. First, Alice reveals all a-vertices, then passes the memory of the algorithm to Bob, who reveals all b-vertices, which completes one pass of the stream. This is correct, as a-vertices are only connected to b-vertices with edges independent of the input to \(\textsc {Perm}_n\).

We claim that the graph is not connected if and only if the answer to \(\textsc {Perm}_n\) is YES. The graph is not connected if and only if there is an index i such that ‘the top’ and ‘the bottom’ are both a 1-construction. This can only be the case when an index i is the \(\psi \)-th index on ‘the top’, and the \(\gamma \)-th bit is a 1 on ‘the bottom’. However, ‘the bottom’ corresponds to the index \(\pi (i)\) because of the edges of Alice, which means that the \(\gamma \)-th bit of the image under \(\pi \) of \(\psi \) is a 1. Hence, this occurs if and only if the answer to \(\textsc {Perm}_n\) is YES.

We conclude that any 1-pass algorithm in the AL model that can solve the Connectivity problem on graphs of maximum degree 2, must use \(\Omega (n \log n)\) bits of memory by Proposition 8. \(\square \)

The result of Theorem 39 also holds for bipartite graphs of maximum degree 2. To see this, subdivide every edge, making the graph odd cycle-free, and thus bipartite.

6 Vertex Cover Kernelization

In this section, we parameterize the Vertex Cover problem by the solution size k. We now show how our insights into parameterized, streaming graph exploration can aid in producing a new kernelization algorithm for Vertex Cover [k].^{Footnote 7} The basis for our result is a well-known kernel for the Vertex Cover [k] problem of Buss and Goldsmith [22], consisting of \(\mathcal {O}(k^2)\) edges. Constructing this kernel is simple: find all vertices with degree greater than k, remove them from the graph, and decrease the parameter by the number of vertices removed, say to \(k'\). Then, there is no solution if more than \(k\cdot k'\) edges remain. Therefore, we have a kernel consisting of \(\mathcal {O}(k^2)\) edges. We are able to achieve this same kernel in the AL model, as counting the degree of a vertex is possible in this model. Interestingly, we do not require \(\mathcal {O}(k^2 \log n)\) bits of memory to produce a stream corresponding to the kernel of \(\mathcal {O}(k^2)\) edges. This result is also possible in the EA model, by allowing vertices up to degree 2k.

Theorem 40

Given a graph G as an AL stream, we can make an AL stream corresponding to an \(\mathcal {O}(k^2)\)-edge kernel for the Vertex Cover [k] problem using two passes and \(\mathcal {O}(k \log n)\) bits of memory. When we work with an EA stream, we can make an EA stream corresponding to an \(\mathcal {O}(k^2)\)-edge kernel using four passes and \(\mathcal {O}(k \log n)\) bits of memory.

Proof

Let G be a graph, with n vertices and m edges, given as an AL stream, and let k be the solution size parameter for the Vertex Cover [k] problem. Note that we can count the degree of every vertex when it appears in the stream, as we are given all adjacencies of a vertex consecutively. Therefore, in one pass over the stream we can count the degree of every vertex, and save each vertex with a degree bigger than k in a set S, as long as \(|S|\le k\). In this same pass, we keep track of two more counters: the total number of edges in the stream \(m'\) (which is 2m), and the number of unique edges we remove r. We find r by incrementing a local (to a vertex) counter \(r'\) when we see edges towards vertices not in S, and add \(r'\) to r if we decide to add the vertex to S. If \(\frac{m'}{2} - r > k \cdot (k - |S|)\), return NO. Otherwise, make a pass over the stream, and output only those edges between vertices not in S.

The output must be an AL stream, as we only remove edges from an AL stream to produce it. It should also be clear that we use two passes over the stream.

The set S takes \(\mathcal {O}(k \log n)\) bits of memory, as finding more than k vertices will result in returning NO. Counting the total number of edges takes \(\mathcal {O}(\log m) = \mathcal {O}(\log n)\) bits of space, and the constant number of other counters are of the same size or smaller. Therefore, this procedure uses \(\mathcal {O}(k \log n)\) bits of memory.

The behaviour of this procedure is equivalent to that of the kernelization algorithm of Buss and Goldsmith [22], as it finds exactly those vertices with degree higher than k, and ‘removes’ them by adding them to S and ignoring edges incident to them in the output. Checking the instance size is done correctly, as the new parameter \(k'\) is equivalent to \(k - |S|\), and the number of remaining edges is equal to \(m - r\), which is \(\frac{m'}{2} - r\). The value of r is counted correctly because we only count unique edges by ignoring those towards vertices already in S. Therefore, this kernelization procedure is correct.
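As a rough illustration of the two-pass AL procedure, the following Python sketch mirrors the counters \(m'\), r and \(r'\) from the proof. The stream representation (a list of (vertex, neighbour list) pairs that is re-read once per pass) and the name `buss_kernel_al` are assumptions made for this sketch only, and the kernel is materialized as a list instead of being emitted as a stream.

```python
from typing import Hashable, List, Optional, Set, Tuple

# Hypothetical AL stream: each vertex appears once with all its neighbours.
ALStream = List[Tuple[Hashable, List[Hashable]]]


def buss_kernel_al(stream: ALStream, k: int) -> Optional[Tuple[Set[Hashable], ALStream]]:
    """Return (S, kernel) or None when the answer is NO."""
    S: Set[Hashable] = set()
    m_prime = 0  # number of adjacency entries seen, equals 2m
    r = 0        # number of unique edges removed so far

    # Pass 1: count degrees, collect high-degree vertices, count removed edges.
    for v, neighbours in stream:
        degree = 0
        r_local = 0  # edges from v towards vertices not (yet) in S
        for u in neighbours:
            degree += 1
            m_prime += 1
            if u not in S:
                r_local += 1
        if degree > k:
            S.add(v)
            r += r_local
            if len(S) > k:
                return None  # more than k forced vertices

    if m_prime // 2 - r > k * (k - len(S)):
        return None  # too many edges remain

    # Pass 2: output only edges between vertices outside S.
    kernel = [(v, [u for u in neighbours if u not in S])
              for v, neighbours in stream if v not in S]
    return S, kernel
```

Only S and a constant number of counters persist between adjacency lists in this sketch, which matches the \(\mathcal {O}(k \log n)\) memory accounting of the proof.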

For the case of the EA stream, we essentially do the same as in the AL model, but in this model we end up with a slightly larger kernel. In our first pass, we greedily construct a vertex cover X of size at most 2k or conclude that there is no solution to Vertex Cover [k]. Now, we know that all vertices not in X must have degree at most 2k. So, in the second pass we count the degree of all vertices in X, and if a degree exceeds 2k we add the vertex to S. This takes only \(\mathcal {O}(k \log n)\) bits of memory. If at some point \(|S|> k\) we also stop and conclude there is no solution to Vertex Cover [k], as all vertices of degree more than 2k must be in any solution. In a third pass, we count the total number m of edges in the stream, and the number of unique edges we remove r (this is not possible during the second pass because we might count edges double when they have both endpoints in S). Now if \(m - r > 2k \cdot (k - |S|)\), we can conclude there is no solution to Vertex Cover [k], as each remaining vertex has degree at most 2k.

Now to output the kernel as an EA stream, we make a fourth pass and only output those edges not incident to vertices in S. In this procedure we only use \(\mathcal {O}(k \log n)\) bits of memory for counting and X. The resulting kernel has at most \(2k \cdot (k - |S|) = \mathcal {O}(k^2)\) edges. \(\square \)
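The four EA passes can be sketched in the same spirit. This is a minimal sketch under assumptions of ours: the stream is a re-readable list of edges, the name `buss_kernel_ea` is hypothetical, and the check \(|S| > k\) is done after the second pass instead of during it.

```python
from typing import Hashable, List, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def buss_kernel_ea(edges: List[Edge], k: int) -> Optional[Tuple[Set[Hashable], List[Edge]]]:
    """Return (S, kernel edges) or None when the answer is NO."""
    # Pass 1: greedy vertex cover X (endpoints of a maximal matching).
    X: Set[Hashable] = set()
    for u, v in edges:
        if u not in X and v not in X:
            X.update((u, v))
            if len(X) > 2 * k:
                return None

    # Pass 2: degrees of the vertices in X; only these can exceed 2k.
    degree = {x: 0 for x in X}
    for u, v in edges:
        if u in degree:
            degree[u] += 1
        if v in degree:
            degree[v] += 1
    S = {x for x, d in degree.items() if d > 2 * k}
    if len(S) > k:
        return None

    # Pass 3: count all edges (m) and the edges incident to S (r).
    m = r = 0
    for u, v in edges:
        m += 1
        if u in S or v in S:
            r += 1
    if m - r > 2 * k * (k - len(S)):
        return None

    # Pass 4: output the edges not incident to S.
    return S, [(u, v) for u, v in edges if u not in S and v not in S]
```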

Next, we show how to use Theorem 40 to produce a kernel of even smaller size, using only \(\mathcal {O}(k \log n)\) bits of memory. This requires Theorem 40 to convert the original graph stream into the kernel input for the next theorem, which only increases the number of passes by a factor of 2 or 4 (we have to apply Theorem 40 every time the other procedure uses a pass).

Chen et al. [23] show a way to convert the kernel of Buss and Goldsmith into a 2k-vertex kernel for Vertex Cover [k], using the NT-Theorem by Nemhauser and Trotter [51]. We will adapt this method in the streaming setting, and give a concise description of this procedure below. The following theorem, as formulated this way by Chen et al. [23], is due to Nemhauser and Trotter [51] and Bar-Yehuda and Even [52].

Proposition 41

[NT-Theorem] There is an \(\mathcal {O}(\sqrt{n}m)\) time algorithm that, given a graph G of n vertices and m edges, constructs two disjoint subsets \(C_0\) and \(V_0\) of vertices in G such that

(1)
The union of any minimum vertex cover of \(G[V_0]\) and \(C_0\) forms a minimum vertex cover for G.
(2)
Any minimum vertex cover of \(G[V_0]\) contains at least \(|V_0 |/2\) vertices.

The proof of the NT-Theorem by Bar-Yehuda and Even [52] shows us how to find \(V_0\) and \(C_0\) for any arbitrary graph G. One creates a bipartite graph B from G by making two copies of all vertices \(V, V'\), and an edge (x, y) in G translates to the edges \((x,y')\), \((x',y)\) in B. One then finds a maximum matching M of B to find a minimum vertex cover of B, as described by Bondy and Murty [53, Page 74, Theorem 5.3]. Let us briefly go over what it entails. If our bipartite graph has vertex sets \(V, V'\) and a maximum matching M, then we can find a minimum vertex cover X with \(|X|= |M|\) in the following manner. Denote all unmatched vertices in V with U, and let \(Z \subseteq V \cup V'\) be the set of vertices connected to U with an M-alternating path (a path such that edges in M and not in M alternate). Denoting \(S = Z \cap V\) and \(T = Z \cap V'\), then X is given by \(X = (V \setminus S) \cup T\). Now, the set \(C_0\) is given by all vertices \(v\in G\) for which both \(v,v'\in B\) are contained in X, and \(V_0\) contains the vertices \(v\in G\) for which exactly one of \(v,v'\in B\) is contained in X.

Chen et al. [23] describe how to use the above procedure to get a smaller kernel from the kernel by Buss and Goldsmith [22]. Start with G as the kernel by Buss and Goldsmith [22], and execute the above procedure to find the sets \(C_0\) and \(V_0\) in G. Then, the kernel is given by \(G' = G[V_0]\) with new parameter value \(k' = k_1 - |C_0 |\), where \(k_1\) is the parameter value of the kernel by Buss and Goldsmith. Chen et al. [23] show that \(G'\) has at most 2k vertices and is a kernel with parameter value \(k'\).
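Once a minimum vertex cover X of B is available, reading off \(C_0\), \(V_0\) and the new parameter is a simple classification. The sketch below assumes, purely for illustration, that a vertex v of G is encoded in B as `(v, 0)` and its copy \(v'\) as `(v, 1)`; the names `nt_sets` and `k1` are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple


def nt_sets(vertices: Iterable[Hashable],
            cover: Set[Tuple[Hashable, int]]) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Split the vertices of G into (C0, V0) from a vertex cover of B."""
    C0: Set[Hashable] = set()
    V0: Set[Hashable] = set()
    for v in vertices:
        # Count how many of v and v' lie in the cover of B.
        hits = int((v, 0) in cover) + int((v, 1) in cover)
        if hits == 2:
            C0.add(v)
        elif hits == 1:
            V0.add(v)
    return C0, V0


def reduced_parameter(k1: int, C0: Set[Hashable]) -> int:
    """New parameter k' = k1 - |C0| for the kernel G[V0]."""
    return k1 - len(C0)
```

For a star with centre c, the cover \(\{c, c'\}\) of B puts c in \(C_0\) and leaves \(V_0\) empty, so the kernel is empty and the parameter drops by one.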

We now show how to execute this procedure in the streaming setting, both in the AL and EA models. Note that in the EA model, Theorem 40 yields a kernel which is not exactly the kernel by Buss and Goldsmith, but still has \(\mathcal {O}(k^2)\) edges. This property is sufficient for the above procedure to work, even though it is not exactly the kernel by Buss and Goldsmith.

Lemma 42

Given a graph G as a stream in model AL or EA, we can produce a stream in the same model corresponding to the Phase 1 bipartite graph B of [52, Algorithm NT] using two passes and \(\mathcal {O}(\log n)\) bits of memory.

Proof

Given a graph \(G = (V,E)\), Phase 1 of [52, Algorithm NT] asks for the bipartite graph B with vertex sets \(V, V'\) and edges \(E_B\) such that \(V' = \{v' \mid v\in V \}\) and \(E_B = \{ (x,y') \mid (x,y) \in E\}\). This is essentially two copies of all vertices and each edge in the original graph makes two edges, between the corresponding (original,copy)-pairs.

The process of creating a stream corresponding to B is quite simple: first we use a pass and, for every edge (x, y), we output \((x,y')\), and then we use another pass and, for every edge (x, y), we output \((x',y)\). If the input is an EA stream, then the output must be as well, as no edge is output twice. If the input stream is an AL stream, the output must be an AL stream too, as we are consistent in which copy of the vertex we address. That is, the output AL stream first reveals all vertices in V and then all those in \(V'\). All adjacencies of these vertices are present in the stream, as all the adjacencies were present in the input stream.

We can see that this uses two passes and \(\mathcal {O}(\log n)\) bits of memory (to remember what pass we are in, and to read an edge). It is trivial that the bipartite graph B is constructed correctly, as for every edge (x, y) we output the edges \((x,y')\) and \((x',y)\). \(\square \)
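The two-pass transformation of Lemma 42 can be written as a generator. In this sketch the callable `make_pass` (returning a fresh iterator over the input pairs for each pass) and the encoding of v as `(v, 0)` and \(v'\) as `(v, 1)` are assumptions of ours, not part of the original construction.

```python
from typing import Callable, Hashable, Iterator, Tuple

Pair = Tuple[Hashable, Hashable]
TaggedPair = Tuple[Tuple[Hashable, int], Tuple[Hashable, int]]


def double_cover_stream(make_pass: Callable[[], Iterator[Pair]]) -> Iterator[TaggedPair]:
    """Yield the stream of B: first all (x, y'), then all (x', y)."""
    # Pass 1: every (x, y) becomes (x, y').
    for x, y in make_pass():
        yield (x, 0), (y, 1)
    # Pass 2: every (x, y) becomes (x', y).
    for x, y in make_pass():
        yield (x, 1), (y, 0)
```

The generator keeps no state besides the current pair and the pass it is in, in line with the \(\mathcal {O}(\log n)\) memory bound. If the input pairs are grouped by their first entry (an AL stream), the output is grouped in the same way, with all vertices of V preceding those of \(V'\).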

Before we continue to find the maximum matching in such a graph B produced by Lemma 42, we need a few observations to restrict the size of the matching we want to find. From the conversion to a 2k kernel by Chen et al. [23], we can conclude that for the sets \(C_0\) and \(V_0\) of the NT-Theorem it must hold that \(|V_0|\le 2k - 2 |C_0|\) (as this shows the kernel size). But then it must also be that \(|V_0 |+ |C_0|\le 2k\), and \(V_0\) and \(C_0\) together include all vertices in the found minimum vertex cover in B. So, the maximum matching M of B we search for has size \(\mathcal {O}(k)\).
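The size bound can be spelled out as follows, under the assumption (made here only for illustration) that the instance has not already been rejected, so that the bound of Chen et al. [23] applies. The cover X contains both copies of every vertex in \(C_0\) and exactly one copy of every vertex in \(V_0\), and \(|M| = |X|\), so

\[
|M| = |X| = 2|C_0| + |V_0| \le 2|C_0| + (2k - 2|C_0|) = 2k .
\]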

To find the maximum matching we execute a DFS procedure, which can be done with surprising efficiency in this restricted bipartite setting.

Theorem 43

Given a bipartite graph B as an AL stream with \(\mathcal {O}(k^2)\) vertices, we can find a maximum matching of size at most \(\mathcal {O}(k)\) using \(\mathcal {O}(k^2)\) passes and \(\mathcal {O}(k\log n)\) bits of memory. For the EA model this can be done in \(\mathcal {O}(k^3)\) passes.

Proof

We first use a pass to find a maximal matching M in the graph. This can be done in a single pass because we can construct a maximal matching in a greedy manner, picking every edge that appears in the stream for which both vertices are unmatched.
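The greedy first pass is short enough to state directly. The sketch assumes a hypothetical representation in which the pass is an iterable of vertex pairs and the matching is stored as a symmetric dictionary `mate`; it does not enforce the \(\mathcal {O}(k)\) size bound on the matching.

```python
from typing import Dict, Hashable, Iterable, Tuple


def greedy_maximal_matching(edge_pass: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Hashable]:
    """One pass: take every edge whose endpoints are both still unmatched."""
    mate: Dict[Hashable, Hashable] = {}
    for u, v in edge_pass:
        if u not in mate and v not in mate:
            mate[u] = v
            mate[v] = u
    return mate
```

The result depends on the order of the stream: on the path 1, 2, 3, 4 the order (2,3), (1,2), (3,4) yields a maximal matching with a single edge, which is why augmenting paths are still needed afterwards.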

Then, we iteratively find an M-augmenting path P (a path starting and ending in unmatched vertices, alternating between edges in and not in M), and improve the matching by switching all edges on P (i.e., remove from M the edges on P in M, and add to M the edges on P not in M). Note that any such P has length \(\mathcal {O}(k)\), as otherwise M would exceed size \(\mathcal {O}(k)\). It is known that a matching M in a bipartite graph is maximum when there is no M-augmenting path [54]. We can also find an M-augmenting path only \(\mathcal {O}(k)\) times, as the size of the matching increases by at least 1 for each M-augmenting path.

Let us now describe how we find an M-augmenting path, given some matching M of size \(\mathcal {O}(k)\). We find M-augmenting paths by executing a Depth First Search (DFS) from each unmatched vertex. Note that we alternate between traversing edges in M and not in M in this search. In contrast to a normal DFS, we do not save which vertices we visited, as this would cost too much memory. Instead, we mark edges in M as visited, together with the vertex from which we started the search. If an edge \(e \in M\) has been visited once in the search tree, there is no need to visit it again, as the search that visited e would have found an M-augmenting path containing e if it exists. Let us discuss the exact details on the size of the search tree and recursion.

As any M-augmenting path has length at most \(\mathcal {O}(k)\), the depth of the search tree is also \(\mathcal {O}(k)\). Looking at any vertex, it might have \(\mathcal {O}(k^2)\) neighbours in the given bipartite graph. However, only \(\mathcal {O}(k)\) of its neighbours can be in M. As visiting an unmatched vertex must end the M-augmenting path, the search tree size is only increased by visiting matched vertices. Therefore, the search comes down to the following process. From the initial unmatched vertex, we can explore to at most \(\mathcal {O}(k)\) vertices (those in the matching) or any unmatched vertex which would end the search. If we explore to a matched vertex, the next step must traverse the edge in the matching to make an M-augmenting path, which is deterministic. Then we again can explore to \(\mathcal {O}(k)\) matched vertices, or any unmatched vertex which would end the search. This process continues. As we only visit each matched vertex at most once, we can see that the number of vertices the search visits is bounded by \(\mathcal {O}(k)\). In each node along the currently active path of the search tree, we can keep a counter with value \(\mathcal {O}(k^2)\) (using \(\mathcal {O}(\log (k^2)) = \mathcal {O}(\log k)\) bits) to keep track of what edge we consider next. These counters take up \(\mathcal {O}(k \cdot \log k) = \mathcal {O}(k \log n)\) space. In any node, if we wish to consider the next edge incident to a vertex v with a counter value x, we inspect the x-th edge incident to v in the stream. If it turns out we cannot visit that vertex (have already visited it), we can increment the counter and find the next edge to consider in the same pass (as the \((x+1)\)-th edge incident to v must be later in the stream than the x-th edge incident to v). Therefore, finding the next edge to visit in the search only takes a single pass. 
Notice that we return to nodes in the search tree at most \(\mathcal {O}(k)\) times in total, because only visiting matched vertices can result in a ‘failed’ search recursion. So, a search that visits all matched vertices uses \(\mathcal {O}(k)\) passes, and this is the maximum number of passes for a single search.
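The counter-based lookup of the next edge can be sketched as a single scan. The sketch assumes (our choice) that the AL pass is an iterable of (vertex, neighbour) pairs, that counters are 0-based, and that `blocked` holds the matched vertices already visited; the name `next_candidate` is hypothetical.

```python
from typing import Hashable, Iterable, Optional, Set, Tuple


def next_candidate(al_pass: Iterable[Tuple[Hashable, Hashable]],
                   v: Hashable, x: int,
                   blocked: Set[Hashable]) -> Optional[Tuple[int, Hashable]]:
    """Return (index, neighbour) of the first edge of v with index >= x
    whose other endpoint is not blocked, or None if no such edge exists."""
    index = 0  # position among the edges incident to v
    for u, w in al_pass:
        if u != v:
            continue
        if index >= x and w not in blocked:
            return index, w
        index += 1  # skipping a blocked edge costs no extra pass
    return None
```

Because later edges of v appear later in the same pass, advancing past blocked neighbours never requires restarting the stream, which is the reason one pass per step suffices.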

We start our search at most once from each unmatched vertex that has at least one edge (for which we can keep another counter to keep track), which means we do at most \(\mathcal {O}(k^2)\) searches. Although these searches start from different vertices, we keep the set of visited matched vertices saved across all of them. If a search from a vertex visits a matched vertex and does not find an M-augmenting path, then neither will a search from a different vertex by visiting that matched vertex again. In particular, this is because the graph is bipartite: when we start from an unmatched vertex on e.g. the ‘right’ side, we have to end on an unmatched vertex on the ‘left’ side, while all vertices we visit on the path are matched vertices. So, the current path does not interfere with the ability to successfully find an endpoint, which makes another search visiting the same matched edge have exactly the same result. Having to do \(\mathcal {O}(k^2)\) searches would indicate that we need to use \(\mathcal {O}(k^2)\) passes, at least one for every search. However, in the AL model, if we consider the x-th vertex, and in the pass we use for it we make no successful visit to a vertex (all adjacencies are matched and already visited), then in the same pass we can consider the \((x+1)\)-th vertex, because all edges incident to the \((x+1)\)-th vertex in the stream appear later than the edges incident to the x-th vertex in the stream. Hence, in the AL model, over all \(\mathcal {O}(k^2)\) starting vertices, we only use \(\mathcal {O}(k)\) passes, because only (partially) successful searches increase the number of passes, and we can only visit \(\mathcal {O}(k)\) vertices in total. In the EA model, we require at least one pass for each vertex we want to start searching from, and so the total number of passes is \(\mathcal {O}(k^2)\).

We conclude that with \(\mathcal {O}(k)\) passes in the AL model, and \(\mathcal {O}(k^2)\) passes in the EA model, and \(\mathcal {O}(k \log n)\) bits of memory we can execute a DFS to find an M-augmenting path (if it exists).

As mentioned, we can search for an M-augmenting path only \(\mathcal {O}(k)\) times, as the existence of more M-augmenting paths would result in returning NO. Therefore, we can find a maximum matching in B using \(\mathcal {O}(k^2)\) passes and \(\mathcal {O}(k \log n)\) bits of memory in the AL model. In the EA model, we require \(\mathcal {O}(k^3)\) passes to accomplish this. \(\square \)
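For readers who want to see the augmentation logic in isolation, here is a plain in-memory version of augmenting-path matching. It is not the streaming procedure of the proof: it stores the whole adjacency structure `adj` (an assumption of this sketch), starts from an empty matching instead of a greedy one, and resets the visited set for every start vertex. It only illustrates how a path is flipped once a free endpoint is found.

```python
from typing import Dict, Hashable, Iterable, List, Set


def maximum_bipartite_matching(left: Iterable[Hashable],
                               adj: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, Hashable]:
    """Return a maximum matching as a map from left vertices to right vertices."""
    mate_left: Dict[Hashable, Hashable] = {}
    mate_right: Dict[Hashable, Hashable] = {}

    def try_augment(u: Hashable, seen: Set[Hashable]) -> bool:
        # DFS along alternating paths starting at left vertex u.
        for w in adj.get(u, ()):
            if w in seen:
                continue
            seen.add(w)
            # Either w is free, or its partner can be re-matched elsewhere.
            if w not in mate_right or try_augment(mate_right[w], seen):
                mate_left[u] = w
                mate_right[w] = u
                return True
        return False

    for u in left:
        try_augment(u, set())
    return mate_left
```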

Next, we show how to convert such a maximum matching as found by Theorem 43 into a minimum vertex cover for B, as asked by [52, Algorithm NT], for which we can use a DFS procedure as in Theorem 43 as a subroutine.

Lemma 44

Given a bipartite graph B as an AL stream and a maximum matching M of size \(\mathcal {O}(k)\), we can find a minimum vertex cover X for B with \(|X|= |M|\), using \(\mathcal {O}(k)\) passes and \(\mathcal {O}(k \log n)\) bits of memory. For the EA model, this takes \(\mathcal {O}(k^2)\) passes.

Proof

We adapt a theorem by Bondy and Murty [53, Page 74, Theorem 5.3] to the streaming setting to achieve this lemma. Let us repeat what it entails. If our bipartite graph has vertex sets \(V, V'\) and a maximum matching M, then we can find a minimum vertex cover X with \(|X|= |M|\) in the following manner. Denote all unmatched vertices in V with U, and let \(Z \subseteq V \cup V'\) be the set of vertices connected to U with an M-alternating path (a path such that edges in M and not in M alternate). If \(S = Z \cap V\) and \(T = Z \cap V'\), then X is given by \(X = (V {\setminus } S) \cup T\).

As \(T \subseteq X\) and \(|X|= |M|\), we can find and save T by executing a DFS procedure just like in Theorem 43, without exceeding \(\mathcal {O}(k \log n)\) bits of memory. This takes \(\mathcal {O}(k)\) passes in the AL model and \(\mathcal {O}(k^2)\) passes in the EA model, and \(\mathcal {O}(k \log n)\) bits of memory. Also, \(V \setminus S\) must only contain matched vertices, as \(U \subseteq S\). Therefore, in the same DFS procedure to find T, we can also save for every matched vertex in V if it is reachable through an M-alternating path. Then \(V \setminus S\) is simply given by all matched vertices in M for which we did not save that they were reachable. We conclude that we can find X, the minimum vertex cover such that \(|X|= |M|\), in \(\mathcal {O}(k)\) passes (AL model) or \(\mathcal {O}(k^2)\) passes (EA model) and \(\mathcal {O}(k \log n)\) bits of memory. \(\square \)
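The construction \(X = (V \setminus S) \cup T\) can be illustrated in memory as follows. As before, this sketch is not pass- or memory-efficient: it assumes the adjacency structure `adj` and a maximum matching `mate_left` (left vertex to right vertex) are given as dictionaries, and that left and right vertex names are distinct. The function name `konig_cover` is hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set


def konig_cover(left: Iterable[Hashable],
                adj: Dict[Hashable, List[Hashable]],
                mate_left: Dict[Hashable, Hashable]) -> Set[Hashable]:
    """Minimum vertex cover (V minus S) union T from a maximum matching."""
    left = list(left)
    mate_right = {w: u for u, w in mate_left.items()}
    S: Set[Hashable] = {u for u in left if u not in mate_left}  # starts as U
    T: Set[Hashable] = set()
    stack = list(S)
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            # Leave the left side only along edges that are not in M.
            if w in T or mate_left.get(u) == w:
                continue
            T.add(w)
            # Return to the left side along the matching edge of w.
            partner = mate_right.get(w)
            if partner is not None and partner not in S:
                S.add(partner)
                stack.append(partner)
    return (set(left) - S) | T
```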

The final result is as follows. It is obtained by putting the original stream through each step every time we require a pass; that is, the numbers of passes of the parts of this theorem combine in a multiplicative fashion.

Theorem 45

Given a graph G as an AL stream, we can produce a kernel of size 2k for the Vertex Cover [k] problem using \(\mathcal {O}(k^2)\) passes and \(\mathcal {O}(k \log n)\) bits of memory. In the EA model, this procedure takes \(\mathcal {O}(k^3)\) passes.

Proof

We execute Theorem 43 on the stream produced by applying Theorem 40 and then Lemma 42 on the input stream (we have to apply these transformations every time that we require a pass). Notice that these applications increase the number of passes by a constant factor. On the result of Theorem 43 we apply Lemma 44 to obtain a minimum vertex cover for the specific bipartite graph B. Now, \(C_0\) contains the vertices v for which both \(v,v' \in B\) are contained in the minimum vertex cover of B, and \(V_0\) contains the vertices v for which exactly one of \(v,v' \in B\) is contained in the minimum vertex cover of B. Finding \(C_0\) and \(V_0\) from B and its minimum vertex cover requires no passes over the stream, as they are simply given by analysing the minimum vertex cover of B. These sets \(C_0\) and \(V_0\) are exactly the sets in the NT-Theorem (Proposition 41). The kernel by Chen et al. [23] is given by \(G' = G[V_0]\), which we can find with a pass (we can output the kernel as a stream), and parameter \(k' = k_1 - |C_0|\), where \(k_1\) is the parameter after application of Theorem 40. All in all, this process takes \(\mathcal {O}(k^2)\) (AL) or \(\mathcal {O}(k^3)\) (EA) passes and \(\mathcal {O}(k\log n)\) bits of memory. \(\square \)

7 Conclusion

We studied the complexity of Diameter and Connectivity in the streaming model, from a parameterized point of view. In particular, we considered the viewpoint of an H-free modulator, showing that a vertex cover or a modulator to the disjoint union of \(\ell \) cliques effectively forms the frontier of memory- and pass-efficient streaming algorithms. Both problems remain hard for almost all other H-free modulators of constant size (often even of size 0). We believe that this forms an interesting starting point for further investigations into which other graph classes or parameters might be useful when computing Diameter and Connectivity in the streaming model.

On the basis of our work, we propose four concrete open questions:

What is the streaming complexity of computing Distance to \(\ell \) Cliques? In contrast to Vertex Cover [k], we are not aware of any algorithms to compute this parameter, even though it is helpful in computing Diameter and Connectivity.
Are there algorithms or lower bounds for Diameter or Connectivity in the AL model for interval graphs?
Assuming isolated vertices are allowed in the graph, can we solve Connectivity in the AL model on split graphs using \(O(\log n)\) bits of memory?
Is there a streaming algorithm for Vertex Cover [k] using \(\mathcal {O}(\textrm{poly}(k))\) passes and \(\mathcal {O}(\textrm{poly}(k, \log n))\) bits of memory, or can it be shown that one cannot exist? Such a result would be relevant in combination with our kernel.

Notes

See Sect. 2 for the notation.
In fact, there are stronger assumptions that one can make while the complexity of \(\textsc {Disj}_n\) remains the same, see [50] or [49].
Notice that adding d does not change any of the distances.
If the reader is not convinced, notice that we could always extend the tails \(t_i,t_i'\) to consist of a longer path, making paths other than those that originate from T negligible.
This assumes the vertices are labelled \(1 \ldots n\) and do not have arbitrary labels.
We note that formally, we insert many edges twice, but this is to make the description more understandable. Alice does not actually insert these edges twice.
This section is based on the master's thesis “Parameterized Algorithms in a Streaming Setting” by the first author.

References

Oostveen, J.J., van Leeuwen, E.J.: Parameterized complexity of streaming diameter and connectivity problems. In: Dell, H., Nederlof, J. (eds.) 17th International Symposium on Parameterized and Exact Computation, IPEC 2022, September 7–9, 2022, Potsdam, Germany. LIPIcs, vol. 249, pp. 24:1–24:16. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Dagstuhl (2022). https://doi.org/10.4230/LIPICS.IPEC.2022.24
Henzinger, M.R., Raghavan, P., Rajagopalan, S.: Computing on data streams. In: Abello, J.M., Vitter, J.S. (eds.) External Memory Algorithms, Proceedings of a DIMACS Workshop, New Brunswick, New Jersey, USA, May 20–22, 1998. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50, pp. 107–118. DIMACS/AMS, Providence (1998). https://doi.org/10.1090/dimacs/050/05
Feigenbaum, J., Kannan, S., McGregor, A., Suri, S., Zhang, J.: On graph problems in a semi-streaming model. Theor. Comput. Sci. 348(2–3), 207–216 (2005). https://doi.org/10.1016/j.tcs.2005.09.013
Feigenbaum, J., Kannan, S., McGregor, A., Suri, S., Zhang, J.: Graph distances in the data-stream model. SIAM J. Comput. 38(5), 1709–1727 (2008). https://doi.org/10.1137/070683155
Sun, X., Woodruff, D.P.: Tight bounds for graph problems in insertion streams. In: Garg, N., Jansen, K., Rao, A., Rolim, J.D.P. (eds.) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2015, August 24–26, 2015, Princeton, NJ, USA. LIPIcs, vol. 40, pp. 435–448. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Dagstuhl (2015). https://doi.org/10.4230/LIPIcs.APPROX-RANDOM.2015.435
McGregor, A.: Graph stream algorithms: a survey. SIGMOD Rec. 43(1), 9–20 (2014). https://doi.org/10.1145/2627692.2627694
Khan, S., Mehta, S.K.: Depth first search in the semi-streaming model. In: Niedermeier, R., Paul, C. (eds.) 36th International Symposium on Theoretical Aspects of Computer Science, STACS 2019, March 13–16, 2019, Berlin, Germany. LIPIcs, vol. 126, pp. 42:1–42:16. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Dagstuhl (2019). https://doi.org/10.4230/LIPIcs.STACS.2019.42
Elkin, M.: Distributed exact shortest paths in sublinear time. J. ACM 67(3), 15:1–15:36 (2020). https://doi.org/10.1145/3387161
Elkin, M., Trehan, C.: (1+\(\epsilon \))-approximate shortest paths in dynamic streams. In: Chakrabarti, A., Swamy, C. (eds.) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2022, September 19–21, 2022, University of Illinois, Urbana-Champaign, USA (Virtual Conference). LIPIcs, vol. 245, pp. 51:1–51:23. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Dagstuhl (2022). https://doi.org/10.4230/LIPIcs.APPROX/RANDOM.2022.51
Reif, J.H.: Depth-first search is inherently sequential. Inf. Process. Lett. 20(5), 229–234 (1985). https://doi.org/10.1016/0020-0190(85)90024-9
Downey, R.G., Fellows, M.R.: Parameterized Complexity. Monographs in Computer Science, Springer, New York (1999). https://doi.org/10.1007/978-1-4612-0515-9
Fafianie, S., Kratsch, S.: Streaming kernelization. In: Csuhaj-Varjú, E., Dietzfelbinger, M., Ésik, Z. (eds.) Mathematical Foundations of Computer Science 2014—39th International Symposium, MFCS 2014, Budapest, Hungary, August 25–29, 2014. Proceedings, Part II. Lecture Notes in Computer Science, vol. 8635, pp. 275–286. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44465-8_24
Chitnis, R.H., Cormode, G., Hajiaghayi, M.T., Monemizadeh, M.: Parameterized streaming: maximal matching and vertex cover. In: Indyk, P. (ed.) Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4–6, 2015, pp. 1234–1251. SIAM, Philadelphia (2015). https://doi.org/10.1137/1.9781611973730.82
Chitnis, R., Cormode, G.: Towards a theory of parameterized streaming algorithms. In: Jansen, B.M.P., Telle, J.A. (eds.) 14th International Symposium on Parameterized and Exact Computation, IPEC 2019, September 11–13, 2019, Munich, Germany. LIPIcs, vol. 148, pp. 7:1–7:15. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Dagstuhl (2019). https://doi.org/10.4230/LIPIcs.IPEC.2019.7
Chitnis, R., Cormode, G., Esfandiari, H., Hajiaghayi, M., McGregor, A., Monemizadeh, M., Vorotnikova, S.: Kernelization via sampling with applications to finding matchings and related problems in dynamic graph streams. In: Krauthgamer, R. (ed.) Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10–12, 2016, pp. 1326–1344. SIAM, Philadelphia (2016). https://doi.org/10.1137/1.9781611974331.ch92
Bishnu, A., Ghosh, A., Kolay, S., Mishra, G., Saurabh, S.: Fixed parameter tractability of graph deletion problems over data streams. In: Kim, D., Uma, R.N., Cai, Z., Lee, D.H. (eds.) Computing and Combinatorics—26th International Conference, COCOON 2020, Atlanta, GA, USA, August 29–31, 2020, Proceedings. Lecture Notes in Computer Science, vol. 12273, pp. 652–663. Springer, Heidelberg (2020). https://doi.org/10.1007/978-3-030-58150-3_53
Oostveen, J.J., van Leeuwen, E.J.: Streaming deletion problems parameterized by vertex cover. In: Bampis, E., Pagourtzis, A. (eds.) Fundamentals of Computation Theory—23rd International Symposium, FCT 2021, Athens, Greece, September 12–15, 2021, Proceedings. Lecture Notes in Computer Science, vol. 12867, pp. 413–426. Springer, Heidelberg (2021). https://doi.org/10.1007/978-3-030-86593-1_29
Goel, A., Kapralov, M., Khanna, S.: On the communication and streaming complexity of maximum bipartite matching. In: Rabani, Y. (ed.) Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17–19, 2012, pp. 468–485. SIAM, Philadelphia (2012). https://doi.org/10.1137/1.9781611973099.41
Kapralov, M.: Better bounds for matchings in the streaming model. In: Khanna, S. (ed.) Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6–8, 2013, pp. 1679–1697. SIAM, Philadelphia (2013). https://doi.org/10.1137/1.9781611973105.121
McGregor, A., Vorotnikova, S., Vu, H.T.: Better algorithms for counting triangles in data streams. In: Milo, T., Tan, W. (eds.) Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2016, San Francisco, CA, USA, June 26–July 01, 2016, pp. 401–411. ACM, New York (2016). https://doi.org/10.1145/2902251.2902283
Dell, H., van Melkebeek, D.: Satisfiability allows no nontrivial sparsification unless the polynomial-time hierarchy collapses. J. ACM 61(4), 23:1–23:27 (2014). https://doi.org/10.1145/2629620
Buss, J.F., Goldsmith, J.: Nondeterminism within P. SIAM J. Comput. 22(3), 560–572 (1993). https://doi.org/10.1137/0222038
Chen, J., Kanj, I.A., Jia, W.: Vertex cover: further observations and further improvements. J. Algorithms 41(2), 280–301 (2001). https://doi.org/10.1006/jagm.2001.1186
Guruswami, V., Onak, K.: Superlinear lower bounds for multipass graph processing. Algorithmica 76(3), 654–683 (2016). https://doi.org/10.1007/s00453-016-0138-7
Assadi, S., Raz, R.: Near-quadratic lower bounds for two-pass graph streaming algorithms. In: Irani, S. (ed.) 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16–19, 2020, pp. 342–353. IEEE, Washington, DC (2020). https://doi.org/10.1109/FOCS46700.2020.00040
Chen, L., Kol, G., Paramonov, D., Saxena, R.R., Song, Z., Yu, H.: Almost optimal super-constant-pass streaming lower bounds for reachability. In: Khuller, S., Williams, V.V. (eds.) STOC’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21–25, 2021, pp. 570–583. ACM, New York (2021). https://doi.org/10.1145/3406325.3451038
Verbin, E., Yu, W.: The streaming complexity of cycle counting, sorting by reversals, and other problems. In: Randall, D. (ed.) Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, San Francisco, California, USA, January 23–25, 2011, pp. 11–25. SIAM, Philadelphia (2011). https://doi.org/10.1137/1.9781611973082.2
Huang, Z., Peng, P.: Dynamic graph stream algorithms in o(n) space. Algorithmica 81(5), 1965–1987 (2019). https://doi.org/10.1007/s00453-018-0520-8
Assadi, S., Kol, G., Saxena, R.R., Yu, H.: Multi-pass graph streaming lower bounds for cycle counting, max-cut, matching size, and other problems. In: Irani, S. (ed.) 61st IEEE Annual Symposium on Foundations of Computer Science, FOCS 2020, Durham, NC, USA, November 16–19, 2020, pp. 354–364. IEEE, Washington, DC (2020). https://doi.org/10.1109/FOCS46700.2020.00041
Assadi, S., Vishvajeet, N.: Graph streaming lower bounds for parameter estimation and property testing via a streaming XOR lemma. In: Khuller, S., Williams, V.V. (eds.) STOC’21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21–25, 2021, pp. 612–625. ACM, New York (2021). https://doi.org/10.1145/3406325.3451110
Assadi, S., Chen, Y., Khanna, S.: Polynomial pass lower bounds for graph streaming algorithms. CoRR (2019). arXiv:1904.04720
Roditty, L., Williams, V.V.: Fast approximation algorithms for the diameter and radius of sparse graphs. In: Boneh, D., Roughgarden, T., Feigenbaum, J. (eds.) Symposium on Theory of Computing Conference, STOC’13, Palo Alto, CA, USA, June 1–4, 2013, pp. 515–524. ACM, New York (2013). https://doi.org/10.1145/2488608.2488673
Bringmann, K., Husfeldt, T., Magnusson, M.: Multivariate analysis of orthogonal range searching and graph distances. Algorithmica 82(8), 2292–2315 (2020). https://doi.org/10.1007/s00453-020-00680-z
Abboud, A., Williams, V.V., Wang, J.R.: Approximation and fixed parameter subquadratic algorithms for radius and diameter in sparse graphs. In: Krauthgamer, R. (ed.) Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016, Arlington, VA, USA, January 10–12, 2016, pp. 377–391. SIAM, Philadelphia (2016). https://doi.org/10.1137/1.9781611974331.ch28
Husfeldt, T.: Computing graph distances parameterized by treewidth and diameter. In: Guo, J., Hermelin, D. (eds.) 11th International Symposium on Parameterized and Exact Computation, IPEC 2016, August 24–26, 2016, Aarhus, Denmark. LIPIcs, vol. 63, pp. 16:1–16:11. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, Dagstuhl (2016). https://doi.org/10.4230/LIPIcs.IPEC.2016.16
Coudert, D., Ducoffe, G., Popa, A.: Fully polynomial FPT algorithms for some classes of bounded clique-width graphs. ACM Trans. Algorithms 15(3), 33:1–33:57 (2019). https://doi.org/10.1145/3310228
Bentert, M., Nichterlein, A.: Parameterized complexity of diameter. In: Heggernes, P. (ed.) Algorithms and Complexity—11th International Conference, CIAC 2019, Rome, Italy, May 27–29, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11485, pp. 50–61. Springer, Heidelberg (2019). https://doi.org/10.1007/978-3-030-17402-6_5
Cabello, S.: Subquadratic algorithms for the diameter and the sum of pairwise distances in planar graphs. ACM Trans. Algorithms 15(2), 21:1–21:38 (2019). https://doi.org/10.1145/3218821
Corneil, D.G., Dragan, F.F., Habib, M., Paul, C.: Diameter determination on restricted graph families. Discrete Appl. Math. 113(2–3), 143–166 (2001). https://doi.org/10.1016/S0166-218X(00)00281-X
Ducoffe, G.: Beyond Helly graphs: The diameter problem on absolute retracts. In: Kowalik, L., Pilipczuk, M., Rzazewski, P. (eds.) Graph-Theoretic Concepts in Computer Science—47th International Workshop, WG 2021, Warsaw, Poland, June 23–25, 2021, Revised Selected Papers. Lecture Notes in Computer Science, vol. 12911, pp. 321–335. Springer, Heidelberg (2021). https://doi.org/10.1007/978-3-030-86838-3_25
Ducoffe, G., Dragan, F.F.: A story of diameter, radius, and (almost) Helly property. Networks 77(3), 435–453 (2021). https://doi.org/10.1002/net.21998
Ducoffe, G., Habib, M., Viennot, L.: Fast diameter computation within split graphs. In: Li, Y., Cardei, M., Huang, Y. (eds.) Combinatorial Optimization and Applications—13th International Conference, COCOA 2019, Xiamen, China, December 13–15, 2019, Proceedings. Lecture Notes in Computer Science, vol. 11949, pp. 155–167. Springer, Heidelberg (2019). https://doi.org/10.1007/978-3-030-36412-0_13
Ducoffe, G., Habib, M., Viennot, L.: Diameter computation on H-minor free graphs and graphs of bounded (distance) VC-dimension. In: Chawla, S. (ed.) Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA 2020, Salt Lake City, UT, USA, January 5–8, 2020, pp. 1905–1922. SIAM, Philadelphia (2020). https://doi.org/10.1137/1.9781611975994.117
Gawrychowski, P., Kaplan, H., Mozes, S., Sharir, M., Weimann, O.: Voronoi diagrams on planar graphs, and computing the diameter in deterministic \(\tilde{O}(n^{5/3})\) time. SIAM J. Comput. 50(2), 509–554 (2021). https://doi.org/10.1137/18M1193402
Cygan, M., Fomin, F.V., Kowalik, L., Lokshtanov, D., Marx, D., Pilipczuk, M., Pilipczuk, M., Saurabh, S.: Parameterized Algorithms. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21275-3
Bishnu, A., Ghosh, A., Kolay, S., Mishra, G., Saurabh, S.: Fixed-parameter tractability of graph deletion problems over data streams. CoRR (2019). arXiv:1906.05458
Chitnis, R.H., Cormode, G., Esfandiari, H., Hajiaghayi, M., Monemizadeh, M.: Brief announcement: New streaming algorithms for parameterized maximal matching and beyond. In: Blelloch, G.E., Agrawal, K. (eds.) Proceedings of the 27th ACM on Symposium on Parallelism in Algorithms and Architectures, SPAA 2015, Portland, OR, USA, June 13–15, 2015, pp. 56–58. ACM, New York (2015). https://doi.org/10.1145/2755573.2755618
Bishnu, A., Ghosh, A., Mishra, G., Sen, S.: On the streaming complexity of fundamental geometric problems. CoRR (2018). arXiv:1803.06875
Agarwal, D., McGregor, A., Phillips, J.M., Venkatasubramanian, S., Zhu, Z.: Spatial scan statistics: approximations and performance study. In: Eliassi-Rad, T., Ungar, L.H., Craven, M., Gunopulos, D. (eds.) Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20–23, 2006, pp. 24–33. ACM, New York (2006). https://doi.org/10.1145/1150402.1150410
Chakrabarti, A., Khot, S., Sun, X.: Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In: 18th Annual IEEE Conference on Computational Complexity (Complexity 2003), 7–10 July 2003, Aarhus, Denmark, pp. 107–117. IEEE Computer Society, Washington, DC (2003). https://doi.org/10.1109/CCC.2003.1214414
Nemhauser, G.L., Trotter, L.E., Jr.: Vertex packings: structural properties and algorithms. Math. Program. 8(1), 232–248 (1975). https://doi.org/10.1007/BF01580444
Bar-Yehuda, R., Even, S.: A local-ratio theorem for approximating the weighted vertex cover problem. In: Nagl, M., Perl, J. (eds.) Proceedings of the WG’83, International Workshop on Graphtheoretic Concepts in Computer Science, June 16–18, 1983, Haus Ohrbeck, Near Osnabrück, Germany, pp. 17–28. Universitätsverlag Rudolf Trauner, Linz (1983)
Bondy, J.A., Murty, U.S.R.: Graph Theory with Applications. Macmillan Education UK, London (1976). https://doi.org/10.1007/978-1-349-03521-2
Hopcroft, J.E., Karp, R.M.: An \(n^{5/2}\) algorithm for maximum matchings in bipartite graphs. SIAM J. Comput. 2(4), 225–231 (1973). https://doi.org/10.1137/0202019

Funding

Jelle J. Oostveen is partially supported by the NWO Grant OCENW.KLEIN.114 (PACAN).

Author information

Authors and Affiliations

Department of Information and Computing Sciences, Utrecht University, Princetonplein 5, 3584 CC, Utrecht, The Netherlands
Jelle J. Oostveen & Erik Jan van Leeuwen

Contributions

Both authors made contributions to all aspects of the paper, but Jelle J. Oostveen did a clear majority of the conceptual work and writing, and created the figures. Both authors reviewed and commented on previous versions of the manuscript, and read and approved the final manuscript.

Corresponding authors

Correspondence to Jelle J. Oostveen or Erik Jan van Leeuwen.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

An extended abstract of this work appeared in the proceedings of IPEC 2022 [1].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Oostveen, J.J., van Leeuwen, E.J. Parameterized Complexity of Streaming Diameter and Connectivity Problems. Algorithmica 86, 2885–2928 (2024). https://doi.org/10.1007/s00453-024-01246-z

Received: 09 December 2022
Accepted: 03 June 2024
Published: 19 June 2024
Issue Date: September 2024
DOI: https://doi.org/10.1007/s00453-024-01246-z
