1 Introduction

Algorithms for graph classes that exhibit bounded expansion structure [12,13,14,15] offer a promising framework for efficiently solving many NP-hard problems on real-world networks. The structural restrictions of bounded expansion, which allow for pockets of localized density in globally sparse graphs, are compatible with properties of many real-world networks such as clustering and heavy-tailed degree distributions. Moreover, multiple random graph models designed to mimic these properties have been proven to asymptotically almost surely belong to classes of bounded expansion [5].

From a theoretical perspective, one of the main strengths of the notion of bounded expansion is the abundance of equivalent characterizations of classes of bounded expansion (see e.g. the textbook [12] or the lecture notes [18]); when working with such classes, one can choose the characterization most suitable for the application at hand. For algorithmic applications, one of the most versatile characterizations is via low-treedepth colorings.

A p-treedepth coloring of a graph G is a function \(\phi : V(G) \rightarrow [k]\) for an integer k such that for every set \(I \subseteq [k]\) of size less than p, \(\phi ^{-1}(I)\) induces a subgraph of G of treedepth at most |I|. (Treedepth is a structural property stronger than treewidth; see Definition 3.) The elements of the codomain of \(\phi \) are often referred to as colors and the size of a coloring is the number of used colors. A graph class has bounded expansion if and only if there exists a function f such that for every G in the graph class and every positive integer p, G admits a p-treedepth coloring of size at most f(p) [13].

This definition naturally implies an algorithmic pipeline [5, 6, 14] for classes of bounded expansion involving four stages: computing a low-treedepth coloring, using the coloring to decompose the graph into subgraphs of small treedepth, solving the problem efficiently on each such subgraph, and combining the subsolutions to construct a global solution. Let us illustrate this pipeline on the example of the Subgraph Isomorphism problem where, given a (large) graph G, called the host, and a (small) graph H, called the pattern, one asks if the host G contains a subgraph isomorphic to the pattern H. If G is from a class of bounded expansion and H is small, one can proceed as follows. First, find a p-treedepth coloring \(\phi \) of G for \(p = |V(H)| + 1\); let [k] be the codomain of \(\phi \). Second, for every subset I of [k] of size less than p, check if there is a subgraph of \(G[\phi ^{-1}(I)]\) isomorphic to H. Since \(G[\phi ^{-1}(I)]\) is of treedepth less than p, this can be done by a standard dynamic programming algorithm on graphs of bounded treewidth in time \(2^{O(p \log p)} \cdot |V(G)|\) (see e.g. [3, Chapter 7]). Since \(|V(H)| < p\), if there is a subgraph of G isomorphic to H, the answer will be positive for at least one choice of I. The running time of this algorithm is bounded by \(\left( {\begin{array}{c}k\\ <p\end{array}}\right) \cdot 2^{O(p \log p)} \cdot |V(G)|\) plus the time needed to compute the p-treedepth coloring \(\phi \), where the \(\left( {\begin{array}{c}k\\ <p\end{array}}\right) = \sum _{i=0}^{p-1} \left( {\begin{array}{c}k\\ i\end{array}}\right) \) factor comes from the number of choices of \(I \subseteq [k]\).
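The enumeration of color subsets in the second stage can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of [16]: the function names are hypothetical, a coloring is assumed to be a dictionary from vertices to colors in [k], and the dynamic program on each induced subgraph is omitted.

```python
from itertools import combinations
from math import comb


def count_color_subsets(k, p):
    """Number of subsets I of [k] with |I| < p, i.e. sum_{i=0}^{p-1} C(k, i)."""
    return sum(comb(k, i) for i in range(p))


def color_subsets(k, p):
    """Yield every subset I of {1, ..., k} with fewer than p elements."""
    for size in range(p):
        yield from combinations(range(1, k + 1), size)


def induced_vertices(coloring, subset):
    """Vertices whose color lies in `subset`, i.e. the preimage of I."""
    chosen = set(subset)
    return {v for v, c in coloring.items() if c in chosen}
```

For instance, with k = 5 colors and p = 3, the loop visits 1 + 5 + 10 = 16 subsets, which is the factor multiplying the per-subgraph running time.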

A recent implementation [16] and experimental evaluation [17] of this pipeline have identified that the coloring size (and thus the \(\left( {\begin{array}{c}k\\ <p\end{array}}\right) \) factor) has a much larger effect on the run time than the \(2^{O(p \log p)}\) factor in practice. Furthermore, although graphs in classes of bounded expansion are guaranteed to admit colorings of constant size with respect to the number of vertices, the only known polynomial-time algorithms for computing these colorings are approximations [12]. Consequently, it is unclear to what extent our current coloring algorithms can be altered to reduce the coloring size. A more viable approach to improving the performance of the algorithmic pipeline without significant high-level changes would be to develop a new type of low-treedepth coloring that uses fewer colors but potentially has weaker guarantees about the treedepth of the subgraphs.

A slightly stronger notion than a p-treedepth coloring is a p-centered coloring of a graph G. This name stems from the property that on any subgraph H of G, a p-centered coloring either uses at least p colors on H or is a centered coloring: a coloring where every connected subgraph contains a vertex of a unique color. This property restricts the multiplicity of colors in induced subgraphs. It is not hard to prove that every p-centered coloring is a p-treedepth coloring while any p-treedepth coloring can be turned into a p-centered coloring with only a small increase in its size.

In this paper we introduce an alternative that closely mirrors this paradigm but only extends the color multiplicity guarantees to path subgraphs. For this reason we refer to them as p-linear colorings and linear colorings. More formally, a coloring \(\phi : V(G) \rightarrow [k]\) is a p-linear coloring if for every simple path P in G, \(\phi \) uses at least p colors on P or P contains a vertex of a unique color, that is, there exists a color \(i \in [k]\) with \(|\phi ^{-1}(i) \cap V(P)| = 1\).

Our goal is to study the (theoretical) usability of the notion of p-linear colorings in the aforementioned algorithmic pipeline. Clearly, every p-centered coloring is also a p-linear coloring. Hence, graphs of bounded expansion admit p-linear colorings with a bounded number of colors and such colorings can be found efficiently (in theory). In the case of p-centered colorings, the preimage of any subset of less than p colors induces a subgraph of treedepth less than p, and this very strong structural property can be used algorithmically. The natural question arises: what can we say about such a preimage in the case of a p-linear coloring?

Understanding the tradeoffs between coloring size and treedepth in switching between p-centered and p-linear colorings fundamentally depends on bounding the maximum treedepth of a graph that admits a linear coloring with k colors. Equivalently, we frame this problem as determining the gap between the minimum number of colors needed for a linear versus a centered coloring in any given graph. Here, a centered coloring (linear coloring) of a graph G is a coloring \(\phi : V(G) \rightarrow [k]\) such that every connected subgraph H (every path H, respectively) of G admits a vertex of a unique color. By \(\chi _{\text {cen}}(G)\) (\(\chi _{\text {lin}}(G)\)) we denote the minimum number of colors needed in a centered (linear, respectively) coloring of G. We remark that \(\chi _{\text {cen}}(G)\) equals the treedepth of G. Using a grid minors approach, we prove that there exists a polynomial g such that for every graph G, \(\chi _{\text {cen}}(G) \le g(\chi _{\text {lin}}(G))\). That is, the minimum size of a centered coloring is polynomially bounded in the minimum size of a linear coloring. Consequently, there exists a polynomial g such that if \(\phi \) is a p-linear coloring with codomain [k] and \(I \subseteq [k]\) is of size smaller than p, then the treedepth of \(G[\phi ^{-1}(I)]\) is bounded by g(p). The proof in Sect. 5 gives a bound of 190 on the degree of the polynomial g; we remark that subsequent improvements to the tools used in the proof decreased this exponent to 19 [4].

Because the “heavy machinery” of this approach likely does not give a tight bound, we give stronger upper bounds on the gap in trees and interval graphs and a matching lower bound for binary trees.

  • In Sect. 4 we show a general lower bound: a family of graphs where the ratio \(\chi _{\text {cen}}(G) / \chi _{\text {lin}}(G)\) is arbitrarily close to 2.

  • In Sect. 4 we also show that if \(B_\ell \) is the complete binary tree with \(\ell \) levels, then \(\liminf _{\ell \rightarrow \infty } \chi _{\text {cen}}(B_\ell ) / \chi _{\text {lin}}(B_\ell ) \ge \log _2 3\).

  • In Sect. 6 we show a tight upper bound for binary trees: for a tree G of maximum degree \(\varDelta \) it holds that \(\chi _{\text {cen}}(G) \le (\log _2 \varDelta ) \cdot \chi _{\text {lin}}(G)\).

  • In Sect. 7 we show that if G is an interval graph, then \(\chi _{\text {cen}}(G) \le (\chi _{\text {lin}}(G))^2\).

All aforementioned upper bounds are algorithmic in the following sense: for trees, the classic algorithm of Schäffer computes a minimum-sized centered coloring in linear time [21], while in the last result we give a polynomial-time algorithm that turns a linear coloring of size k of a given interval graph G into a centered coloring of G of size at most \(k^2\).

Finally, in Sect. 8 we focus on the task of verifying if a given coloring is a p-linear coloring. An efficient algorithm for this task would be useful for evaluating heuristics that find a (supposedly) p-linear coloring of an input graph. Unfortunately, we show that this task cannot be done in polynomial time unless \(\text {P} = \text {co-NP}\). This section also discusses the practical implications of these findings.

Some results in this paper appeared previously in WG 2018 [10]. Compared to the conference version, this version adds a polynomial treedepth upper bound for general graphs (Sect. 5), as well as tighter lower and upper bounds for trees (Sect. 6 and the second half of Sect. 4).

2 Definitions and Background

In this section we detail the background and terminology necessary to understand p-linear colorings.

2.1 Graph Terminology

We denote the vertices and edges of a graph G as V(G) and E(G), respectively, and assume all graphs are simple and undirected except where specifically noted otherwise. The open neighborhood of a vertex v, denoted N(v), is the set of vertices u such that \(uv\in E(G)\), while the closed neighborhood, N[v], is defined as \(N(v)\cup \{v\}\). Vertex a is an apex with respect to a subgraph H if \(V(H) \subseteq N(a)\).

We say P is a \(v_1v_\ell \)-path if \(V(P) = \{v_1,\dots , v_\ell \}\) for distinct \(v_1,\dots , v_\ell \) and \(E(P) = \{v_iv_{i+1} : 1\le i \le \ell -1\}\); we will notate this as \(P=v_1,\dots , v_\ell \). Given disjoint paths \(P = v_1,\dots , v_\ell \) and \(Q = u_1,\dots , u_{\ell '}\), the path \(P\cdot Q = v_1,\dots , v_\ell , u_1,\dots , u_{\ell '}\) is the concatenation of P and Q if \(v_\ell \) and \(u_1\) are adjacent. A path P is Hamiltonian with respect to subgraph H if \(V(P) = V(H)\).

In a rooted tree T, we let \(T_v\) be the subtree of T rooted at v and the leaf paths of \(T_v\) be the set of paths from a leaf of \(T_v\) to v. We label the levels of T from bottom to top starting from 1; that is, if D is the maximum distance from a leaf to the root then the root is the only vertex in level \(D+1\) and level i consists of all vertices whose parents are in level \(i+1\). Vertices u and v are unrelated in T if u is neither an ancestor nor a descendant of v.

A coloring \(\phi \) of a graph G is a mapping of the vertices of G to colors \(1,\dots , k\) and has size \(|\phi |=k\). A coloring is proper if no pair of adjacent vertices have the same color. For any subgraph H and color c, if there is exactly one vertex \(v\in H\) such that \(\phi (v) = c\) we say c appears uniquely in H and v is a center of H. A subgraph with no unique color is said to be non-centered.

We use the notation \(X = Y_1\uplus \dots \uplus Y_\ell \) to denote that \(Y_1,\dots , Y_\ell \) form a partition of X; that is, \(X = Y_1\cup \dots \cup Y_\ell \) and the sets \(Y_1,\dots , Y_\ell \) are pairwise disjoint.

2.2 p-Centered Colorings and Bounded Expansion

Definition 1

A p-centered coloring \(\phi \) of graph G is a coloring such that for every connected subgraph H, H has a center or \(\phi |_{H}\) uses at least p colors.

Nešetřil and Ossona de Mendez defined the notion of graph classes of bounded expansion. In this paper we will not need the exact definition of this notion; we refer an interested reader to the textbook [12] or (newer) lecture notes [18]. We only need the following characterization via p-centered colorings.

Proposition 1

[13] A class of graphs \({\mathcal {C}}\) has bounded expansion iff there exists a function f such that for all \(G\in {\mathcal {C}}\) and all \(p\ge 1\), G admits a p-centered coloring with f(p) colors.

There are varying methods to compute p-centered colorings, such as transitive-fraternal augmentations [7, 13] and generalized coloring numbers [22]; we focus here on distance-truncated transitive-fraternal augmentations (DTFAs) [19], which iteratively augment the graph with additional edges to impose constraints on proper colorings. This linear-time algorithm guarantees that after \((2\log p)^p\) DTFA iterations, any proper coloring of the augmented graph is a p-centered coloring whose size is bounded in classes of bounded expansion.

2.3 Centered Colorings and Treedepth

Note that if \(\phi \) is a p-centered coloring of G and H is a subgraph of G whose vertices use at most \(p-1\) colors in \(\phi \), H must have a center. This relates p-centered colorings to a more restricted class of graphs defined by centered colorings.

Definition 2

A centered coloring \(\phi \) of graph G is a coloring such that every connected subgraph has a center. The minimum size of a centered coloring of G is denoted \(\chi _{\text {cen}}(G)\).

Note that a centered coloring is also proper, or else there would be a connected subgraph of size two with no center. Observe that if X is the set of all centers of G, then \(G\backslash X\) must either be empty or disconnected. This implies that if \(|G|\gg \chi _{\text {cen}}(G)\), then G breaks into many components after only a few vertex deletions. This property is captured by treedepth decompositions.

Definition 3

A treedepth decomposition \({\mathcal {T}}\) of graph G is a rooted forest with the same vertex set as G such that \(uv\in E(G)\) implies u is an ancestor of v in \({\mathcal {T}}\) or vice versa. The depth of \({\mathcal {T}}\) is the length of the longest path from a leaf of \({\mathcal {T}}\) to the root of its component. The treedepth of G, \({{\,\mathrm{td}\,}}(G)\), is the minimum depth of a treedepth decomposition of G.
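To make Definition 3 concrete on small examples, the following sketch computes treedepth exactly by brute force. It relies on the standard recursive characterization (the empty graph has treedepth 0, a disconnected graph takes the maximum over its components, and a connected graph takes one plus the minimum over single-vertex deletions) and counts depth in vertices, so that a single vertex has treedepth 1, in agreement with \({{\,\mathrm{td}\,}}(G) = \chi _{\text {cen}}(G)\). The function name and the adjacency-dictionary input format are assumptions made for illustration; the running time is exponential.

```python
from functools import lru_cache


def treedepth(adj):
    """Exact treedepth of a small graph given as {vertex: set_of_neighbors}."""

    def components(vertices):
        # Connected components of the subgraph induced by `vertices`.
        remaining = set(vertices)
        comps = []
        while remaining:
            start = remaining.pop()
            comp, stack = {start}, [start]
            while stack:
                u = stack.pop()
                for w in adj[u]:
                    if w in remaining:
                        remaining.remove(w)
                        comp.add(w)
                        stack.append(w)
            comps.append(frozenset(comp))
        return comps

    @lru_cache(maxsize=None)
    def td(vertices):
        if not vertices:
            return 0
        comps = components(vertices)
        if len(comps) > 1:
            return max(td(c) for c in comps)
        # Connected: try every vertex as the root of the decomposition.
        return 1 + min(td(vertices - {v}) for v in vertices)

    return td(frozenset(adj))
```

For example, the path on seven vertices has treedepth 3 (root the middle vertex and recurse on the two halves), while the complete graph on four vertices has treedepth 4.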

Given a centered coloring of size k, we can generate a treedepth decomposition of depth at most k by choosing any center v to be the root and setting the children of v to be the roots of the treedepth decompositions of the components of \(G\backslash \{v\}\). Likewise, given a treedepth decomposition of depth k, we can generate a centered coloring using k colors by bijectively assigning the colors to levels of the tree and coloring vertices according to their level. We refer to the colorings and decompositions resulting from these procedures as canonical; together they imply that the treedepth and centered coloring numbers are equal for all graphs.
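The first of these two procedures can be sketched as follows. This is a minimal illustration, assuming the graph is an adjacency dictionary and the coloring a dictionary from vertices to integers; the names are hypothetical, and ties between several centers are broken by smallest color, which is one arbitrary choice among many.

```python
def components(adj, vertices):
    """Connected components of the subgraph induced by `vertices`."""
    remaining = set(vertices)
    comps = []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in remaining:
                    remaining.remove(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def canonical_decomposition(adj, coloring):
    """Parent map of a treedepth decomposition derived from a centered coloring.

    Raises ValueError if it reaches a component with no center, in which
    case `coloring` is not a centered coloring.
    """
    parent = {}

    def build(vertices, root_parent):
        counts = {}
        for v in vertices:
            counts[coloring[v]] = counts.get(coloring[v], 0) + 1
        centers = [v for v in vertices if counts[coloring[v]] == 1]
        if not centers:
            raise ValueError("component without a center")
        center = min(centers, key=lambda v: coloring[v])
        parent[center] = root_parent
        for comp in components(adj, vertices - {center}):
            build(comp, center)

    for comp in components(adj, adj):
        build(comp, None)
    return parent


def depth(parent):
    """Depth of the forest, counted in vertices on a longest root-to-leaf path."""
    def level(v):
        return 1 if parent[v] is None else 1 + level(parent[v])
    return max((level(v) for v in parent), default=0)
```

Since the color of a chosen center does not occur anywhere else in its component, no descendant reuses it, so the depth of the resulting forest never exceeds the number of colors.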

3 p-Linear and Linear Colorings

We introduce p-linear colorings as an alternative to p-centered colorings.

Definition 4

A p-linear coloring is a coloring \(\psi \) of a graph G such that for every path\(^{1}\) P, either P has a center or \(\psi |_P\) uses at least p colors.
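On small graphs, Definition 4 can be checked directly by enumerating all simple paths. The sketch below does exactly that; it is a brute-force illustration with hypothetical function names, takes exponential time in general, and is not an algorithm from this paper.

```python
from collections import Counter


def simple_paths(adj):
    """Yield every simple path as a tuple of vertices.

    Paths with at least two vertices are produced once per direction,
    which is harmless for the check below.
    """
    def extend(path, visited):
        yield tuple(path)
        for w in adj[path[-1]]:
            if w not in visited:
                path.append(w)
                visited.add(w)
                yield from extend(path, visited)
                visited.remove(w)
                path.pop()

    for v in adj:
        yield from extend([v], {v})


def has_center(vertices, coloring):
    """True if some color appears exactly once among `vertices`."""
    counts = Counter(coloring[v] for v in vertices)
    return 1 in counts.values()


def is_p_linear(adj, coloring, p):
    """Check that every simple path has a center or uses at least p colors."""
    for path in simple_paths(adj):
        used = {coloring[v] for v in path}
        if len(used) < p and not has_center(path, coloring):
            return False
    return True
```

For example, coloring the 4-cycle a, b, c, d alternately with colors 1 and 2 gives a 2-linear coloring, but not a 3-linear one: the path a, b, c, d uses only two colors and has no center.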

It is proven in [19] that after performing \(2^p\) DTFA iterations, any proper coloring of the augmented graph is a p-linear coloring. This implies that p-linear colorings indeed have constant size in bounded expansion classes and can be constructed in polynomial time (like p-centered colorings).

In the interest of maintaining consistency with prior terminology, we define linear colorings analogously to centered colorings.

Definition 5

A linear coloring is a coloring \(\psi \) of a graph G such that every path has a center. The linear coloring number is the minimum number of colors needed for a linear coloring and is denoted \(\chi _{\text {lin}}(G)\).

Note that linear colorings must also be proper. A simple recursive argument shows that every path of length d requires at least \(\log _2(d+1)\) colors in a linear coloring; thus a graph of linear coloring number k has no path of length \(2^k\).

Proposition 2

If G is a path, and \(\psi \) is a linear coloring of G with k colors, then \(|V(G)| < 2^k\).

Proof

The proof goes by induction on k. For the base case \(k=0\), we have \(V(G) = \emptyset \). For the inductive case, assume \(k \ge 1\). By the properties of a linear coloring, there exists a color \(i \in [k]\) with \(|\psi ^{-1}(i) \cap V(G)| = 1\); let \(v \in V(G)\) be the vertex with \(\psi (v) = i\). Then, for every connected component C of \(G-\{v\}\), \(\psi \) is a linear coloring of C with at most \(k-1\) colors. There are at most two such components, each being a path, and thus each such component is of size at most \(2^{k-1}-1\) by the inductive hypothesis. Consequently, \(|V(G)| \le 1 + 2 \cdot (2^{k-1}-1) = 2^k - 1\), as claimed. \(\square \)
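The bound of Proposition 2 cannot be improved: the path on \(2^k - 1\) vertices admits a linear coloring with k colors. The sketch below illustrates this with the well-known "ruler" coloring, in which vertex j receives one plus the number of trailing zero bits of j; this construction and the function names are supplied here for illustration and do not appear in the proof above.

```python
def ruler_coloring(k):
    """Color vertices 1..2^k - 1 of a path; vertex j gets 1 + (trailing zeros of j)."""
    # j & -j isolates the lowest set bit 2^t of j; its bit length is t + 1.
    return {j: (j & -j).bit_length() for j in range(1, 2 ** k)}


def every_subpath_has_center(coloring):
    """Check all contiguous subpaths of the path 1, 2, ..., n for a unique color."""
    n = len(coloring)
    for start in range(1, n + 1):
        counts = {}
        for end in range(start, n + 1):
            c = coloring[end]
            counts[c] = counts.get(c, 0) + 1
            if 1 not in counts.values():
                return False
    return True
```

The check passes because the largest color on any contiguous subpath appears exactly once there, mirroring the role of the vertex v in the inductive step.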

Because every depth-first search tree is a treedepth decomposition, \({{\,\mathrm{td}\,}}(G) \le 2^{\chi _{\text {lin}}(G)}\), proving that small numbers of colors in p-linear colorings induce graphs of bounded treedepth.\(^{2}\)

Our study of the divergence between linear and centered coloring numbers will naturally focus on linear colorings that are not also centered colorings. We say a linear coloring \(\psi \) of a graph G is a non-centered linear coloring (NCLC) if G contains a connected induced subgraph with no center. For an NCLC \(\psi \), we say a connected induced subgraph H is a witness to \(\psi \) if H is non-centered but every proper connected induced subgraph of H has a center. For the sake of completeness, we prove in Lemma 1 that many simple graph classes do not admit NCLCs.

Recall that the class of cographs is the maximal hereditary class of graphs whose every element is either a single vertex, not connected, or its complement is not connected. That is, cographs are graphs built from one-vertex graphs via the operations of (i) disjoint union, and (ii) complementation.

Lemma 1

If G is a cograph, has maximum degree 2, or has independence number 2, any linear coloring of G is also a centered coloring.

Proof

We analyze each graph class separately below.

  • Maximum degree 2 Let G be a graph of maximum degree 2. Each connected induced subgraph of G is either a path or a cycle, both of which have a Hamiltonian path. Thus every connected subgraph has a center, making any linear coloring centered.

  • Cographs Let \(\psi \) be an NCLC of cograph G and H be a witness to \(\psi \). If \(\psi |_H\) only contains one color, H is an isolated vertex and the coloring is centered. Thus, we may assume \(\psi |_H\) has at least two colors. Because H is a connected cograph, we can partition its vertices into nonempty sets X, Y such that xy is an edge in H for all \(x\in X\) and \(y\in Y\). But since \(\psi \) is proper, every pair of vertices with the same color must lie in the same set X or Y. Since every color in \(\psi |_H\) appears at least twice, there are vertices \(\{v,v'\}\subseteq X\) and \(\{u,u'\}\subseteq Y\) such that \(\psi (v) = \psi (v')\) and \(\psi (u) = \psi (u')\) but \(\psi (v)\ne \psi (u)\). But then \(v,u,v',u'\) form a path with no center and thus \(\psi \) is not a linear coloring.

  • Independence number 2 Since independence number is hereditary, it is sufficient to show every connected graph of independence number 2 has a Hamiltonian path. We prove this by induction on the number of vertices, observing that an isolated vertex has a trivial Hamiltonian path. Let G be a graph of independence number 2 and \(v\in G\) a vertex such that \(G{\setminus } \{v\}\) is connected, e.g., v is a leaf in a spanning tree of G. If \(G{\setminus } \{v\}\) has a Hamiltonian cycle, then G must have a Hamiltonian path. Otherwise, by the inductive hypothesis \(G{\setminus } \{v\}\) has a Hamiltonian path whose endpoints are some non-adjacent pair of vertices u, w. Either v is adjacent to one of u, w, in which case G has a Hamiltonian path, or \(\{u, w, v\}\) form an independent set of size 3.

\(\square \)

The classes described in Lemma 1 are maximal in the sense that there are graphs with independence number 3 (graph \(R_3\) described in Lemma 3) and binary trees (Lemma 4) that admit NCLCs.

4 Treedepth Lower Bounds

To understand the tradeoff between the number of colors and treedepth of small color sets when using p-linear colorings in lieu of p-centered colorings, it is important to understand to what extent \(\chi _{\text {cen}}(G)\) and \(\chi _{\text {lin}}(G)\) may differ. In Lemmas 3 and 4, we prove lower bounds on the ratio \(\chi _{\text {cen}}(G) / \chi _{\text {lin}}(G)\) through explicit constructions of graph families. In order to show that these graphs have large treedepth, we first establish assumptions about the structure of treedepth decompositions that can be made without loss of generality.

Lemma 2

Let G be a connected graph and \(S\subset V(G)\) such that G[S] is connected and with respect to some component \(C\in G\backslash S\), every vertex in S is an apex of C. Then for any treedepth decomposition \({\mathcal {T}}\) of G with depth k, we can construct a treedepth decomposition \({\mathcal {T}}'\) such that:

  1. \({\text {depth}}({\mathcal {T}}') \le k\).

  2. Each vertex in S is an ancestor of every vertex in C in \({\mathcal {T}}'\).

  3. For each pair of vertices \(\{u,w\}\subseteq V(C)\) or \(\{u,w\}\subseteq V(G{\setminus } C)\), u is an ancestor of w in \({\mathcal {T}}'\) iff it is an ancestor of w in \({\mathcal {T}}\).

Proof

Let \(\phi \) be a canonical centered coloring of G with respect to \({\mathcal {T}}\). Let \({\mathcal {T}}'\) be a canonical treedepth decomposition with respect to \(\phi \); if there are multiple vertices of unique color, prioritize removing those outside C before members of C, and then small colors over large colors, i.e., remove color 2 before color 5. Since \({\mathcal {T}}'\) is derived from a centered coloring with k colors, its depth is at most k, satisfying condition 1.

Condition 2 is satisfied as long as each member of S is removed in the construction of \({\mathcal {T}}'\) before any member of C. Note that since S contains apex vertices with respect to C and every vertex \(v\in V(C)\) satisfies \(N[v]\subseteq V(C)\cup V(S)\), the removal of any vertex from C cannot disconnect a previously connected component if S has not been removed. Thus at any point in the algorithm before the removal of S if a vertex in C has a unique color in its remaining component H, there must be another vertex in \(H\backslash C\) of unique color as well. Consequently, we will never be forced to remove any vertex of C before S.

To prove condition 3 is satisfied, observe that u is an ancestor of w in \({\mathcal {T}}'\) iff there is a connected subgraph H containing u and w and no vertex with color smaller than \(\phi (u)\). As stated previously, \(G\backslash C\) is a connected subgraph, which means that there is a subgraph witnessing this ancestor-descendant relationship between u and w such that \(H\cap C = \emptyset \) if \(u\notin C\) and \(H\cap (G\backslash C) = \emptyset \) if \(u\in C\). Thus the relationships in \({\mathcal {T}}\) are preserved in \({\mathcal {T}}'\). \(\square \)

Using Lemma 2, we now show that \(\chi _{\text {cen}}(G)/\chi _{\text {lin}}(G)\) can be arbitrarily close to 2; see Fig. 1.

Fig. 1 Linear colorings of graph \(R_6\) in Lemma 3

Lemma 3

There exists an infinite sequence of graphs \(R_1,R_2,\dots \) such that

$$\begin{aligned} \liminf _{i\rightarrow \infty } \frac{\chi _{\text {cen}}(R_i)}{\chi _{\text {lin}}(R_i)} \ge 2. \end{aligned}$$

Proof

Define \(R_{i}\) recursively such that \(R_0\) is the empty graph and \(R_{i}\) is a complete graph on vertices \(v_1,\dots , v_{i}\) along with i copies of \(R_p\) for \(p=\lfloor \frac{i-1}{2}\rfloor \), call them \(H_1,\dots , H_i\), such that \(v_j\) is an apex with respect to \(H_j\) (Fig. 1). We prove that \(\chi _{\text {lin}}(R_i) \le i\) and \(\lim _{i\rightarrow \infty } \chi _{\text {cen}}(R_i)/i = 2\).

With respect to the linear coloring number, note that \(\chi _{\text {lin}}(R_i)\ge i\) since the clique of size i requires i colors by Lemma 1. We prove the upper bound \(\chi _{\text {lin}}(R_i)\le i\) by induction on i. The case of \(i=1\) is trivial; assume it is true for \(1,\dots , i-1\). From the inductive hypothesis, we can assume each \(H_j\) only requires p colors for a linear coloring. Consider the coloring \(\psi \) of \(R_i\) such that \(\psi (v_j) = j\) and \(\psi |_{H_j}\) is a linear coloring of \(H_j\) using colors \(\{1+(j \bmod i), 1+((j+1)\bmod i),\dots , 1+((j+p - 1)\bmod i)\}\). If \(\psi \) is not a linear coloring, there is some path Q without a center. Since \(\psi (v_j)\notin \psi |_{H_j}\), Q must contain vertices from at least two \(H_j\)s; each \(v_j\) is a cut vertex, so Q cannot contain vertices from more than two \(H_j\)s. However, \(\psi ^{-1}(1)\subseteq \{v_{1}\} \cup V(H_{i-p+1}) \cup \dots \cup V(H_{i})\), but \(\{i-p+1,\dots , i\}\notin \psi |_{H_1}\), which means \(Q\cap H_1 = \emptyset \). Based on the symmetry of \(\psi \) we can apply the same argument to the remaining colors, which means that no such non-centered path Q exists and \(\psi \) is indeed a linear coloring of size i.

With respect to the centered coloring number, by Lemma 2 there is a minimum-depth treedepth decomposition in which \(v_j\) is an ancestor of \(H_j\). This implies there is a j such that no vertex in \(H_j\) shares a color in the canonical coloring with any of the vertices in the clique. Thus \(\chi _{\text {cen}}(R_i) = i + \chi _{\text {cen}}(R_p)\), implying \(\lim _{i \rightarrow \infty } \chi _{\text {cen}}(R_i)/i = 2\). \(\square \)
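For small i, the construction of \(R_i\) and the coloring \(\psi \) from the proof can be built and checked by brute force. The following sketch is an illustration only: the function names, the integer vertex labels, and the exhaustive path enumeration are assumptions of this example, and the check is feasible only for very small graphs.

```python
from collections import Counter
from itertools import count


def build_R(i):
    """Return (adj, coloring) for R_i with the coloring psi from the proof of Lemma 3."""
    adj, coloring = {}, {}
    fresh = count()  # generator of fresh integer vertex labels

    def add(n, palette):
        """Add a copy of R_n colored with the n colors in `palette`; return its vertices."""
        if n == 0:
            return []
        p = (n - 1) // 2
        clique = [next(fresh) for _ in range(n)]
        for idx, v in enumerate(clique):
            adj[v] = {w for w in clique if w != v}
            coloring[v] = palette[idx]  # psi(v_j) = j (idx is 0-based)
        vertices = list(clique)
        for j, v in enumerate(clique, start=1):  # j is the 1-based index of v_j
            # H_j uses colors 1 + ((j + t) mod n); as 0-based palette indices: (j + t) mod n.
            sub = add(p, [palette[(j + t) % n] for t in range(p)])
            for w in sub:  # v_j is an apex with respect to H_j
                adj[v].add(w)
                adj[w].add(v)
            vertices.extend(sub)
        return vertices

    add(i, list(range(1, i + 1)))
    return adj, coloring


def has_center(vertices, coloring):
    """True if some color appears exactly once among `vertices`."""
    return 1 in Counter(coloring[v] for v in vertices).values()


def is_linear(adj, coloring):
    """Brute force: every simple path must contain a uniquely colored vertex."""
    def extend(path, visited):
        if not has_center(path, coloring):
            return False
        return all(extend(path + [w], visited | {w})
                   for w in adj[path[-1]] if w not in visited)
    return all(extend([v], {v}) for v in adj)
```

For \(R_3\) (a triangle with one pendant vertex per clique vertex), this coloring uses three colors, each exactly twice, so the whole graph has no center even though every path does.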

The graphs in Lemma 3 contain large cliques. We now show that this is not a necessary condition for the linear and centered coloring numbers to diverge.

Lemma 4

Let \(B_\ell \) be the complete binary tree with \(\ell \) levels. Then

$$\begin{aligned} \lim _{\ell \rightarrow \infty } \frac{\chi _{\text {cen}}(B_\ell )}{\chi _{\text {lin}}(B_\ell )} \ge \log _2 3. \end{aligned}$$

Proof

Fix an integer \(a \ge 1\) and let b be the smallest integer such that

$$\begin{aligned} 2^a < 3^b. \qquad (1) \end{aligned}$$

Our proof proceeds by first constructing a coloring pattern \(\varPsi _a\) of \(B_a\) and then using \(\varPsi _a\) to create a linear coloring for an arbitrarily large complete binary tree. Some vertices of \(B_a\) will be left uncolored (we will call them local), while some vertices will be colored with one of the b colors [b] (we will call these colors global). Let \(C_1,C_2,\ldots ,C_{2^b}\) be the sequence of all subsets of [b] in order of nonincreasing size (in particular, \(C_1 = [b]\) and \(C_{2^b} = \emptyset \)) and let \(\ell \) be such that \(\sum _{i=1}^\ell 2^{|C_i|-1} = 2^{a-1}\). Note that such an index \(\ell \) exists due to Equation (1):

$$\begin{aligned} \sum _{i=1}^{2^b} 2^{|C_i|-1} = \frac{1}{2} \cdot 3^b > 2^{a-1} \end{aligned}$$

and the fact that the sets \(C_i\) are ordered in the nonincreasing order of their sizes. Furthermore, we have \(\ell < 2^b\).

Let \(v_1,v_2,\ldots ,v_{2^{a-1}}\) be an ordering of the leaves of \(B_a\) corresponding to an in-order traversal. Consider an index \(1 \le i \le \ell \). By construction, there exists a vertex \(u_i\in B_a\) at level \(|C_i|\) that is the root of a subtree \(T_{u_i}\) whose leaves are exactly \(v_j\) for \(\sum _{i'=1}^{i-1} 2^{|C_{i'}|-1} < j \le \sum _{i'=1}^i 2^{|C_{i'}|-1}\). We color the vertices of \(T_{u_i}\) level by level with (global) colors of \(C_i\); that is, we order the colors of \(C_i\) arbitrarily and color level k of \(T_{u_i}\) with the k-th color of \(C_i\) for every \(k \in [|C_i|]\). All remaining vertices of \(B_a\) (that is, those that lie in none of the subtrees \(T_{u_i}\) for \(1 \le i \le \ell \)) remain local.

The following claim summarizes the properties of the above coloring.

Claim 1

For every path P in \(B_a\) that either

  • has each endpoint at a leaf or at the root of the tree \(B_a\), or

  • does not contain a local vertex,

there exists a global color \(c \in [b]\) such that c appears uniquely on P.

Proof

If a path P does not contain a local vertex, then it is contained in a single tree \(T_{u_i}\). For such a path, the unique vertex on P of maximum level is colored with a global color that appears uniquely on P. Similarly, if P is a leaf path in \(B_a\), then any globally colored vertex of the tree \(T_{u_i}\) containing the leaf endpoint of P satisfies the desired property. Otherwise, a path P that has both endpoints in leaves of \(B_a\) but contains a local vertex needs to start in a leaf of one subtree \(T_{u_i}\) and end in a leaf of a different subtree \(T_{u_{i'}}\). Then, observe that any (global) color of \(C_i \triangle C_{i'}\) appears exactly once on P. \(\square \)

Let \(p < 2^a\) be the number of local vertices in the pattern \(\varPsi _a\). For an even integer \(d \ge 2\), consider a coloring \(\psi \) of \(B_{ad}\) defined as follows. Fix a palette [db] of global colors and a palette [2p] of local colors. For every \(1 \le i \le d\), the i-th stripe consists of a levels \((i-1)a+1,\ldots ,ia\). In \(B_{ad}\), such a stripe consists of \(2^{(d-i)a}\) copies of \(B_a\). Color every such copy using the pattern \(\varPsi _a\) with global colors \((i-1)b+1, \ldots , ib\) as the b global colors of \(\varPsi _a\) and color each local vertex with a different local color from the set \(\{1,2,\ldots ,p\}\) if i is odd and from the set \(\{p+1,p+2,\ldots ,2p\}\) if i is even.

We claim that the above is a linear coloring of \(B_{ad}\) with \(db+2p < db+2^{a+1}\) colors. Consider a path P in \(B_{ad}\) and let i be the index of the highest stripe intersected by P. By the choice of i, P intersects exactly one of the copies of \(B_a\) in the i-th stripe. If P contains a leaf-to-leaf path in this copy, then Claim 1 asserts that P contains a center in this copy (recall that every stripe uses a different set of b global colors). Otherwise, P intersects at most one copy of \(B_a\) in every stripe. If P intersects at least three stripes, then P contains a root-to-leaf path in the single copy of \(B_a\) intersected by P at stripe \((i-1)\), and we are again done by Claim 1. Similarly, Claim 1 finishes the proof if P does not contain a local vertex at the i-th stripe. Finally, in the remaining case P intersects at most two stripes (the i-th one and possibly the \((i-1)\)-th one) and contains a local vertex in the i-th stripe. Since we used different sets of local colors for odd and even stripes, any such local vertex in the i-th stripe is a center of P.

Consequently, we have exhibited a linear coloring of \(B_{ad}\) with fewer than \(db+2^{a+1}\) colors, where b is defined as in Equation (1). If we let d go to \(\infty \), then the ratio \((ad)/(db+2^{a+1})\) approaches a/b. This ratio, in turn, approaches \(\log _2(3)\) as \(a \rightarrow \infty \) due to the choice of b at Equation (1). This finishes the proof of the lemma. \(\square \)
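The limiting behavior at the end of the proof can be explored numerically. The sketch below computes b from Equation (1) and evaluates the ratio \((ad)/(db+2^{a+1})\); the function names are hypothetical and the code only evaluates the formulas stated above.

```python
from math import log2


def smallest_b(a):
    """Smallest integer b with 2**a < 3**b, as in Equation (1)."""
    b = 0
    while 3 ** b <= 2 ** a:
        b += 1
    return b


def color_bound(a, d):
    """The quantity d*b + 2**(a+1) bounding the number of colors used on B_{ad}."""
    return d * smallest_b(a) + 2 ** (a + 1)


def ratio(a, d):
    """The ratio (a*d) / (d*b + 2**(a+1)) from the end of the proof."""
    return a * d / color_bound(a, d)


if __name__ == "__main__":
    for a in (1, 3, 8, 84):
        b = smallest_b(a)
        print(a, b, a / b, ratio(a, 10 ** 40), log2(3))
```

Since \(2^a < 3^b\) forces \(a < b \log _2 3\), the value a/b always stays strictly below \(\log _2 3\); the proof uses that it gets arbitrarily close as a grows.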

In Sect. 6 we show that the bound in Lemma 4 is tight for binary trees (Theorem 4). We conjecture that the construction in Lemma 3 is also tight for general graphs.

Conjecture 1

For any graph G, \(\chi _{\text {cen}}(G)\le 2\chi _{\text {lin}}(G)\).

While the exclusion of a path of length \(2^k\) indicates \(\chi _{\text {cen}}(G) \le 2^{\chi _{\text {lin}}(G)}\), this still leaves a large gap between the upper and lower bounds on how far \(\chi _{\text {cen}}(G)\) and \(\chi _{\text {lin}}(G)\) can diverge. To move towards a proof of Conjecture 1, we establish a polynomial upper bound on \(\chi _{\text {cen}}(G)\) in terms of \(\chi _{\text {lin}}(G)\) in general graphs in the next section (Theorem 1). Because this proof uses “heavy machinery”, we consider two restricted graph classes, namely trees and interval graphs, in Sects. 6 and 7.

5 Treedepth Upper Bounds on General Graphs

This section is devoted to proving a polynomial upper bound on \(\chi _{\text {cen}}(G)\) in terms of \(\chi _{\text {lin}}(G)\).

Theorem 1

There exists a polynomial p such that every graph G satisfies \(\chi _{\text {cen}}(G)\le \chi _{\text {lin}}(G)^{190} \cdot p(\log \chi _{\text {lin}}(G))\).

Our starting point is the following theorem of Kawarabayashi and Rossman [9]:

Theorem 2

[9] There is an absolute constant C such that every graph G of treedepth at least \(C k^5 \log ^2 k\) satisfies at least one of the following:

  1. the treewidth of G is at least k;

  2. G contains a complete binary tree of height k as a minor;

  3. G contains a path on \(2^k\) vertices.

Assume that the treedepth of G is at least \(C k^5 \log ^2 k\). If G contains a path on \(2^k\) vertices (condition 3), then clearly \(\chi _{\text {lin}}(G) \ge k\). If G contains a complete binary tree of height k as a minor (condition 2), then G also contains a subdivision of a complete binary tree of height k as a subgraph. Since \(\chi _{\text {lin}}(H) \le \chi _{\text {lin}}(G)\) for any subgraph H of G, Theorem 4 asserts that \(\chi _{\text {lin}}(G) \ge k / \log _2(3)\). Thus, in the proof of Theorem 1, we are left with the case when G has large treewidth.

Here, we use the celebrated grid minor theorem, with the best known bound due to Chuzhoy [2].

Theorem 3

[2] There is a polynomial \(p'\) such that every graph G with treewidth at least \(k^{19} p'(\log k)\) contains a \(k \times k\) grid as a minor.

We slightly relax the notion of a \(k \times k\) grid minor to a k-pseudogrid, defined as follows.

Definition 6

A graph G contains a k-pseudogrid if there exist two sequences of vertex-disjoint paths in G, \({\mathcal {P}} = (P_1,P_2,\ldots ,P_k)\) and \({\mathcal {Q}} = (Q_1,Q_2,\ldots ,Q_k)\) such that

  • for every \(i \in [k]\), the path \(P_i\) is a concatenation of paths \(P_{i,0}\), \(P_{i,1}^Q\), \(P_{i,1}\), \(P_{i,2}^Q\), \(P_{i,2}\), \(\ldots \), \(P_{i,k}^Q\), \(P_{i,k}\) in this order such that each path \(P_{i,j}^Q\) for \(j \in [k]\) is a subpath of \(Q_j\) (possibly consisting of a single vertex) and every path \(P_{i,j}\), \(0 \le j \le k\) does not contain any edge nor internal vertex on any path \(Q_j\) (we explicitly allow \(P_{i,0}\) and \(P_{i,k}\) to be paths of length 0);

  • a symmetric condition holds with the roles of \({\mathcal {P}}\) and \({\mathcal {Q}}\) swapped.

In what follows, the paths \(P_{i,j}\), \(P_{i,j}^Q\), \(Q_{i,j}\), and \(Q_{i,j}^P\) are considered empty for pairs of indices (i, j) not defined above.

Clearly, if G contains a \(k \times k\)-grid as a minor, it contains a k-pseudogrid: just let the paths \({\mathcal {P}}\) follow the rows of the grid and the paths of \({\mathcal {Q}}\) follow the columns. To finish the proof of Theorem 1, it suffices to show the following technical result.

Lemma 5

If G contains a k-pseudogrid, then \(\chi _{\text {lin}}(G) = \varOmega (\sqrt{k})\).

Proof

Fix a linear coloring \(\psi \) of G. Let \(({\mathcal {P}}, {\mathcal {Q}})\) be a k-pseudogrid in G. Let \(V({\mathcal {P}}) = \bigcup _{P \in {\mathcal {P}}} V(P)\) and similarly define \(V({\mathcal {Q}})\). Let \(\mu ({\mathcal {P}})\) be the number of distinct colors \(\psi \) uses on \(V({\mathcal {P}})\) and similarly define \(\mu ({\mathcal {Q}})\). To prove the lemma, it suffices to show that for any k-pseudogrid \(({\mathcal {P}},{\mathcal {Q}})\) in G, \(k \le 100 \cdot (\mu ({\mathcal {P}}) + \mu ({\mathcal {Q}}))^2\). We prove this by induction on k.

The statement is trivial for \(k \le 100\). For an inductive step, we proceed as follows. For a vertex \(v \in V({\mathcal {P}}) \cup V({\mathcal {Q}})\), the grid coordinate of v is (i, j) if \(v \in V(P_{i,j}^Q) \cup (V(P_{i,j}) {\setminus } V(P_{i,j+1}^Q)) \cup V(Q_{i,j}^P) \cup (V(Q_{i,j}) {\setminus } V(Q_{i+1,j}^P))\). A vertex v is marginal if its grid coordinates (i, j) satisfy \(i \le 3\), \(j \le 3\), \(i \ge k-2\), or \(j \ge k-2\). A color c is infrequent on \({\mathcal {P}}\) if it appears on \(V({\mathcal {P}})\), but there exists a family \({\mathcal {P}}_c \subseteq {\mathcal {P}}\) of size at most \(50(\mu ({\mathcal {P}}) + \mu ({\mathcal {Q}}))\) such that every vertex \(v \in V({\mathcal {P}})\) with \(\psi (v) = c\) is either marginal or lies on one of the paths in \({\mathcal {P}}_c\). The definition of a color infrequent on \({\mathcal {Q}}\) is analogous.

For an inductive step, it suffices to show that there is always an infrequent color on \({\mathcal {P}}\) or an infrequent color on \({\mathcal {Q}}\). Indeed, assume that c is infrequent on \({\mathcal {P}}\) (the arguments for \({\mathcal {Q}}\) are symmetrical) and let \({\mathcal {P}}_c \subseteq {\mathcal {P}}\) be as in the above definition. Construct a \(k'\)-pseudogrid \(({\mathcal {P}}',{\mathcal {Q}}')\) from \(({\mathcal {P}},{\mathcal {Q}})\) as follows. Start with \(({\mathcal {P}}',{\mathcal {Q}}') = ({\mathcal {P}},{\mathcal {Q}})\). First, delete from \({\mathcal {P}}'\) the first and last 3 paths, and similarly for \({\mathcal {Q}}'\). Second, shorten every path \(P_i \in {\mathcal {P}}'\) by deleting the edges of \(P_{i,j}\) and \(P_{i,j}^Q\) for \(j \le 3\) and \(j \ge k-2\); similarly shorten every path \(Q_j \in {\mathcal {Q}}'\). Finally, delete all (shortened) paths of \({\mathcal {P}}_c\) from \({\mathcal {P}}'\), and delete a matching number of paths from \({\mathcal {Q}}'\). In this manner, we obtain a \(k'\)-pseudogrid \(({\mathcal {P}}',{\mathcal {Q}}')\) such that \(k-k' \le 6 + 50(\mu ({\mathcal {P}}) + \mu ({\mathcal {Q}}))\) and such that the color c no longer appears on \(V({\mathcal {P}}')\). Therefore, \(\mu ({\mathcal {P}}') + 1 \le \mu ({\mathcal {P}})\) and \(\mu ({\mathcal {Q}}') \le \mu ({\mathcal {Q}})\). The inductive step follows.
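The arithmetic behind the last sentence can be spelled out as follows; the shorthand \(s = \mu ({\mathcal {P}}) + \mu ({\mathcal {Q}})\) is introduced here only for this calculation. Applying the inductive hypothesis to \(({\mathcal {P}}',{\mathcal {Q}}')\), for which \(\mu ({\mathcal {P}}') + \mu ({\mathcal {Q}}') \le s-1\), and using \(s \ge 1\), we get

$$\begin{aligned} k \le k' + 6 + 50s \le 100(s-1)^2 + 50s + 6 = 100s^2 - 150s + 106 \le 100 s^2. \end{aligned}$$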

In the remainder of the proof, assume that there is no infrequent color on \({\mathcal {P}}\) nor an infrequent color on \({\mathcal {Q}}\). We shall reach a contradiction by exhibiting a simple noncentered path \(P\subseteq {\mathcal {P}} \cup {\mathcal {Q}}\).

We perform the following selecting and marking scheme. Initially, no vertex is selected and no path is marked. For every color c that appears on \(V({\mathcal {P}})\), perform the following operation twice.

  1. Pick a vertex \(v \in V({\mathcal {P}})\) such that \(\psi (v) = c\), v is not marginal, and v does not lie on a marked path \(P_i\). Let the grid coordinates of v be (i, j).

  2. Select v and mark all paths \(P_{i'}\) for \(|i'-i| \le 10\) and all paths \(Q_{j'}\) for \(|j'-j| \le 10\).

Now swap the roles of \({\mathcal {P}}\) and \({\mathcal {Q}}\) and perform the above operation twice also for every color c that appears on \(V({\mathcal {Q}})\). In total, we select \(2(\mu ({\mathcal {P}}) + \mu ({\mathcal {Q}}))\) vertices. For every selected vertex we mark 21 paths of \({\mathcal {P}}\) and 21 paths of \({\mathcal {Q}}\). Since there is no infrequent color, there is always a vertex to choose at Step 1, as otherwise the so-far marked paths would witness infrequency of c. Thus, the above selecting and marking scheme is well-defined.
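To make the counting behind well-definedness explicit: at any moment at most \(2(\mu ({\mathcal {P}}) + \mu ({\mathcal {Q}}))\) vertices have been selected, and each of them has marked at most 21 paths of \({\mathcal {P}}\), so the number of marked paths of \({\mathcal {P}}\) never exceeds the threshold in the definition of an infrequent color:

$$\begin{aligned} 21 \cdot 2(\mu ({\mathcal {P}}) + \mu ({\mathcal {Q}})) = 42(\mu ({\mathcal {P}}) + \mu ({\mathcal {Q}})) \le 50(\mu ({\mathcal {P}}) + \mu ({\mathcal {Q}})). \end{aligned}$$

The same bound holds for the marked paths of \({\mathcal {Q}}\).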

Let \(v,v'\) be two distinct selected vertices and let (i, j) and \((i',j')\) be their grid coordinates. By the above marking scheme, we have that

$$\begin{aligned} 3< i,i',j,j' < k-2 \quad \mathrm {and} \quad |i-i'| + |j-j'| \ge 11. \end{aligned}$$
(2)

Consider now the following simple path P. We start with P being the concatenation of even-numbered paths \(P_i\) without the prefixes and suffixes \(P_{i,0} \cup P_{i,k}\) in the natural order, connected by paths \(Q_{i,1} \cup Q_{i+1,1}^P \cup Q_{i+1,1}\) for i divisible by 4 and by \(Q_{i,k} \cup Q_{i+1,k}^P \cup Q_{i+1,k}\) for \(i\equiv 2 \pmod 4\) (so that paths \(P_i\) with \(i\equiv 2 \pmod 4\) are traversed forwards and paths \(P_i\) with i divisible by 4 are traversed backwards). Then, for every selected v with grid coordinates (i, j), we pick an even \(i' \in \{i,i+1\}\) and modify locally \(P \cap P_{i'}\) to pass through v. In the modification, we use only parts of paths \(P_{i,j}^Q \cup P_{i,j} \cup P_{i,j+1}^Q\), \(P_{i+1,j}^Q \cup P_{i+1,j} \cup P_{i+1,j+1}^Q\), \(Q_{i,j}^P \cup Q_{i,j} \cup Q_{i+1,j}^P\), and \(Q_{i,j+1}^P \cup Q_{i,j+1} \cup Q_{i+1,j+1}^P\). By Equation (2), two such modifications do not interfere with each other and no such modification interferes with the connections contained in paths \(Q_1\) and \(Q_k\). Consequently, the final path P is a simple path contained in \({\mathcal {P}} \cup {\mathcal {Q}}\) that visits all selected vertices. Such a path does not contain a center, which is the desired contradiction. \(\square \)

We conclude with a formal proof of Theorem 1.

Proof of Theorem 1

Let G be a graph with \(\chi _{\text {lin}}(G) \le k\). By Lemma 5, there exists a constant \(c_1\) such that G contains no \((c_1k^2)\)-pseudogrid. Since a grid minor yields a pseudogrid of the same size, by Theorem 3, the treewidth of G is bounded by \(k^{38} p''(\log k)\) for some polynomial \(p''\).

Since \(\chi _{\text {lin}}(G) \le k\), G does not contain a path on \(2^k\) vertices. Furthermore, Theorem 4, together with the fact that a complete binary tree of height k has treedepth at least k, implies that G does not contain a subdivision of a complete binary tree of height \(\lceil k \log _2(3) \rceil \).

Hence, Theorem 2 implies that the treedepth of G is bounded by \(k^{190} \cdot p(\log k)\) for some polynomial p, as desired. \(\square \)
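The exponent 190 comes from composing the bounds of Lemma 5, Theorem 3, and Theorem 2. As a bookkeeping sketch (the auxiliary symbol K is introduced here only for illustration, and all constants and logarithmic factors are absorbed into the polynomials \(p''\) and p): excluding a \((c_1k^2)\)-pseudogrid bounds the treewidth via Theorem 3, and Theorem 2 applied with parameter K, which also dominates k and \(\lceil k \log _2(3) \rceil \), then bounds the treedepth:

$$\begin{aligned} {{\,\mathrm{tw}\,}}(G)&< (c_1 k^2)^{19} \cdot p'(\log (c_1 k^2)) \le k^{38} \cdot p''(\log k) =: K, \\ {{\,\mathrm{td}\,}}(G)&< C K^5 \log ^2 K \le k^{38 \cdot 5} \cdot p(\log k) = k^{190} \cdot p(\log k). \end{aligned}$$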

6 Treedepth Upper Bounds on Trees

Schäffer proved that there is a linear time algorithm for finding a minimum-sized centered coloring of a tree T [21]. In this section we prove the following theorem by showing a correspondence between the centered coloring from Schäffer’s algorithm and colors on paths in any linear coloring of T.

Theorem 4

Let T be a tree of maximum degree \(\varDelta \ge 3\). Then \(\chi _{\text {cen}}(T) \le (\log _2 \varDelta ) \cdot \chi _{\text {lin}}(T)\).

In particular, for trees of maximum degree 3 we have \(\chi _{\text {cen}}(T) \le \log _2(3) \chi _{\text {lin}}(T)\), matching the lower bound of Lemma 4. We do not have any matching lower bound for larger \(\varDelta \). In fact, we conjecture that none exists, that is, the upper bound of Theorem 4 for \(\varDelta \ge 4\) is not tight.

In the proof of Theorem 4 we look at the run of Schäffer’s algorithm on T and compare the size of the output (minimum-sized) centered coloring with the size of an arbitrary linear coloring of T.

Schäffer’s algorithm finds a particular centered coloring whose colors are ordered in a way that reflects their roles as centers. For this reason, the coloring is called a vertex ranking and the colors are referred to as ranks; it guarantees that in each subgraph, the vertex of maximum rank is also a center. We will use this terminology in this section to clearly distinguish between the ranks in the vertex ranking and colors in the linear coloring. Note that the canonical centered coloring of a treedepth decomposition is a vertex ranking if the colors are ranked decreasing from the root downwards, which implies that every centered coloring can be converted to a vertex ranking of the same size. Of central importance to Schäffer’s algorithm are what we will refer to as rank lists.

Definition 7

For a vertex ranking r of tree T, the rank list of T, denoted L(T), can be defined recursively as \(L(T) = L(T\backslash T_v)\cup \{r(v)\}\) where v is the vertex of maximum rank in T.

Schäffer’s algorithm arbitrarily roots T and builds the ranking from the leaves to the root of T, computing the rank of each vertex from the rank lists of each of its children. For brevity, we denote \(L(v) = L(T_v)\) for every v in T.

Proposition 3

[21] Let r be a vertex ranking of T produced by Schäffer’s algorithm and let \(v\in T\) be a vertex with children \(u_1,\dots , u_\ell \). If x is the largest integer appearing on rank lists of at least two children of v (or 0 if all such rank lists are pairwise disjoint) then r(v) is the smallest integer satisfying \(r(v)> x\) and \(r(v)\notin \bigcup _{i=1}^{\ell } L(u_i)\).
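The rule in Proposition 3 can be illustrated with a minimal Python sketch. The function name `schaeffer_rank` and the representation of rank lists as Python sets are assumptions made here for illustration; the update of the rank list (keep r(v) and every child rank larger than r(v)) is our reading of Definition 7, not code taken from [21].

```python
from collections import Counter

def schaeffer_rank(child_rank_lists):
    """Return (r(v), L(v)) given the rank lists L(u_1), ..., L(u_l) of v's children."""
    # Count on how many children's lists each rank appears.
    counts = Counter(r for lst in child_rank_lists for r in set(lst))
    # x: largest rank on at least two lists, or 0 if the lists are pairwise disjoint.
    x = max((r for r, c in counts.items() if c >= 2), default=0)
    # r(v): smallest integer greater than x that is on no child's list.
    r_v = x + 1
    while r_v in counts:
        r_v += 1
    # Assumed reading of Definition 7: ranks above r(v) stay on the list of v.
    rank_list = {r_v} | {r for r in counts if r > r_v}
    return r_v, rank_list
```

For example, a leaf (no children) receives rank 1, and a vertex whose two children have rank lists {3} and {1} receives rank 2 with rank list {2, 3}.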

We root T at an arbitrary leaf of T and let r be a ranking output by Schäffer’s algorithm applied on (rooted) T. With a vertex v in T we associate the following potential.

$$\begin{aligned} \zeta (v) = \sum _{r \in L(v)} 2^r. \end{aligned}$$

The following is immediate from Proposition 3:

Lemma 6

For every v in T with children \(u_1,u_2,\ldots ,u_\ell \), it holds that

$$\begin{aligned} \zeta (v) \le 2 + \sum _{i=1}^\ell \zeta (u_i). \end{aligned}$$

Furthermore, the equality holds if and only if all rank lists \(L(u_i)\) are pairwise disjoint.

Proof

As in Proposition 3, let x be the largest integer appearing on rank lists of at least two children of v (or 0 if all such rank lists are pairwise disjoint). We have,

$$\begin{aligned} \zeta (v) = 2^{r(v)} + \sum _{i=1}^\ell \sum _{r \in L(u_i), r > r(v)} 2^r. \end{aligned}$$

Note that every rank \(r > r(v)\) that appears in a list \(L(u_i)\) for some \(i \in [\ell ]\) does not appear on any other list \(L(u_{i'})\), \(i' \ne i\). Hence, by Proposition 3, if \(x=0\), then

$$\begin{aligned} \sum _{i=1}^\ell \zeta (u_i) = \sum _{i=1}^\ell \sum _{r \in L(u_i), r> r(v)} 2^r + \sum _{r=1}^{r(v)-1} 2^r = 2^{r(v)} - 2 + \sum _{i=1}^\ell \sum _{r \in L(u_i), r > r(v)} 2^r. \end{aligned}$$

While if \(x > 0\), then

$$\begin{aligned} \sum _{i=1}^\ell \zeta (u_i) \ge \sum _{i=1}^\ell \sum _{r \in L(u_i), r> r(v)} 2^r + 2\cdot 2^x + \sum _{r=x+1}^{r(v)-1} 2^r = 2^{r(v)} + \sum _{i=1}^\ell \sum _{r \in L(u_i), r > r(v)} 2^r. \end{aligned}$$

This concludes the proof. \(\square \)
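Two small worked instances, chosen here purely for illustration, show both cases. If v has two children with \(L(u_1) = \{3\}\) and \(L(u_2) = \{1\}\), then \(x = 0\), \(r(v) = 2\), and \(L(v) = \{2,3\}\), so equality holds:

$$\begin{aligned} \zeta (v) = 2^2 + 2^3 = 12 = 2 + 2^3 + 2^1 = 2 + \zeta (u_1) + \zeta (u_2). \end{aligned}$$

If instead \(L(u_1) = L(u_2) = \{1\}\), then \(x = 1\), \(r(v) = 2\), and \(L(v) = \{2\}\), so the inequality is strict:

$$\begin{aligned} \zeta (v) = 2^2 = 4 < 6 = 2 + 2^1 + 2^1 = 2 + \zeta (u_1) + \zeta (u_2). \end{aligned}$$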

Let \(\psi \) be a linear coloring of T with \(k := \chi _{\text {lin}}(T)\) colors. Our proof of Theorem 4 is based on tracking sets of colors of \(\psi \) on paths terminating at the current vertex as Schäffer’s algorithm moves up the rooted tree. Given a path \(P\subseteq T\) and a linear coloring \(\psi \) of size k, we say a color set \(X\subseteq \{1,\dots , k\}\) is compatible with P if both the following conditions are true:

  1. For every center \(v\in P\), \(\psi (v)\in X\).

  2. For every color \(c\in X\), there is a vertex \(u\in P\) such that \(\psi (u) = c\).

In other words, a compatible set must not contain colors not found on P, must contain each color appearing uniquely in P, and may or may not contain any colors appearing multiple times on P. For each \(v\in T\), let S(v) be a set of sets defined recursively as follows. If v is a leaf, \(S(v) = \{\{\psi (v)\}\}\). Otherwise, let \(u_1,\dots , u_\ell \) be the children of v, \(S' = \bigcup _{i=1}^{\ell } S(u_i)\), and \(\xi : S'\rightarrow 2^{[k]}\) be an injective function such that

$$\begin{aligned} \xi (X) = {\left\{ \begin{array}{ll} X{\setminus }\{\psi (v)\} &{}\quad \text {if } \psi (v)\in X \text { and } X\backslash \{\psi (v)\} \in S' \\ X\cup \{\psi (v)\} &{}\quad \text {otherwise} \end{array}\right. } \end{aligned}$$

for all \(X\in S'\). Then \(S(v) = \{\xi (X): X\in S'\}\cup \{\{\psi (v)\}\}\). We start with the following straightforward observation.

Lemma 7

For every \(v \in T\) it holds that \(\emptyset \notin S(v)\). Consequently, for every nonleaf \(v \in T\), \(\xi \) is a bijection between \(S'\) and \(S(v) {\setminus } \{\{\psi (v)\}\}\).
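The recursive definition of S(v) can be sketched in Python as follows. The names `xi` and `build_S` and the use of `frozenset` for color sets are assumptions of this sketch; it only mirrors the case distinction in the definition of \(\xi \) and is not meant as an efficient implementation.

```python
def xi(X, c, S_prime):
    """Apply the map xi to a color set X, where c = psi(v)."""
    if c in X and (X - {c}) in S_prime:
        return X - {c}          # first case of the definition
    return X | {c}              # "otherwise" case

def build_S(c, children_S):
    """Return S(v) for a vertex v with color c, given S(u_i) for its children.

    For a leaf, children_S is empty and the result is {{c}}.
    """
    S_prime = set().union(*children_S)   # S' = union of the S(u_i)
    return {xi(X, c, S_prime) for X in S_prime} | {frozenset({c})}
```

As a usage example, consider a path on three vertices colored 1, 2, 1 and rooted at one end. Processing it from the leaf upwards gives `build_S(1, [])`, then `build_S(2, [...])`, then `build_S(1, [...])`, and the root ends up with the sets {1}, {2}, and {1, 2}.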

We prove that the construction of S(v) preserves compatibility of sets.

Lemma 8

For all vertices \(v\in T\) and each \(X\in S(v)\), there is a corresponding path \(P\subseteq T_v\) with v as an endpoint such that P is compatible with X.

Proof

It is clear that the lemma holds at the leaves of T, so we proceed by inductively showing the recursive step preserves the property. Observe that the path consisting of v only is compatible with \(\{\psi (v)\} \in S(v)\). For any \(X\in S(v) {\setminus } \{\{\psi (v)\}\}\), there is a child u of v such that \(X' = \xi ^{-1}(X)\) is in S(u). By the inductive hypothesis, there must be a path \(P'\) terminating at u such that \(X'\) is compatible with \(P'\). We claim that \(P = P'\cdot \{v\}\) is compatible with X. Since \(X\triangle X'\subseteq \{\psi (v)\}\) and each color \(c\ne \psi (v)\) appears the same number of times in P and \(P'\), it is only necessary to prove the requirements for compatibility are satisfied with respect to \(\psi (v)\). Moreover, because \(\psi (v)\) appears at least once on P it suffices to show that if \(\psi (v)\notin X\), then \(\psi (v)\) appears multiple times on P. By the definition of \(\xi \), \(\psi (v)\notin X\) implies \(\psi (v) \in X'\) and thus v is not a center of P. \(\square \)

Define \(\rho (v) = \sum _{X\in S(v)} (\varDelta -1)^{|X|}\). We observe the following.

Lemma 9

For any vertex \(v \in T\) with children \(u_1,\dots , u_\ell \), \(\rho (v)\ge (\varDelta -1) + \sum _{i=1}^{\ell } \rho (u_i)\).

Proof

First, note that \(\ell \le \varDelta -1\). Also, the lemma is straightforward for a leaf v as then \(S(v) = \{\{\psi (v)\}\}\) and \(\rho (v) = \varDelta -1\). Assume then \(\ell \ge 1\).

Recall that \(S' = \bigcup _{i=1}^\ell S(u_i)\). Let \(S_1\) be the set of all color sets that appear in exactly one \(S(u_i)\) and \(S_M\) be those that occur in multiple \(S(u_i)\)’s; we have \(S' = S_1 \uplus S_M\). Note that for each \(X\in S_M\), \(\psi (v)\notin X\) or else concatenating the corresponding compatible paths with v creates a path with no center. Likewise, if there are distinct color sets Y and \(Y' = Y\backslash \{\psi (v)\}\) such that \(\{Y,Y'\} \subseteq S_1\cup S_M\), then \(Y,Y'\) both belong to the same \(S(u_i)\); in particular, both \(Y,Y'\) belong to \(S_1\).

In particular, if \(X \in S_M\), then \(\psi (v) \notin X\) and \(X \cup \{\psi (v)\} \notin S'\). Consequently,

$$\begin{aligned} \sum _{X' \in S_M} (\varDelta -1)^{|\xi (X')|} = (\varDelta -1) \sum _{X' \in S_M} (\varDelta -1)^{|X'|}. \end{aligned}$$
(3)

By the definition of \(\xi \), for each color set \(X\in S(v) {\setminus } \{\{\psi (v)\}\}\) either \(|X| \ge |\xi ^{-1}(X)|\) or \(|X| = |\xi ^{-1}(X)| - 1\). In the latter case, there is a corresponding color set \(X' = X\cup \{\psi (v)\}\) such that \(X'\in S(v)\) and \(\xi ^{-1}(X') = X\). Also, as already discussed, \(X,X' \in S_1\). That is, if \(|X| = |\xi ^{-1}(X)|-1\) for some \(X \in S(v) {\setminus } \{\{\psi (v)\}\}\), then \(\psi (v) \notin X\), \(X' \in S(v) {\setminus } \{\{\psi (v)\}\}\) for \(X' := X \cup \{\psi (v)\}\), and \(X,X' \in S_1\) as well. Hence,

$$\begin{aligned} \sum _{X' \in S_1} (\varDelta -1)^{|\xi (X')|} \ge \sum _{X' \in S_1} (\varDelta -1)^{|X'|}. \end{aligned}$$
(4)

From (3) and (4), we infer that

$$\begin{aligned} \rho (v)&= (\varDelta -1)^{|\{\psi (v)\}|} + \sum _{X \in S(v) {\setminus } \{\{\psi (v)\}\}} (\varDelta -1)^{|X|} \\&= (\varDelta -1) + \sum _{X' \in S'} (\varDelta -1)^{|\xi (X')|} \\&= (\varDelta -1) + \sum _{X' \in S_1} (\varDelta -1)^{|\xi (X')|} + \sum _{X' \in S_M} (\varDelta -1)^{|\xi (X')|} \\&\ge (\varDelta -1) + \sum _{X' \in S_1} (\varDelta -1)^{|X'|} + (\varDelta -1)\sum _{X' \in S_M} (\varDelta -1)^{|X'|} \\&\ge (\varDelta -1) + \sum _{i=1}^\ell \sum _{X' \in S(u_i)} (\varDelta -1)^{|X'|} \\&\ge (\varDelta -1) + \sum _{i=1}^\ell \rho (u_i). \end{aligned}$$

\(\square \)

We conclude with the proof of Theorem 4.

Proof of Theorem 4

For every leaf \(v \in T\), we have \(\rho (v) = \varDelta -1 \ge 2 = \zeta (v)\). Lemmas 6 and 9 show inductively that \(\rho (v) \ge \zeta (v)\) for every \(v \in T\). If \(k'\) is the size of the centered coloring output by Schäffer’s algorithm, then for the root \(v_0\) of T we have

$$\begin{aligned} 2^{k'} \le \zeta (v_0) \le \rho (v_0) \le \sum _{X \subseteq [k]} (\varDelta -1)^{|X|} = \varDelta ^k. \end{aligned}$$

Thus \(k' \le (\log _2 \varDelta ) \cdot k\). \(\square \)
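The last equality in the displayed chain is the binomial theorem, obtained by grouping the subsets \(X \subseteq [k]\) by their size j (the index j is introduced here only for this calculation):

$$\begin{aligned} \sum _{X \subseteq [k]} (\varDelta -1)^{|X|} = \sum _{j=0}^{k} \left( {\begin{array}{c}k\\ j\end{array}}\right) (\varDelta -1)^{j} = \bigl (1 + (\varDelta -1)\bigr )^{k} = \varDelta ^k. \end{aligned}$$

Taking \(\log _2\) of \(2^{k'} \le \varDelta ^k\) then gives \(k' \le k \log _2 \varDelta \).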

7 Treedepth Upper Bounds on Interval Graphs

Because linear colorings are equivalent to centered colorings when restricted to paths, we turn our attention to the linear coloring numbers of “pathlike” graphs. We investigate a particular class of “pathlike” graphs in this section and prove a quadratic relationship between their centered and linear coloring numbers.

Definition 8

A graph G is an interval graph if there is an injective mapping f from V(G) to intervals on the real line such that \(uv\in E(G)\) iff f(u) and f(v) overlap.

We refer to the mapping f as the interval representation of G. Since the overlap between intervals f(u) and f(v) is independent of the interval representations of the other vertices, every subgraph of an interval graph is also an interval graph. The interval representation of G implies a natural “left-to-right” layout that gives it the “pathlike” qualities, which are manifested in restrictions on the length of induced cycles (chordal) and paths between vertex triples (AT-free).

Definition 9

A graph is chordal if it has no induced cycles of length \(\ge 4\).

Definition 10

Vertices u, v, w are an asteroidal triple (AT) if there exist uv-, vw-, and wu-paths \(P_{uv}\), \(P_{vw}\), and \(P_{wu}\), respectively, such that \(N[w]\cap P_{uv} = N[u]\cap P_{vw} = N[v]\cap P_{wu} = \emptyset \). A graph with no AT is called AT-free.

Proposition 4

[11] A graph G is an interval graph iff G is chordal and AT-free.

Intuitively, an asteroidal triple (Definition 10) is a set of three vertices such that every pair is connected by a path that avoids the neighbors of the third. Roughly speaking, in the context of linear colorings, Proposition 4 indicates that if w is a center of a “long” uv-path P in G, any vertex \(w'\) such that \(\psi (w) = \psi (w')\) must have a neighbor on P. We devote the rest of this section to proving Theorem 5.

Theorem 5

There exists a polynomial time algorithm that takes as input an interval graph G and a linear coloring of G with size k and outputs a centered coloring of G with size at most \(k^2\).

Theorem 5 implies the following immediate improvement over Theorem 1 for interval graphs.

Corollary 1

For every interval graph G, it holds that \(\chi _{\text {cen}}(G) \le (\chi _{\text {lin}}(G))^2\).

Our algorithm makes extensive use of the following well-known property of maximal cliques in interval graphs.

Proposition 5

[11] If G is an interval graph, its maximal cliques can be linearly ordered in polynomial time such that for each vertex v, the cliques containing v appear consecutively.
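When an interval representation is available, an ordering as in Proposition 5 can be obtained by a left-to-right sweep. The following Python sketch assumes closed intervals given as a dictionary `vertex -> (left, right)`; the function name and input format are hypothetical, and this is a standard sweep rather than the procedure of [11].

```python
def ordered_maximal_cliques(intervals):
    """Return the maximal cliques of the interval graph, ordered left to right.

    intervals: dict mapping each vertex to a closed interval (left, right).
    """
    events = []
    for v, (left, right) in intervals.items():
        events.append((left, 0, v))    # 0: interval opens (sorted before closings)
        events.append((right, 1, v))   # 1: interval closes
    events.sort()

    active, cliques, grew = set(), [], False
    for _, kind, v in events:
        if kind == 0:
            active.add(v)
            grew = True
        else:
            # The active set is a maximal clique exactly when it has grown
            # since the last closing event.
            if grew:
                cliques.append(frozenset(active))
                grew = False
            active.discard(v)
    return cliques
```

Because every interval is active during a contiguous part of the sweep, the cliques containing a fixed vertex appear consecutively in the returned list.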

In particular, we identify a prevailing path in G whose vertices “span” the maximal cliques and a prevailing subgraph that consists of the prevailing path as well as vertices in maximal cliques “between” consecutive vertices on the prevailing path. We will show that any linear coloring is a centered coloring when restricted to the prevailing subgraph and that after removing the prevailing subgraph, the remaining components each use fewer colors.

Let \(C_1, \dots , C_m\) be an ordering of the maximal cliques of G that satisfies Proposition 5. We say vertex v is introduced in \(C_{i}\) if \(v\in C_i\) but \(v\notin C_{i-1}\), and denote this as \(I(v) = i\). Likewise, v is forgotten in \(C_j\) if \(v\in C_{j}\) but \(v\notin C_{j+1}\), and denote this as \(F(v) = j\). The procedure for constructing a prevailing subgraph and prevailing path is described in Algorithm 1. This algorithm selects the vertex v from the current maximal clique that is forgotten “last” and adds v to the prevailing path and \(C_{F(v)}\) to the prevailing subgraph. We prove in Lemma 10 that if P, Q are a prevailing path and subgraph, the vertices in \(Q\backslash P\) can be inserted between vertices of P to form a Hamiltonian path of Q.

[Algorithm 1: construction of a prevailing path and a prevailing subgraph; figure not reproduced]
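Based only on the prose description of Algorithm 1 (repeatedly take, from the current maximal clique, the vertex forgotten last), one possible reading can be sketched in Python. This is an assumption-laden reconstruction, not the authors' pseudocode: the function name, the choice to start at \(C_1\), the tie-breaking by sorted order, and the requirement that G be connected are all choices made here.

```python
def prevailing_path_and_subgraph(cliques):
    """Sketch of Algorithm 1 for a connected interval graph.

    cliques: list of vertex sets C_1, ..., C_m ordered as in Proposition 5.
    Returns (P, Q): the prevailing path as a list and the vertex set of Q.
    """
    m = len(cliques)
    F = {}
    for index, clique in enumerate(cliques, start=1):
        for v in clique:
            F[v] = index              # overwritten until the last clique containing v

    path, subgraph = [], set()
    current = 1                       # assumed starting clique
    while True:
        # Vertex of the current clique that is forgotten last (ties: sorted order).
        v = max(sorted(cliques[current - 1]), key=F.get)
        if path and F[v] == current:
            raise ValueError("no progress past clique %d; is G connected?" % current)
        path.append(v)
        subgraph |= set(cliques[F[v] - 1])   # add C_{F(v)} to Q
        if F[v] == m:
            return path, subgraph
        current = F[v]
```

Consecutive vertices of the returned list share the clique \(C_{F(v_j)}\), so they are adjacent and the list is indeed a path in G.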

Lemma 10

Every prevailing subgraph has a Hamiltonian path.

Proof

Let P, Q be the prevailing path and subgraph constructed in Algorithm 1. We prove the lemma by constructing a Hamiltonian path of Q.

Let \(M_j\) be the set of all \(u\in Q\backslash P\) such that j is the smallest integer with \(u\in C_{F(v_j)}\). In other words, \(M_j\) contains the vertices in \(C_{F(v_j)}\) that do not appear in \(C_{F(v_{j-1})}\). If \({\mathcal {M}} = \bigcup _{1\le j\le p} M_j\) then by construction \(P\cup {\mathcal {M}} = Q\). Moreover, for each \(u\in M_j\), \(M_j\cup \{v_j,v_{j+1}\} \subseteq N[u]\). For each \(M_j\), let \(\mu ^{1}_j,\mu ^2_j,\dots , \mu ^{|M_j|}_j\) be an ordering of \(M_j\) such that \(F(\mu ^i_j) \le F(\mu ^{i+1}_j)\). Then

$$\begin{aligned} v_1,\mu ^1_1,\dots , \mu ^{|M_1|}_1, v_2,\mu ^{1}_2,\dots ,\mu ^{|M_2|}_2, \dots , v_p,\mu ^{1}_p,\dots ,\mu ^{|M_p|}_p \end{aligned}$$

is a Hamiltonian path. \(\square \)

Although the fact that the prevailing subgraph Q has a Hamiltonian path implies Q has a center with respect to \(\psi \), we must ensure that the proper subgraphs of Q also have a center. In Lemma 11, we prove \(\psi |_Q\) is centered by showing every proper connected subgraph of Q also has a Hamiltonian path.

Lemma 11

If Q is a prevailing subgraph of an interval graph G and \(\psi \) a linear coloring of G, \(\psi |_{Q}\) is a centered coloring.

Proof

It suffices to show that every proper, connected induced subgraph of Q has a Hamiltonian path, since the existence of a Hamiltonian path implies the subgraph has a center. Assume \(H\subseteq Q\) has a Hamiltonian path. Let w be a center and \(w_p, w_s\) be its predecessor and successor in the Hamiltonian path. It is clear that the subpath from the start of the Hamiltonian path to \(w_p\) remains a path in \(H\backslash \{w\}\); this is also true for the subpath from \(w_s\) to the end. Therefore if \(H\backslash \{w\}\) is disconnected, there are two components and both have Hamiltonian paths.

Otherwise suppose \(H\backslash \{w\}\) is connected. Note that if \(P=\{v_1,\dots , v_p\}\) is the prevailing path generated by Algorithm 1, \(C_{F(v_j)}\cap C_{F(v_{j+2})} = \emptyset \) or else \(v_{j+2}\) would be forgotten later than \(v_{j+1}\) and would have been chosen to be \(v_{j+1}\) instead. Thus, there is some \(1\le \ell \le q\le p\) such that \(H\backslash \{w\}\subseteq C_{F(v_\ell )}\cup C_{F(v_{\ell +1})}\cup \dots \cup C_{F(v_q)}\) and since \(H\backslash \{w\}\) is connected, for each \(\ell \le j< q\) the intersection of cliques \(C_{F(v_j)}\) and \(C_{F(v_{j+1})}\) is non-empty. Consequently, the ordering of the vertices in the Hamiltonian path of Q must also define a Hamiltonian path of \(H\backslash \{w\}\). \(\square \)

Since any linear coloring \(\psi \) of the prevailing subgraph Q must also be a centered coloring, \({{\,\mathrm{td}\,}}(Q)\le |\psi |\). To get a bound on the treedepth of G, we focus on the relationship between Q and \(G\backslash Q\). In particular, we show that the components of \(G\backslash Q\) use fewer than \(|\psi |\) colors by proving that each such component has an apex in the prevailing path.

Lemma 12

Let P, Q be a prevailing path and subgraph of an interval graph G. For each component X of \(G\backslash Q\), there is a vertex \(a\in P\) such that \(X\subseteq N(a)\).

Proof

For \(1\le j \le p\), let \({\mathcal {X}}_j\) be the set of components of \(G[\bigcup _{i = F(v_{j-1})+1}^{F(v_j)-1} C_i]\backslash Q\), defining \(F(v_0) = 0\). By this definition and the fact that \(v_j\) is a member of both \(C_{F(v_{j-1})}\) and \(C_{F(v_j)}\), \(v_j\) is a neighbor of all vertices in X for each \(X\in {\mathcal {X}}_j\). Thus it suffices to show that \(\bigcup _{j=1}^{p} {\mathcal {X}}_j\) are the components of \(G\backslash Q\).

Since \(V(Q) = \bigcup _{j=1}^{p} C_{F(v_j)}\), \(V(G) = V(Q) \cup V({\mathcal {X}}_1) \cup \dots \cup V({\mathcal {X}}_p)\) and \(V(Q)\cap \bigcup _{j=1}^{p} {\mathcal {X}}_j = \emptyset \). Hence, if \(X\in {\mathcal {X}}_j\) is not a component of \(G\backslash Q\), then there must be some component \(X'\in {\mathcal {X}}_i\) for which \(i\ne j\) and there exists \(u\in X\) and \(u'\in X'\) and \(uu'\in E(G)\). But \(C_{F(v_j)}\cup C_{F(v_{j+1})}\) has no common vertices with X and separates it from any vertices in \({\mathcal {X}}_i\). An analogous statement for \(X'\) is true as well, so no such edges \(uu'\) exist. Therefore we conclude that \(\bigcup _{1\le j\le p} {\mathcal {X}}_j\) are the components of \(G\backslash Q\) and the lemma is proven. \(\square \)

We can now establish a polynomial upper bound on the treedepth of interval graphs, proving Theorem 5.

Proof of Theorem 5

Let \({\mathcal {A}}\) be the algorithm that constructs a treedepth decomposition \({\mathcal {T}}\) of G by finding a prevailing subgraph Q (Algorithm 1), using \(\psi |_{Q}\) to create a treedepth decomposition of Q, and recursively constructing treedepth decompositions of \(G\backslash Q\). If \({\text {depth}}({\mathcal {T}}) \le k^2\) and \({\mathcal {A}}\) runs in polynomial time, then the canonical centered coloring of \({\mathcal {T}}\) is a centered coloring of G of size at most \(k^2\). We prove \({\mathcal {A}}\) satisfies these requirements by induction on \(k = |\psi |\). At \(k = 1\), the graph consists of isolated vertices and \({\mathcal {A}}\) trivially constructs a treedepth decomposition of G of depth 1 in polynomial time.

Assume \({\mathcal {A}}\) has the desired properties for linear colorings of size at most \(k-1\). Because the maximal cliques of an interval graph can be enumerated and ordered in polynomial time (Proposition 5), identifying Q via Algorithm 1 can be done in polynomial time. By Lemma 11, the canonical treedepth decomposition of Q has depth at most k. Since every component X of \(G\backslash Q\) has an apex a in P (Lemma 12), we can assume a is an ancestor in \({\mathcal {T}}\) of each vertex in X (Lemma 2). Because \(\psi \) is proper, \(\psi (a)\) does not appear in \(\psi |_X\) and since induced subgraphs of interval graphs are themselves interval graphs, \({\mathcal {A}}\) finds a treedepth decomposition of X whose depth is at most \((k-1)^2\). Thus \({\mathcal {T}}\) has depth at most \(k+(k-1)^2\le k^2\). The recursion only lasts \(k\le n\) steps, so \({\mathcal {A}}\) runs in polynomial time. \(\square \)

8 Hardness of Recognizing Linear Colorings

Based on the similarity in definition between linear and centered colorings, one might assume that computing them should be roughly equally difficult. Finding a centered coloring of a fixed size is NP-hard [1], but given a coloring of a graph, we can recognize whether it is centered in polynomial time by attempting to create the canonical treedepth decomposition; this procedure will identify a non-centered subgraph if the coloring is not centered. To the contrary, we will prove that Linear Coloring Recognition, the problem of recognizing whether a coloring is linear, is co-NP-complete. In order to prove the hardness of Linear Coloring Recognition, we first define a dual problem. The Non-centered Path problem takes a graph G and coloring \(\psi \) as input and decides whether G has a non-centered path P. We focus on proving the hardness of Non-centered Path because a certificate to that problem is easily definable: a path where every color appears at least twice.
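The polynomial-time check for centered colorings mentioned above can be sketched as follows: in every connected component, find a vertex whose color is unique in that component, delete it, and recurse on the resulting components. The Python below is a minimal illustration under the assumption that the graph is given as an adjacency dictionary `adj` (vertex -> set of neighbors) and the coloring as a dictionary `color`; the function names are hypothetical.

```python
from collections import Counter

def components(vertices, adj):
    """Connected components of the subgraph induced by the vertex set `vertices`."""
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w in vertices and w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

def is_centered(adj, color):
    """Return True iff `color` is a centered coloring of the graph `adj`."""
    todo = components(set(adj), adj)
    while todo:
        comp = todo.pop()
        counts = Counter(color[v] for v in comp)
        # A center is a vertex whose color is unique within the component.
        center = next((v for v in comp if counts[color[v]] == 1), None)
        if center is None:
            return False              # comp is a non-centered connected subgraph
        todo.extend(components(comp - {center}, adj))
    return True
```

Any center may be removed at each step: its color stays unique in every connected subgraph of the component that contains it, and every other connected subgraph lies in one of the remaining components.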

Fig. 2 The graph G and coloring \(\psi \) for \(\varPhi =(x_1\vee x_2 \vee \lnot x_3)\wedge (\lnot x_{1}\vee x_2 \vee x_3)\wedge (\lnot x_2)\)

Theorem 6

Non-centered Path is NP-complete.

Proof

A certificate to Non-centered Path can be verified in linear time by iterating over all vertices in the path and counting color occurrences. Thus, Non-centered Path is in NP.
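A minimal sketch of such a verifier is given below. It assumes the graph is an adjacency dictionary `adj` (vertex -> set of neighbors), the coloring is a dictionary `color`, and the certificate is a list of vertices; these names and formats are illustrative choices, not part of the problem definition.

```python
from collections import Counter

def is_noncentered_path(adj, color, path):
    """Check that `path` is a simple path in the graph on which no color is unique."""
    if not path or len(set(path)) != len(path):
        return False                  # empty, or some vertex is repeated
    if any(b not in adj[a] for a, b in zip(path, path[1:])):
        return False                  # consecutive vertices must be adjacent
    counts = Counter(color[v] for v in path)
    return all(c >= 2 for c in counts.values())
```

With hash-based sets and dictionaries, each step takes time linear in the length of the path.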

We prove NP-hardness by reducing from CNF-SAT. Given a CNF-SAT formula \(\varPhi \) with variables \(x_1,\dots , x_n\) and clauses \(C_1,\dots , C_m\), we construct a graph G and coloring \(\psi \) that will have a non-centered path if and only if \(\varPhi \) is satisfiable. We assume that \(\varPhi \) satisfies the following properties:

  (1) Every variable appears at most once in each clause.

  (2) No clause contains both a variable and its negation.

  (3) Every variable appears as a positive literal and negative literal.

We can assume (1) since the disjunction operation is idempotent. Every clause for which (2) does not hold is satisfied by any truth assignment of the variables and thus can be removed without changing the satisfiability of \(\varPhi \). If variable \(x_i\) appears only positively then assigning \(x_i\) to be false does not cause any clauses to be satisfied. Therefore, it is sufficient to set \(x_i\) to true and only consider the clauses of \(\varPhi \) that do not contain \(x_i\); since the analogous statement is true when \(x_i\) does not appear positively, we can assume (3).
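A minimal sketch of this preprocessing is given below, assuming clauses are encoded as lists of nonzero integers (`i` for \(x_i\), `-i` for \(\lnot x_i\)); the function name `normalize` and this encoding are hypothetical. The loop repeats the single-polarity step because removing clauses can leave another variable with only one polarity, a detail the argument above leaves implicit.

```python
def normalize(clauses):
    """Return an equisatisfiable list of clauses (as sets) satisfying (1)-(3)."""
    cls = [set(c) for c in clauses]                            # (1): drop duplicate literals
    cls = [c for c in cls if not any(-lit in c for lit in c)]  # (2): drop tautologies
    while True:
        lits = set().union(*cls)
        # Literals whose negation never occurs: set them to true.
        pure = {lit for lit in lits if -lit not in lits}
        if not pure:
            return cls
        cls = [c for c in cls if not (c & pure)]               # (3): drop satisfied clauses
```

An empty result means the formula is trivially satisfiable. Variables that disappear during preprocessing would need to be renumbered before the construction is applied.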

We refer to Figure 2 for an illustration of the construction. The variables of \(\varPhi \) are represented by a set of vertices \(U = \{u_0,\dots , u_n\}\). For each \(x_i\), we connect \(u_{i-1}\) and \(u_i\) with two paths \(P^{T}_i\) and \(P^F_i\); we will force the non-centered path to contain vertices from exactly one of \(P^T_i\) and \(P^F_i\), which will correspond to whether \(x_i\) was set to true or false. The path \(P^T_i\) contains one vertex for each \(C_j\) in which \(x_i\) appears positively while \(P^F_i\) contains one vertex for each \(C_j\) in which the negation of \(x_i\) appears. By assumption (2), we can uniquely label the vertex on \(P^T_i\cup P^F_i\) corresponding to clause \(C_j\) as \(w_{i,j}\) and the order of the vertices on \(P^T_i\) and \(P^F_i\) can be chosen arbitrarily. To complete the construction of G, we add path \(P_{0} = u'_1, u'_2, \dots , u'_n, w'_1, w'_2, \dots , w'_m\) such that \(w'_m\) is adjacent to \(u_0\) and all other vertices on \(P_0\) have no additional edges. Finally, we attach a pendant vertex \(u'_0\) to \(u_n\). Since each vertex \(w_{i,j}\) corresponds to a unique literal in \(\varPhi \) and \(|U| + |P_0| = 2n+m+1\), G has size linear in the size of \(\varPhi \).

To encode satisfaction of clauses, we color G with coloring \(\psi : V(G)\rightarrow \{0,\dots , n+m\}\) such that \(\psi (u_i) = \psi (u'_i) = i\) and \(\psi (w'_j) = \psi (w_{i,j}) = n+j\). In this way, we force any non-centered path to contain all colors and color \(j+n\) appears twice if and only if \(C_j\) is satisfied. An example can be found in Figure 2.
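The construction of G and \(\psi \) can be sketched as follows, again assuming clauses are lists of nonzero integers and that \(\varPhi \) already satisfies (1)-(3). The function name `build_instance` and the tuple labels used for vertices are hypothetical; the order of vertices on \(P^T_i\) and \(P^F_i\) is taken to be clause order, which is one of the arbitrary choices the construction allows.

```python
def build_instance(n, clauses):
    """Build adjacency sets and the coloring psi for a formula on x_1..x_n."""
    adj, color = {}, {}

    def add_vertex(v, c):
        adj.setdefault(v, set())
        color[v] = c

    def add_path(vertices):
        for a, b in zip(vertices, vertices[1:]):
            adj[a].add(b)
            adj[b].add(a)

    m = len(clauses)
    for i in range(n + 1):
        add_vertex(("u", i), i)          # u_i
        add_vertex(("u'", i), i)         # u'_i shares color i with u_i
    for j in range(1, m + 1):
        add_vertex(("w'", j), n + j)     # w'_j
    for i in range(1, n + 1):
        for sign in (+1, -1):            # +1 builds P^T_i, -1 builds P^F_i
            inner = []
            for j, clause in enumerate(clauses, start=1):
                if sign * i in clause:
                    add_vertex(("w", i, j), n + j)   # w_{i,j} shares color n+j with w'_j
                    inner.append(("w", i, j))
            add_path([("u", i - 1)] + inner + [("u", i)])
    p0 = [("u'", i) for i in range(1, n + 1)] + [("w'", j) for j in range(1, m + 1)]
    add_path(p0 + [("u", 0)])            # P_0, with w'_m adjacent to u_0
    add_path([("u", n), ("u'", 0)])      # pendant vertex u'_0 attached to u_n
    return adj, color
```

For the formula of Figure 2, `build_instance(3, [[1, 2, -3], [-1, 2, 3], [-2]])` yields 18 vertices, 20 edges, and the colors 0 through 6.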

We now prove that \(\varPhi \) is satisfiable iff G contains a path Q with no center. Given a satisfying assignment of \(\varPhi \), let \(P^*_i\) be \(P^T_i\) if \(x_i\) is set to true and \(P^F_i\) if \(x_i\) is set to false. Then \(Q = P_0\cdot u_0 \cdot P^*_1\cdot u_1 \cdot \dots \cdot P^*_n \cdot u_n \cdot u'_0\) is a non-centered path since it contains all pairs \(u_i, u'_i\) and \(\bigcup _{1\le i\le n} P^*_i\) contains a vertex with the same color as each vertex in \(\bigcup _{1\le j\le m} w'_j\).
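As an illustration of this direction, the sketch below lists the vertices of Q together with their colors for a given truth assignment. The names `witness_path` and `has_no_center`, the tuple labels, and the integer clause encoding are assumptions made for this example; vertices on each \(P^*_i\) are listed in clause order.

```python
from collections import Counter

def witness_path(n, clauses, assignment):
    """Return Q as a list of (vertex, color) pairs; assignment[i] is the value of x_i."""
    m = len(clauses)
    q = [(("u'", i), i) for i in range(1, n + 1)]        # P_0: u'_1..u'_n
    q += [(("w'", j), n + j) for j in range(1, m + 1)]   # P_0: w'_1..w'_m
    q.append((("u", 0), 0))
    for i in range(1, n + 1):
        lit = i if assignment[i] else -i                 # literal made true by x_i
        q += [(("w", i, j), n + j)
              for j, c in enumerate(clauses, start=1) if lit in c]  # P^*_i
        q.append((("u", i), i))
    q.append((("u'", 0), 0))                             # pendant vertex u'_0
    return q

def has_no_center(q):
    """True iff every color on the path appears at least twice."""
    counts = Counter(c for _, c in q)
    return all(k >= 2 for k in counts.values())
```

For the formula of Figure 2, the assignment \(x_1 = x_3 = \) true, \(x_2 = \) false gives a path on 14 vertices in which each of the colors 0 through 6 appears exactly twice, whereas setting all three variables to true leaves color 6 on \(w'_3\) only.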

To prove the reverse direction, suppose G contains a non-centered path Q. Let \(U' = \{u'_0,\dots , u'_n\}\). Since each vertex in \(U'\) shares a color with exactly one other vertex and that vertex is a member of U, Q contains a vertex from U iff Q contains a vertex from \(U'\). By our construction of \(P_0\) and assumptions about \(\varPhi \), no component of \(G\backslash (U\cup U')\) contains two vertices of the same color. Thus, Q must contain vertices from U, \(U'\), and \(G\backslash (U\cup U')\). For any \(0\le i\ne j\le n\), every \(u_ju'_j\) path contains \(u_i\) or \(u'_i\), which implies that \((U\cup U')\subset Q\) and Q is a \(u'_1u'_0\) path. In order for Q to be connected, \(P_0\subseteq Q\) and in order for it to be a path, exactly one of \(P^T_i\) and \(P^F_i\) (denote it \(P^*_i\)) is a subpath of Q for each \(1\le i\le n\). Since the colors in \(w'_1,\dots ,w'_m\) are unique, \(\bigcup _{1\le i\le n} P^*_i\) contains at least one vertex of each color in \([n+1, n+m]\), which corresponds to a selection of truth assignments to the variables of \(\varPhi \) such that every clause is satisfied. \(\square \)

Corollary 2

Linear Coloring Recognition is co-NP-complete.

The co-NP-hardness of recognizing linear colorings is compounded by three stronger hardness implications. First, the coloring \(\psi \) given in Theorem 6 has size \(m+n+1\), which means that unless the exponential time hypothesis [8] fails, there is no \(2^{o(k)}\)-time algorithm to recognize a linear coloring of size k. Second, the graph G constructed in the proof of Theorem 6 is outerplanar with pathwidth two, which implies that neither treewidth-style dynamic programming nor a Baker-style layering approach is likely to solve this problem efficiently. Finally, by subdividing each edge and coloring all subdivision vertices with a (single) new color, we obtain a bipartite graph with degeneracy two, proving hardness for each of those classes. Nonetheless, the fact that \(\chi _{\text {cen}}(G) = O(\log m + \log n)\) while \(|\psi | = m+n+1\) leaves open the possibility that Linear Coloring Recognition becomes easier for colorings of minimum size.

9 Conclusion

We have introduced p-linear and linear colorings as an alternative to p-centered and centered colorings for use in algorithms for classes of bounded expansion. The p-linear colorings are computable in polynomial time and require a constant number of colors in classes of bounded expansion, while inducing graphs of bounded treedepth for all small sets of colors, allowing direct substitution in existing algorithmic pipelines. A major direction for future work is to bring the upper bound on \(\chi _{\text {cen}}(G)\) in terms of \(\chi _{\text {lin}}(G)\) from \({\text {poly}}(\chi _{\text {lin}}(G))\) closer to the lower bound of \(2\chi _{\text {lin}}(G)\). In particular, it appears our current toolkit for analyzing linear colorings must be expanded in order to prove (or disprove) Conjecture 1. We also believe it is worth studying whether recognizing linear colorings can be done in polynomial time if we assume the coloring is of size \(\chi _{\text {lin}}(G)\). Finally, using p-linear colorings in practice will require an efficient method for translating a linear coloring into a treedepth decomposition. Although there exist general-purpose algorithms to find treedepth decompositions efficiently in graphs of bounded linear coloring number (e.g. [20]), a more specialized algorithm that avoids “heavy machinery” is likely necessary to be practically useful.