1 Introduction

The development of efficient algorithms for problems on big data problems is an urgent task. Constant-time algorithms are a powerful tool for this since they run by reading only a constant-sized part of each input. In other words, the running time is invariant regardless of the size of the input. Property testing is the most popular research framework for constant-time algorithms. In property testing, an algorithm determines whether a given instance satisfies some predetermined property or is far from satisfying that property with high probability by reading a constant-sized part of the input. This section presents some results mainly concerning property testing that have recently been obtained in the ABD Project.

2 Basic Terms and Definitions for Property Testing

This section gives some of the basic terms that are needed in order to explain our results. Property testing works on many different types of models, including graphs, functions, strings, grammars, and images. Although the details of the definitions differ slightly between the different models, since the basic ideas are the same for all of models, we present only the definitions for digraphs.

Let \(\mathbb {N} = \{ 0, 1, 2, \ldots \}\) be the set of natural numbers. In this chapter, we sometimes omit floor or ceiling functions. For example, if we write \(s = \sqrt{n}\) in a context where s must be an integer and n is not necessarily a square number, then \(\sqrt{n}\) should be taken to mean \(\lfloor \sqrt{n} \rfloor \) or \(\lceil \sqrt{n} \rceil \). This allows us to disregard integrality issues that make no real difference to any of our proofs.

2.1 Graphs and the Three Models for Property Testing

A directed graph or digraph G is defined as a pair of finite sets (VE), where V is a finite set of vertices and \(E \subseteq V \times V\) is a set of directed edges, or edges for short. The vertex set V and the edge set E of a graph G are sometimes written as \(V_G\) and \(E_G\), respectively. If the direction of each edge is ignored (i.e., \((u,v) = (v,u)\) for any \(u,v \in V\)), then the digraph is called a graph (or an undirected graph if we want to indicate undirectedness explicitly). Every graph can be represented as a digraph by using reflectivity on edges; in other words if \((u,v) \in E\), then \((v,u) \in E\) for every \(u,v \in V\). Thus, graphs can be regarded as special cases of digraphs. This section mainly treats (undirected) graphs. Digraphs are considered in Sect. 2.4. Many of the terms and symbols we define for graphs are also used for digraphs.

The order of a graph G is given by \(|V_G|\) and the size of a graph G is given by \(|E_G|\). A graph (resp., digraph) of order n is also called an n-graph (resp., n-digraph). The number of vertices adjacent to a vertex v in a graph G is denoted by \(\mathrm {deg}_G(v)\). If G is clear from the context, the subscript G may be omitted. In property testing, since an algorithm reads only a part of an instance (input), it gets information about the instances through oracles, which depend on how to the graphs are represented. There are three known models for treating graphs in property testing: the dense-graph model; the bounded-degree (graph) model; and the general-graph model.

In the dense-graph model, the edge oracle is used: If an algorithm queries whether \((u,v) \in E\) or not, the oracle answers correctly: the answer is 1 if \((u,v) \in E\) and 0 otherwise. This model basically treats dense (i.e., \( |E| = \Omega (n^2)\)) graphs. This is because if \(|E| = o(n^2)\), then the edge oracle answers “0” almost every time when n is large, making the queries useless.Footnote 1

In the bounded-degree model, there is a restriction such that the degree of every vertex is bounded by a predetermined integer \(d \ge 1\), that is, \(\mathrm {deg}(v) \le d\) (\(\forall v \in V\)). From this restriction, it follows that the number of edges in a graph is at most dn/2 (or dn for a digraph); in other words, the graph is sparse (note that d is a constant). This model assumes that for every vertex v, the vertices adjacent to v are ordered. This model uses the adjacent-vertex oracle: If an algorithm queries for the ith (\(1 \le i \le d\)) adjacent vertex of v by giving a pair (vi), the oracle answers the name (ID) of the vertex if exists and returns a predetermined special symbol such as \(\perp \) otherwise. A graph where the degree is bounded by d is also called a d-bounded-degree graph.

The general-graph model is a mixed model of the dense-graph model and the bounded-degree model. Although this model does not have any maximum degree-bound, there is a fixed upper bound d on the average degree. In many cases d is a constant and the graphs in this model are sparse. However, if \(d=\Theta (n)\), graphs in the model may be dense. This model allows all oracles that are allowed in the other two models in addition to the degree oracle: If an algorithm queries the degree of a vertex v, it replies with the correct answer \(\mathrm {deg}(v)\).

2.2 Properties, Distances, and Testers

The set of graphs considered in each model—that is, the dense-graph model, the bounded-degree model, or the general-graph model—is denoted by \(\Gamma \). The subset of \(\Gamma \) such that the order of the graph is n is denoted by \(\Gamma _n\). Hence \(\Gamma = \bigcup _{n \in \mathbb {N}} \Gamma _n\).

A property is defined as a (generally infinitely large) subset of graphs closed under isomorphism.Footnote 2 For example “planarity” is defined as the set of all planar graphs. For a property \(\mathcal {P}\), we define \(\mathcal {P}_n\) as \(\mathcal {P} \cap \Gamma _n\). Thus, clearly \(\mathcal {P} = \bigcup _{n \in \mathbb {N}} \mathcal {P}_n\).

Property testing is a relaxation of a decision problem. The object of a property testing is to distinguish with high probability whether a given instance satisfies some predetermined property or the instance is “far” from satisfying the property. This requires a mathematical definition of “far.”

Let G and \(G'\) both be n-graphs; \(G,G' \in \Gamma _n\). The distance between the two graphs is defined as the Hamming distance between them divided by the largest Hamming distance in the model (for normalization). Thus, the distance depends on the models (i.e., how the graphs are represented). We explain this by using the dense-graph model. Let \(\delta _{E_G}: V\times V \rightarrow \{0,1\}\) be the characteristic function on \(E_G\), that is, \(\delta _{E_G}(u,v)=1\) if \((u,v) \in E_G\) and 0 otherwise. The distance between G and \(G'\) is defined as follows: We denote by \(m(G,G')\) the number of edges that need to be deleted from and/or inserted into G in order to make \(G=G'\), i.e.

$$\begin{aligned} m(G,G') := | \{ (u,v) \in V \times V ~|~ \delta _{E_G}(u,v) \ne \delta _{E_{G'}}(u,v) \} | \end{aligned}$$

Using this, we define the distance between G and \(G'\) as followsFootnote 3:

$$\begin{aligned} \mathrm {dist}(G,G') := \frac{m(G,G')}{n^2}. \end{aligned}$$
(2.1)

Note that \(0 \le \mathrm {dist}(G,G') \le 1\) for every G and \(G'\). In the bounded-degree model and the general-graph model, the distance is defined as followsFootnote 4:

$$\begin{aligned} \mathrm {dist}(G,G') := \frac{m(G,G')}{dn}, \end{aligned}$$
(2.2)

where d is the upper bound on the maximum (resp., the average) vertex-degrees for the bounded-degree model (resp., the general-graph model).

By using the distance between graphs, the distance beetween a graph \(G \in \Gamma _n\) and a property \(\mathcal {P}\) is defined as follows:

$$\begin{aligned} \mathrm {dist}(G,\mathcal {P}) := \left\{ \begin{array}{ll} \min _{G' \in \mathcal {P}_n} \mathrm {dist}(G,G') &{} \quad \mathrm {if}\ \mathcal {P}_n \ne \emptyset ,\\ \infty &{} \quad \mathrm {otherwise.} \end{array}\right. \end{aligned}$$

This applies to all the models. For a real value \(0 \le \epsilon \le 1\), we say that G is \(\epsilon \)-far from \(G'\) (resp., \(\mathcal {P}\)) if \(\mathrm {dist}(G,G') > \epsilon \) (resp., \(\mathrm {dist}(G,\mathcal {P}) > \epsilon \)) and \(\epsilon \)-close otherwise.

A testing algorithm for a property \(\mathcal {P}\) is an algorithm that, given query access (by the oracles) to an instance G and given \(0 < \epsilon \le 1\), accepts every \(G \in \mathcal {P}\) with probability at least 2/3, and rejects every G that is \(\epsilon \)-far from \(\mathcal {P}\) with probability at least 2/3. If a testing algorithm accepts every \(G \in \mathcal {P}\) with probability 1, then the algorithm is called a one-sided-error. The number of queries made by an algorithm to the given oracle is called the query complexity of the algorithm. If the query complexity of a testing algorithm is bounded by a constant that is independent of n (but that may depend on \(\epsilon \) and d), then the algorithm is called a tester. A property is testableFootnote 5 if there is a tester for the property.

3 Important Known Results in Property Testing on Graphs

This section gives a very brief overview of important known results in property testing on graphs, particularly on the characterization and general properties of testability. See a recent review [11] or books [4, 8] for details.

3.1 Results for the Dense-Graph Model

Alon et al. [2] found a combinatorial characterization (necessary and sufficient condition) of testable properties for the dense-graph model. We first present the theorem without defining the terms used in it.

Theorem 2.1

For the dense-graph model, a graph property is testable if and only if it is regular-reducible.

This theorem utilize the extremely powerful monumental Szeméredi’s regularity lemma, which we now introduce briefly. For a pair of subsets of vertices \(A,B \subseteq V\) of a graph \(G=(V,E)\), \(\mathrm {den}(A,B) := \frac{|E(A,B)|}{|A||B|}\) is called the density of the pair. A family of subsets \(\mathcal{V} = \{ V_1, \ldots , V_k \}\) (\(V_i \subseteq V\), \(\forall i \in \{ 1, \ldots , k \}\)) is called a partition of V if \(V_i \cap V_j = \emptyset \) for all \(1 \le i < j \le k\) and \(V = V_1 \cup \cdots \cup V_k\). A partition \(\mathcal{V} = \{ V_1, \ldots , V_k \}\) of the vertex set of a graph is called an equipartition if \(|V_i|\) and \(|V_j|\) differ by no more than 1 for all \(1 \le i < j \le k\).

Definition 2.1

(\(\epsilon \)-regular) Let \(0< \epsilon \le 1\) be a real number and \(A,B \subseteq V\). A pair (AB) is called \(\epsilon \)-regular if \(|\mathrm {den}(A,B)-\mathrm {den}(X,Y)| \le \epsilon \) for any two subsets \(X \subseteq A\) and \(Y \subseteq B\) satisfying \(|X| \ge \epsilon |A|\) and \(|Y| \ge \epsilon |B|\). An equipartition \(\mathcal{V} = \{ V_1, \ldots , V_k \}\) of the vertex set of a graph is called \(\epsilon \)-regular if all but at most \(\epsilon k^2\) of the pairs \((V_i,V_j)\) (\(i,j \in \{ 1, \ldots , k \}\)) are \(\epsilon \)-regular.

Definition 2.2

(regularity-instance) A regularity-instance R is given by an error-parameter \(0< \epsilon \le 1\), an integer k, a set of \({k \atopwithdelims ()2}\) real numbers \(0 \le \eta _{i,j} \le 1\) indexed by \(1 \le i < j \le k\), and a set \(\overline{R}\) of pairs (ij) of size at most \(\epsilon k^2\). A graph is said to satisfy the regularity-instance if it has an equipartition \(\mathcal {V}=\{ V_1, \ldots , V_k \}\) such that for all \((i,j) \notin \overline{R}\) the pair \((V_i,V_j)\) is \(\epsilon \)-regular and satisfies \(|E(V_i,V_j)| = \eta _{i,j} |V_i| |V_j|\). The complexity of the regularity instance is \(\max (k,1/\epsilon )\).

Definition 2.3

(regular-reducible) A graph property \(\mathcal{P}\) is regular-reducible if for any \(\delta >0\) there exists \(r = r_\mathcal{P} (\delta )\) such that for any n there is a family \(\mathcal{R}\) of at most r regularity-instances each of complexity at most r, such that the following holds for every \(\epsilon >0\) and every n-graph G:

  1. 1.

    If \(G \in \mathcal{P}\), then for some \(R \in \mathcal{R}\), G is \(\delta \)-close to R.

  2. 2.

    If G is \(\epsilon \)-far from \(\mathcal{P}\), then for any \(R \in \mathcal{R}\), G is \((\epsilon - \delta )\)-far from R.

Theorem 2.2

(Szeméredi’s regularity lemma [2, 17]) For every pair of an integer t and a real number \(\epsilon >0\) there exists an integer \(T=T_{2} (t, \epsilon )\) such that any graph with \(n \ge T\) vertices has an \(\epsilon \)-regular equipartition of order k, where \(t \le k \le T\).

An intuitive explanation of the regularity lemma is that, for any \(\epsilon >0\), every graph \(G =(V,E)\) has an \(\epsilon \)-approximation of a constant-sized edge-weighted graph, where the edge weight approximates the density of the corresponding vertex pair. Intuitively, a property being regular-reducible means that it can be represented by a constant number of equipartitions based on the regularity lemma; in other words, the regularity lemma holds for testing the property. See [11] also for details.

Representative regular-reducible properties are monotone or hereditary properties, which are defined as follows.

Definition 2.4

A graph property \(\mathcal {P}\) is monotone if for every \(G \in \mathcal {P}\) and \(e \in E_G\), \(G-\{ e \} \in \mathcal {P}\). A graph property \(\mathcal {P}\) is hereditary if for every \(G \in \mathcal {P}\) and \(v \in V_G\), \(G-\{ v \} \in \mathcal {P}\).

Planarity, bipartiteness, k-colorability (for any \(k \in \mathbb {N}\)), H-freeness (for any graph H),Footnote 6 and disconnectedness are all monotone. The former four properties are also hereditary, but the last one, disconnectedness, is not.Footnote 7 A well-known non-monotone and hereditary property is perfectness: A graph is said to be perfect if for every induced subgraph, the chromatic number of the subgraph equals the clique number (\(=\) the order of the largest clique) of the subgraph. Every monotone or hereditary property is regular-reducible (see [2] for details).

We can say that Theorem 2.1 solves the problem of characterizing testable properties in the dense-graph model in a sense. However, the constants that appear in the algorithms obtained by Theorem 2.1 are incredibly (maybe more than astronomically) huge! Thus, developing faster (i.e., smaller constant complexity) algorithms remains an issue for each problem.

3.2 Results for the Bounded-Degree Model

Whereas the combinatorial characterization of testable properties as shown in Theorem 2.1 was obtained for the dense-graph model, no perfect results have been obtained for the bounded-degree model despite many attempts to achieve this goal. However, progress is being made in steps. We now have an important characterization of testable properties in the bounded-degree model called “hyperfiniteness.” We also found another characterization called “forbidden configurations,” for one-sided error testability, which is explained in Sect. 2.4.

Definition 2.5

Let \(\epsilon >0\), \(t>0\), and \(d>0\). Let \(G=(V,E)\) be a d-bounded-degree n-graph. If one can remove at most \(\epsilon dn\) edges from G such that each connected component of the resulting graph has at most t vertices, then G is called \((\epsilon ,t)\)-hyperfinite (with respect to degree bound d). For a function \(\rho : \mathbb {R}^+ \rightarrow \mathbb {R}^+\), if G is \((\epsilon , \rho (\epsilon ))\)-hyperfinite for every \(\epsilon >0\), then G is called \(\rho \)-hyperfinite. A set \(\mathcal{G}\) of d-degree-bounded graphs is called \(\rho \)-hyperfinite if \(\forall G \in \mathcal{G}\) is \(\rho \)-hyperfinite. \(\mathcal{G}\) is called hyperfinite if there is a function \(\rho \) such that \(\mathcal{G}\) is \(\rho \)-hyperfinite.

Newman and Sohler [15] presented the following theorem.

Theorem 2.3

In the bounded-degree model, every graph property is testable for any hyperfinite family of graphs.

While this is a sufficient condition, the following necessary condition related to hyperfiniteness was obtained by Fichtenberger et al. [5].

Definition 2.6

A subproperty of a property \(\mathcal {P}\) is a property that is a subset of \(\mathcal {P}\). A property is non-trivially testable if it is testable and there exists \(\epsilon >0\) such that there is an infinite number of graphs that are \(\epsilon \)-far from the property.

Theorem 2.4

Every testable property of bounded-degree graphs is either finite or contains an infinite hyperfinite subproperty. Furthermore, the complement of every non-trivially testable graph property contains an infinite hyperfinite subproperty.

These theorems show that there is a deep relation between hyperfiniteness and testability on bounded-degree graphs. We have found, however, no necessary and sufficient condition of graph testability even for a one-sided error. Recently we found necessary and sufficient conditions for one-sided-error testability on subclasses of properties of digraphsFootnote 8 [12]. This was obtained through the ABD Project, and is explained in Sect. 2.4.

3.3 Results for the General-Graph Model

There were previously no general classes of testable properties for the general-graph model. Through the ABD Project, a class that models complex networks called Hierarchical Scale Free (\(\mathcal{HSF}\)) was founded that is testable. We present an outline of the result below, and the details are available in [10, 11].

Definition 2.7

For positive real numbers \(c >0\) and \(\gamma >1\), a class of scale-free (multi)graphs \(\mathcal{SF}(c,\gamma )\) consists of (multi)graphs \(G=(V,E)\) for which the following condition holds: Let \(\nu _i\) be the number of vertices v of degree i. Then:

$$\begin{aligned} \nu _i \le c n i^{- \gamma }, ~~~ \forall i \in \{ 2, 3, \ldots , \}. \end{aligned}$$
(2.3)

A clique is a subgraph in which there exists an edge between every pair of vertices. For a nonnegative integer \(c \ge 0\), a c-isolated clique is a clique such that the number of outgoing edges (edges between the clique and the other vertices) is less than ck, where k is the number of vertices of the clique. A 1-isolated clique is sometimes simply called an isolated clique (see [9] for details). Let \(\mathcal{E}(G)\) be the graph obtained from G by contracting all isolated cliques.Footnote 9

Definition 2.8

For positive real numbers \(c >0\), \(\gamma > 1\) and a positive integer \(n_0 \ge 1\), a class of hierarchical scale-free (multi)graphs \(\mathcal{HSF} = \mathcal{HSF}(c,\gamma , n_0)\) consists of (multi)graphs \(G=(V,E)\) for which the following conditions hold:

  1. (i)

    \(G \in \mathcal{SF}(c,\gamma )\),

  2. (ii)

      Consider the infinite sequence of graphs \(G_0 =G\), \(G_1 = \mathcal{E}(G_{0})\), \(G_2 = \mathcal{E}(G_{1})\), \(\ldots \). If \(|V_{G_i}| \ge n_0\), then \(G_i\) includes at least one isolated clique \(Q \subseteq V\) with \(|Q| \ge 2\). (Note that if \(G_k\) has no such isolated clique, then \(G_k = G_{k+1} = G_{k+2} =\cdots \).)

For a graph G and a nonnegative integer \(d \ge 0\), G|d is the graph obtained by deleting all edges incident to each vertex v of degree more than d. Note that G|d is a d-bounded-degree graph. The following properties were obtained by [10].

Lemma 2.1

For every \(\mathcal{SF} = \mathcal{SF}(c,\gamma )\) with \(\gamma >2\), and every positive real number \(\epsilon >0\), there exists a constant \(\delta = \delta (\epsilon ,c,\gamma )\) such that for every graph \(G \in \mathcal{SF}\), \(G | \delta \) is \(\epsilon \)-close to G.

This lemma looks useful since it means that for any \(\epsilon >0\), any scale-free graph is \(\epsilon \)-close to a bounded-degree graph. This lemma is applied in the proof of the following theorem, which is the main theorem of [10].

Theorem 2.5

Every property is testable for \(\mathcal{HSF}(c,\gamma ,n_0)\) with \(\gamma >2\).

In the general-graph model, no other universal (constant-time) tester is known, but universal testing algorithms with \(\mathrm {polylog}(n)\)-time query complexity have been found for forests [14] and outerplanar graphs [3].

4 Characterization of Testability on Bounded-Degree Digraphs

4.1 Bounded-Degree Model of Digraphs

As mentioned previously, there is no complete characterization of testable graph properties in bounded-degree graphs even for one-sided-errors. Through the ABD project, however, we have obtained a characterization for one-sided-error testable properties of monotone and hereditary properties of bounded-degree digraphs [12], which we briefly explain in this section. The set of digraphs can be regarded to include the set of undirected graphs by introducing reflexivity, i.e., \(\forall u,v \in V\), if \((u,v) \in E\), then \((v,u) \in E\).

In this section, we consider the bounded-degree model on digraphs. For a digraph \(G=(V,E)\) and a vertex \(v \in V\) we denote by \(N^+_G(v)\) the set of outgoing neighbours of v, i.e., \(N^+_G(v) := \{ u \in V ~|~ (v,u) \in E \}\). Similarly, \(N^-_G(v) := \{ u \in V ~|~ (u,v) \in E \}\) and \(N_G(v) := N^+_G(v) \cup N^-_G(v)\). The out-degree of v is \(\mathrm {deg}^+_G (v) := |N^+_G(v)|\), and the in-degree of v is \(\mathrm {deg}^-_G (v) := |N^-_G(v)|\). The subscript G can be omitted if it is clear.

For a (di)graph \(G=(V,E)\) and \(F \subseteq E\), we denote by \(G - F\) the graph \((V, E-F)\). For a (di)graph \(G=(V,E)\) and \(W \subseteq V\), we denote by G[W] the subgraph of G induced by W (i.e., G[W] contains all edges in \(E_G\) whose both endpoints are in W). \(G[V - W]\) can be denoted by \(G - W\).

In the bounded-degree model for digraphs, there are two submodels: In one, only the out-degree is bounded; in the other, both the in-degree and out-degree are bounded.Footnote 10 The former case is represented by F(d) model and the latter one by FB(d) model.Footnote 11 The F(d) model is clealy wider than the FB(d) model. Moreover, every undirected d-bounded graph can be formulated by the FB(d) model by replacing each undirected edge by a pair of anti-parallel directed edges. That is, the FB(d) model (and thus the F(d) model as well) is regarded as including the undirected d-bounded degree model.

4.2 Monotone Properties and Hereditary Properties

This section extends the monotone and hereditary properties that were defined in Definition 2.4 to digraphs.

We first introduce the following notation for characterizing the testability of these properties. Let \(\mathcal{H}\) be a set of digraphs. We call \(\mathcal{H}\) an r-set if every member \(H \in \mathcal{H}\) has at most r vertices (i.e., H is an \(r'\)-digraph for some \(r' \le r\)). A digraph G is \(\mathcal{H}\)-free if for every \(H \in \mathcal{H}\), G contains no subgraph that is isomorphic to H. A digraph G is induced \(\mathcal{H}\)-free if for every \(H \in \mathcal{H}\), G contains no induced subgraph that is isomorphic to H. We denote by \(\mathcal {P}_\mathcal{H}\) (resp., \(\mathcal {P}^*_\mathcal{H}\)) the property that contains all digraphs that are \(\mathcal{H}\)-free (resp., induced \(\mathcal{H}\)-free). \(\mathcal {P}_{\mathcal{H}, n}\) (resp., \(\mathcal {P}^*_{\mathcal{H}, n}\)) is the subproperty of \(\mathcal {P}_\mathcal{H}\) that consists of all n-digraphs in \(\mathcal {P}_\mathcal{H}\) (resp., \(\mathcal {P}^*_\mathcal{H}\)). We can easily confirm that \(\mathcal {P}_\mathcal{H}\) is monotone and \(\mathcal {P}^*_\mathcal{H}\) is hereditary for any \(\mathcal{H}\).

Let \(H=(V,E)\) be a digraph. For a subset \(W \subseteq V\), if by disregarding the directions of the edges of H, W induces a connected component in the resulting undirected graph, then we say that H[W], which is the directed subgraph of H induced by W, is a component of H. A digraph H is rooted if every component \(H'\) of H has a vertex v such that for every \(u \in V_{H'}\) there exists a dipath (\(=\) directed path) from v to u.

4.3 Characterizations

By using these terms, the characterizations of testable monotone or hereditary properties for the F(d) model were given in [12].

Theorem 2.6

Let \(P = \bigcup _{n \in \mathbb {N}} \mathcal {P}_n\) be a monotone property in the F(d)-model. Then \(\mathcal {P}\) is testable if and only if there is a function \(r: (0,1) \rightarrow \mathbb {N}\) such that for any \(0< \epsilon <1\) and \(n \in \mathbb {N}\), there is an \(r(\epsilon )\)-set of rooted digraphs \(\mathcal{H}_n\) such that the property \(\mathcal {P}_{\mathcal{H}_n,n}\) satisfies the following two conditions:

 (a) \(\mathcal {P}_n \subseteq \mathcal {P}_{\mathcal{H}_n,n}\)

 (b) \(\mathcal {P}_{\mathcal{H}_n,n}\) is \(\epsilon /2\)-close to \(\mathcal {P}_n\).

Theorem 2.7

Let \(\mathcal {P}\) be a hereditary property in the F(d)-model. Then \(\mathcal {P}\) is testable if and only if there are functions \(r: (0,1) \rightarrow \mathbb {N}\) and \(N: (0,1) \rightarrow \mathbb {N}\) such that for any \(0< \epsilon <1\), there is an \(r(\epsilon )\)-set of rooted digraphs \(\mathcal{H}\) such that for every \(n \ge N(\epsilon )\), \(\mathcal {P}^*_{\mathcal{H},n}\) satisfies the following two conditions:

 (a) \(\mathcal {P}_n \subseteq \mathcal {P}^*_{\mathcal{H},n}\)

 (b) \(\mathcal {P}^*_{\mathcal{H},n}\) is \(\epsilon /2\)-close to \(\mathcal {P}_n\).

Condition (b) in both Theorems 2.6 and 2.7 is necessary, since there exists a monotone and hereditary property that is testable with a one-sided-error and has no \(\mathcal{H}_n\) such that \(|\mathcal{H}_n|\) is bounded by a constant (\(r(\epsilon )\)) and “\(\mathcal {P}_n = \mathcal {P}_{\mathcal{H}_n,n}\) or \(\mathcal {P}_n = \mathcal {P}^*_{\mathcal{H}_n,n}\)”: One of these properties is \(\mathcal {P}_{\mathcal {C}_{\sqrt{n}}} (= \mathcal {P}^*_{\mathcal {C}_{\sqrt{n}}})\) on the F(1)-model,Footnote 12 where \(\mathcal {C}_k\) is the set of directed cycles (or dicycles, for short) of length in [3, k], i.e., \(\mathcal {P}_{\mathcal {C}_{\sqrt{n}}}\) is the property of having no dicycle of length in \([3,\sqrt{n}]\). This property is clearly monotone and hereditary. To express \(\mathcal {P}_{\mathcal {C}_{\sqrt{n}}}\) by using a set \({\mathcal H}\) of forbidden subgraphs (or forbidden induced subgraphs), \({\mathcal H}\) must includes \(\mathcal {C}_{\sqrt{n}}\), and thus \(|{\mathcal H}|\) cannot be bounded by any constant. However, this property is testable with a one-sided-error as shown below.

Lemma 2.2

\(\mathcal {P}_{\mathcal {C}_{\sqrt{n}}}\) on the F(1)-model is one-sided-error testable with query complexity \(O(\epsilon ^{-2})\).

To prove this lemma, we will use the following lemma, which is often effective for estimating the query complexity of testers.

Lemma 2.3

For any real number x, the following inequality holds:

$$\begin{aligned} e^x \ge x+1. \end{aligned}$$
(2.4)

The proof of this lemma is trivial from the differentiation of \(e^x-(x+1)\), and is omitted here.

Proof of Lemma 2.2: If \(n \le 2/\epsilon \), then we can get the complete data of the graph in time \(2/\epsilon \). Thus it is enough to consider the case of \(n > 2/\epsilon \). We use the following algorithm for the tester:

Choose \(s=2/\epsilon \) vertices \(v_1\), \(\ldots \), \(v_s\) from V uniformly at random, and denote them by S. For each \(v_i \in S\), check whether there is a dicycle of length at most s that includes \(v_i\) by following each outgoing edge successively whenever it exists. (Note that in the F(1)-model, the outgoing edge of each vertex exists uniquely if it exists.) If a dicycle of length in [3, s] is found, then it is rejected; otherwise, it is accepted.

We show that the above algorithm is the desired one-sided-error tester. It is clearly a one-sided-error, since it never rejects without finding a short (i.e., length of at most s) dicycle. Thus, it is enough to show that the algorithm rejects with probability at least 2/3 if the input is \(\epsilon \)-far from \(\mathcal {P}_{\mathcal {C}_{\sqrt{n}}}\).

Assume that the input \(G=(V,E)\) is \(\epsilon \)-far from \(\mathcal {P}_{\mathcal {C}_{\sqrt{n}}}\), i.e., that G contains more than \(\epsilon n\) dicycles of length in \([3,\sqrt{n}]\). Let \(\mathcal {C}\) be the set of such dicycles. We divide \(\mathcal {C}\) into the following two sets:

\(\mathcal {C}_{\mathrm {short}} = \{ C \in \mathcal {C} ~|~ \text{ the } \text{ length } \text{ of } C \text{ is at most } s\text{. } \}\)

\(\mathcal {C}_{\mathrm {long}} = \{ C \in \mathcal {C} ~|~ \text{ the } \text{ length } \text{ of } C \text{ is more than } s (\text{ and at most }\sqrt{n}{} \text{). } \}\)

From \(|\mathcal {C}| > \epsilon n\), \(|\mathcal {C}_{\mathrm {short}}| > \epsilon n/2\) or \(|\mathcal {C}_{\mathrm {long}}| > \epsilon n/2\) holds.

First, we assume that \(|\mathcal {C}_{\mathrm {long}}| > \epsilon n/2\). Clearly no pair of dicycles in \(\mathcal {C}\) shares a common vertex, and thus more than \(\epsilon sn/2 = n\) vertices are included in the graph contradiction. Thus, \(|\mathcal {C}_{\mathrm {long}}| \le \epsilon n/2\).

From this, it follows that \(|\mathcal {C}_{\mathrm {short}}| > \epsilon n/2\). Since no pair of dicycles in \(\mathcal {C}\) shares a common vertex and each dicycle has at least three vertices, then the dicycles in \(\mathcal {C}_{\mathrm {short}}\) contain more than \(3 \epsilon n/2\) vertices. Let W be the set of such vertices. If the algorithm finds at least one vertex from W, then it will find a short dicycle that includes the vertex and rejects the input. From \(|W| > 3 \epsilon n/2\), it follows that the probability that a chosen vertex is not in W is less than \(1- 3 \epsilon /2\). Thus, the probability that all of s vertices chosen by the algorithm are not in W is less than

$$\begin{aligned} (1- 3 \epsilon /2)^s \le e^{-3 \epsilon s/2} = e^{-3} < \frac{1}{3}. \nonumber \end{aligned}$$

Note that the first inequality above uses the inequality (2.4). The probability that the algorithm finds at least one vertex from W is, therefore, more than 2/3. The query complexity of this tester is clearly \(O(\epsilon ^{-2})\). \(\Box \)

Since \(\mathcal {P}_{\mathcal {C}_{\sqrt{n}}}\) is both monotone and hereditary, Theorems 2.6 and 2.7 hold. If we apply Theorem 2.6, then \(\mathcal{H}_n = \mathcal {C}_{\min \{2/\epsilon , \sqrt{n}\}}\) for each n. If we apply Theorem 2.7, then \(N(\epsilon ) = 4/\epsilon ^2\) and \(\mathcal{H} = \mathcal {C}_{2/\epsilon }\). From this discussion, we observe that \(N(\epsilon )\) is essential in Theorem 2.7.

4.4 An Idea to Extend the Characterizations Beyond Monotone and Hereditary

We would like to extend Theorems 2.6 and 2.7 to general properties. We denote by \(\mathcal {P}_{\mathrm {deg}^+(d-1)}\) the property consisting of digraphs having no vertex with out-degree \(d-1\) on the F(d)-model. \(\mathcal {P}_{\mathrm {deg}(d-1)}\) is one-sided-error testable as shown below.

Let \(G=(V,E)\) be an input. The algorithm for \(\mathcal {P}_{\mathrm {deg}^+(d-1)}\) chooses \(2/\epsilon \) vertices from V uniformly at random and checks their out-degrees. If it finds a vertex of degree \(d-1\), then it is rejected; otherwise, it is accepted. This algorithm is a one-sided-error, since it never rejects if there is no vertex of out-degree \(d-1\). If G is \(\epsilon \)-far from \(\mathcal {P}_{\mathrm {deg}^+(d-1)}\), then there are more than \(\epsilon n\) vertices of out-degrees \(d-1\). Thus, the probability that there is no vertex of out-degree \(d-1\) in the selected \(2/\epsilon \) vertices by the algorithm is less than

$$\begin{aligned} (1-\epsilon )^{\frac{2}{\epsilon }} \le \left( e^{-\epsilon } \right) ^{\frac{2}{\epsilon }} =e^{-2} < \frac{1}{3}.\nonumber \end{aligned}$$

Note that this also uses the inequality (2.4).

Hence, the above algorithm is a one-sided-error tester for \(\mathcal {P}_{\mathrm {deg}^+(d-1)}\). However, expressing this property by using forbidden subgraphs or forbidden induced subgraphs like Theorems 2.6 or 2.7 is impossible.Footnote 13

To extend the idea of “forbidden something” to non-monotone and non-hereditary properties, we [12] introduced the idea of “configurations,” by generalizing subgraphs and induced subgraphs. A similar idea has also appeared in [16].

Definition 2.9

A configuration is a pair \(O=(H,L)\), where \(H=(W,F)\) is a digraph in the F(d)-model, \(L: W \rightarrow \{ \mathrm {developed}, \mathrm {frontier} \}\) is a function, and the out-degree of every frontier vertex is 0. The configuration is rooted if H is rooted.

Definition 2.10

Let \(O=(H=(W,F),L)\) and \(G =(V,E)\) be a configuration and a graph respectively in the F(d)-model. We say that G has an O-appearance if there is an injective mapping \(\phi : W \rightarrow V\) satisfying the condition that \(\forall v \in W\) with \(L(v) =\mathrm {developed}\), the following two conditions hold:

  1. (i)

    \(\forall u \in W\), \((v,u) \in F\) if and only if \((\phi (v), \phi (u) ) \in E\).

  2. (ii)

     If \(( \phi (v),x) \in E\), then \(\exists u \in W\), \(\phi (u)=x\).

We say that G is O-free if G has no O-appearance. For a set \(\mathcal {O}\) of configurations, we say that G is \(\mathcal {O}\)-free if \(\forall O \in \mathcal {O}\), G is O-free. \(\Box \)

As we have already stated, \(\mathcal {P}_{\mathrm {deg}^+(d-1)}\) cannot be defined by any set of forbidden subgraphs or induced subgraphs. However, it can be defined by using \(\mathcal {O}\)-freeness. That is, let \(O_{\mathrm {deg}^+(d-1)} = (H=(W,F),L)\) be a configuration such that \(W = \{ v_0, v_1, \ldots , v_{d-1} \}\), \(E = \{ (v_0, v_1), (v_0, v_2), \ldots , (v_0, v_{d-1}) \}\), \(L(v_0) =\mathrm {developed}\), and \(L(v_1) =L(v_2) = \cdots = L(v_{d-1}) =\mathrm {frontier}\). Then \(\mathcal {P}_{\mathrm {deg}(d-1)}\) is defined by the set of \(O_{\mathrm {deg}(d-1)}\)-free graphs.

The idea of configuration-free (or forbidden configurations) may work for characterizing general one-sided-error testable properties on the F(d)-model. See [12] for details.

5 Testable EXPTIME-Complete Games

This section presents results on the testability of combinatorial games, particularly the generalized chess, Shogi (Japanese chess), and Xiangqi (Chinese chess). Given any position on a \(\sqrt{n} \times \sqrt{n}\) board with O(n) pieces, the generalized chess, Shogi, and Xiangqi problems are the problems of determining the property that “the player who moves first has a winning strategy.” These problems are known or believed to be EXPTIME-complete [1, 6, 7]. In [13], we proposed that this property is testable for chess, Shogi, and Xiangqi. The Shogi tester and Xiangqi tester are one-sided-error testers, and surprisingly, the chess tester is a no-error tester. Many problems have been revealed to be testable, but most of such problems belong to class NP. We think that this is the first result on the constant-time testability of EXPTIME-complete problems. This section presents these results. We mainly focus on chess, followed by Shogi, but omit the explanation for Xiangqi since the method is similar to the one for Shogi. See [13] for details.

5.1 Definitions

We begin by focusing mainly on generalized chess. Generalized chess is played on a \(\sqrt{n} \times \sqrt{n}\) board with O(n) pieces, including two kings. White moves first and black plays after white. A position is defined by fixing each piece to a particular cell on the board. At any given position S, the problem is to determine whether white wins if both players play optimally. The basic rules are the same as those in the original chess and are omitted here.

In chess, there are six different types of pieces: king (K), queen (Q), bishop (B), knight (N), rook (R), and pawn (P). There are only two pieces of kings; one white and one black. For each of the other piece-types (i.e., bishop, knight, rook, and pawn), there exist at most cn pieces for both white and black, respectively, where c is a constant. Piece-numbers from 1 to cn are given to each white or black piece of each piece-type; in other words, each piece has its own piece ID \((k, o, \ell )\) comprising a piece-type \(k \in \{ \mathrm {K}, \mathrm {Q}, \mathrm {B}, \mathrm {N}, \mathrm {R}, \mathrm {P} \}\), an owner-color \(o \in \{ \mathrm {white}, \mathrm {black} \}\), and a piece-number \(\ell \in \{ 1, \ldots , cn \}\).

An algorithm can find the given position through the following oracles.

  • Piece oracle: Given a piece ID \((k, o, \ell )\), the piece oracle answers an ordered pair (ij) that provides the cell (ij), \(i,j \in \{ 0, 1, \ldots , \sqrt{n} \}\) where it lies. (i and j represent the column number and row number, respectively, and if \(i=j=0\), it denotes that the piece is not in the game (such a piece is called an unused piece). This oracle is expressed as \(q_1(k, o, \ell ) = (i,j)\).

  • Coordinate oracle: Given a coordinate (ij), \(i,j \in \{ 0, 1, \ldots , \sqrt{n} \}\), the coordinate oracle answers the piece ID \((k, o, \ell )\) of the piece that lies on the cell if one exists. If no piece lies on the cell, the oracle answers \(k = o= \ell =0\). This oracle is expressed as \(q_2 (i,j) = (k, o, \ell )\).

When we explicitly identify position S, we express the oracles as \(q_1 (k, o, \ell ; S)\) and \(q_2 (i,j; S)\), respectively. We introduce the assumption that all pieces can be arranged on the board simultaneously, and it thus follows that \(2 \times (5cn + 1) \le n\). For simplicity, we assume that

$$\begin{aligned} c \le 1/11. \end{aligned}$$
(2.5)

A position S is called a winner if white has a winning strategy (i.e., white will win if the players start from S and play optimally) and a loser otherwise. Note that a loser not only includes cases where white loses but also where the game ends in a draw.

A position is fixed by querying the piece oracle for every piece. The number of different queries for the piece oracle is at most n, and thus a position is fixed by the maximum of n data. From this, we define the distance between positions S and \(S'\) as

$$\begin{aligned} \mathrm {dist}(S,S') := \frac{| \{ (i,j) ~|~ q_2 (i,j;S) \ne q_2 (i,j;S') \} |}{n}. \end{aligned}$$
(2.6)

Clearly \(0 \le \mathrm {dist}(S,S') \le 1\).

Positions S and \(S'\) are called isomorphic if we can make S identical to \(S'\) by only changing their piece-numbers (neither changing the piece-type nor owner-color is allowed). A set of positions that is closed under isomorphism is called a property. The distance between a position S and a property \(\mathcal {P}\) is defined as follows:

$$\begin{aligned} \mathrm {dist}(S,\mathcal {P}) :=\min _{S' \in \mathcal {P}} \mathrm {dist}(S,S'). \end{aligned}$$
(2.7)

For a positive \(\epsilon > 0\), S is \(\epsilon \)-far from \(\mathcal {P}\) if \(\mathrm {dist}(S,\mathcal {P}) > \epsilon \); otherwise, it is \(\epsilon \)-close. Let \(\mathcal {W}\) be the set of winners. \(\mathcal {W}\) is clearly closed under isomorphism and thus \(\mathcal {W}\) is a property.

For generalized Shogi and Xiangqi, similar definitions are used. They can be easily deduced and are omitted here. See [13] for details.

5.2 Testers for Generalized Chess, Shogi, and Xiangqi

The following theorem was presented in [13]. Note that a no-error tester is a one-sided-error tester that always rejects every input that is \(\epsilon \)-far from the property; that is, it always accepts or rejects with no-error if the input is in the property or \(\epsilon \)-far from the property.

Theorem 2.8

There exists a no-error tester with query complexity \(O(\epsilon ^{-1})\) for the generalized chess problem, there exists a one-sided-error tester with query complexity \(O(\epsilon ^{-2})\) for the generalized Shogi problem, and there exists a one-sided-error tester with query complexity \(O(\epsilon ^{-1})\) for the generalized Xiangqi problem.

Proof of the chess part of Theorem 2.8 Let S be a given position. Let \(S'\) be the position made from S by changing the pieces in cells (ij), \(i \in \{1,2,3,4 \}\) and \(j \in \{ 1,2,3,4,5 \}\), as shown in Fig. 2.1.

Fig. 2.1
figure 1

The black king will be checkmated by white’s next move, as indicated by the arrow

The pieces that were in these cells in S are changed to be unused pieces, and the pieces that appear in these cells in \(S'\) are moved from other cells or unused pieces. In \(S'\), the white king is safe and the black king will be checkmated by white’s next move (moving the queen from (3, 2) to (2, 2)), meaning that \(S'\) is a winner. The distance between S and \(S'\) is at most \(20 + 8 = 28\). Thus, if \(n \ge 28/\epsilon \), then \(\mathrm {dist}(S, S') \le 28/n \le \epsilon \). Hence, S is \(\epsilon \)-close to \(\mathcal {W}\), and it is sufficient to accept it. If \(n < 28/\epsilon \), it is sufficient to read all of the information by calling the piece oracle for all pieces, which requires \(O(\epsilon ^{-1} )\) queries.

This algorithm always accepts a winner. Moreover, if a given position S is \(\epsilon \)-far from \(\mathcal {W}\), then \(n < 28/\epsilon \) and the algorithm knows the complete information for S. Therefore, this algorithm is no-error. \(\Box \)

The algorithms for the generalized Shogi and Xiangqi problems are a little more complicated. The reason is that in Shogi and Xiangqi there are fouls based on positions. A player who plays the fouls loses. In Shogi, the following fouls need to be considered in the generalized Shogi problem.

  • Nifu (double pawn): two or more unpromotedFootnote 14 pawns that belong to the same player must not be in the same column simultaneously.

  • Dead end: pawns, lances, and knightsFootnote 15 can never be moved or dropped onto cells from which a subsequent move cannot be made. Therefore, white (resp., black) unpromoted pawns and lances can never be in the first (resp., \(\sqrt{n}\)th) row, and white (resp., black) knights can never be in the first or second (resp., \(\sqrt{n}\)th or \((\sqrt{n}-1)\)th) rows.

In a given position S, if there is a white piece that plays a fault, then white cannot win,Footnote 16 and thus the position is not a winner. However, if the number of pieces related to fouls is small (e.g., smaller than \(\epsilon n/2\)), we can remove the fouls and make white win, i.e., the position is \(\epsilon \)-close to \(\mathcal {W}\). To detect this, we need to perform preprocessing and the tester may have error when the input is \(\epsilon \)-far from \(\mathcal {W}\). For Xiangqi, a similar discussion applies. See [13] for details.

6 Summary

In this chapter, we introduced basic terminology and important results for property testing, which is the most examined framework for constant- or sublinear-time algorithms. In particular we presented two of our resent esults: The first is the complete characterization of one-sided-error testable monotone or hereditary properties on bounded-out-degree digraphs, and the other one is the testers for the generalized chess, Shogi, and Xiangqi problems, which are all EXPTIME-complete.

The 21st century can be called the era of big data, and the larger big data becomes, the more we need sublinear- and constant-time algorithms. The importance of this area will continue to grow. The number of fields in which constant-algorithms are efficiently applied will increase, and new techniques will be found accordingly. We eagerly await these developments.