1 Introduction

A large amount of data must be processed when utilizing mobility technologies. For example, car sharing generates a large amount of data, including information on rented vehicles, such as who is riding in which vehicle, when and where the vehicle travels, and the history of payments made and received. A database that stores such data must be fault-tolerant. A centralized database that manages all data in a single location causes widespread disruption when it fails. Therefore, data should be managed in a distributed manner as much as possible; one way to do this is by using a distributed database.

Although many distributed database technologies exist, blockchain technology has been attracting attention in recent years [9]. Blockchain is primarily used to realize virtual currencies without a centralized administrator. However, the simple line-type blockchain used in Bitcoin and other applications is limited in the amount of data that it can store; it cannot handle the large amount of data that is typical for mobility applications.

A directed acyclic graph (DAG)-type blockchain has been proposed to increase the amount of data stored in the blockchain and improve scalability [8] (see Fig. 7.1). In a line-type blockchain, the transactions in blocks that lie outside the longest chain are discarded, which is inefficient. In a DAG-type blockchain, fewer transactions are discarded, so it is expected to be more efficient.

Fig. 7.1 Example of a DAG-type blockchain

In blockchain technology, dealing with malicious participants is a challenge. As a blockchain is realized using P2P network technology, all participants in the network play the same role. Any participant can create and add blocks to the blockchain that are favorable to them. In a line-type blockchain, a mechanism called “proof-of-work” [9] makes adding malicious blocks difficult; however, in a DAG-type blockchain, malicious blocks must be detected differently. One method is to identify trusted blocks by solving some type of independent set problem on a DAG. However, the independent set problem is NP-hard and is difficult to solve quickly.

In this chapter, we consider a DAG-type blockchain as a graph structure and analyze it based on graph algorithm theory. Many participants in a P2P network are expected to connect their blocks to those near the tail of the DAG-type blockchain. Therefore, a DAG-type blockchain is expected to have a structure that is similar to that of a directed path. For undirected graphs, a parameter called the pathwidth [11] has been studied, which indicates how close a graph is to a path. Various algorithms can deal efficiently with undirected graphs with small pathwidths. Several similar metrics exist for directed graphs [3, 6, 10], but they are not suitable for DAGs. This chapter presents the concept of the DAG-pathwidth proposed in [7], in which we designed an algorithm to efficiently solve some independent set problems on DAGs by performing DAG-path decomposition. Note that finding a DAG-path decomposition with a minimum DAG-pathwidth is NP-hard [7].

1.1 Related Work

The concept of the pathwidth was first proposed by Robertson and Seymour [11]. The treewidth, which represents the closeness of a graph to a tree, is often used to measure the width of undirected graphs (proposed by Robertson and Seymour [12] and Halin [5]). Many algorithms using treewidth have been proposed [2]. Computing either the pathwidth or the treewidth is NP-complete [1].

Researchers have proposed various width measures for directed graphs, such as the directed pathwidth [10], directed treewidth [6], and DAG-width [3]. The reason for introducing a new graph parameter is that these existing widths are unsuitable for DAGs. This is the main difference between existing widths and our proposed width. Some studies have solved problems for a directed graph with a small width, for example, deciding the reachability of DAGs [4]. Sompolinsky and Zohar [13] proposed a DAG-type blockchain protocol called PHANTOM and formulated the problem that arises in it, which is called the discord k-independent set problem.

2 Preliminaries

In this section, we introduce some definitions and notation. A directed graph is a pair \(D = (N, A)\), where N is a set of elements, called nodes, and \(A \subseteq N \times N\) is a set of ordered pairs of nodes, called arcs. For an arc \((u, v) \in A\), u is called the tail and v is called the head. For two nodes \(v_1, v_{a} \in N\), if there are nodes \(v_2,\ldots ,v_{a-1}\) such that \((v_i, v_{i+1}) \in A\) for \(i = 1,\ldots , a - 1\), then \((\{v_1,\ldots ,v_{a}\}, \{(v_i, v_{i+1}) \mid i=1,\ldots ,a-1 \})\) is a directed path, and we say that \(v_{a}\) is reachable from \(v_1\). We also write \(v_1 \rightsquigarrow _D v_{a}\). If \((v_{a}, v_1) \in A\) additionally holds, \((\{v_1,\ldots ,v_{a}\}, \{(v_i, v_{i+1}) \mid i=1,\ldots ,a-1 \}\cup \{(v_a, v_1)\})\) is a directed cycle. A directed acyclic graph (DAG) is a directed graph that has no directed cycle. For a node subset \(N' \subseteq N\), if every two nodes u, v in \(N'\) are reachable from each other (i.e., \(u \rightsquigarrow _D v\) and \(v \rightsquigarrow _D u\)) and \(N'\) is maximal with respect to inclusion among such subsets, \(N'\) is called a strongly connected component. Every strongly connected component of a DAG consists of exactly one node (otherwise, the DAG has a directed cycle).

For a node subset \(N' \subseteq N\), the induced subgraph of \(N'\) is a graph whose node set is \(N'\) and whose arc set consists of the arcs such that both the head and tail are in \(N'\). We denote by \(D[N']\) the induced subgraph of \(N'\). That is, \(D[N'] = (N', \{(u, v) \in A \mid u \in N', v \in N' \})\).
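The definitions above can be made concrete with a small sketch. It is only an illustration: the function names `reachable`, `is_dag`, and `induced_subgraph` are ours, nodes are plain Python values, and arcs are (tail, head) tuples. Following the trivial path of length zero, a node is treated as reachable from itself.

```python
from collections import deque

def reachable(arcs, source):
    """Return the set of nodes reachable from `source` (source included)."""
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in succ.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_dag(arcs):
    """A directed cycle exists iff some arc (u, v) has u reachable from v."""
    return all(u not in reachable(arcs, v) for u, v in arcs)

def induced_subgraph(sub_nodes, arcs):
    """D[N']: keep exactly the arcs whose tail and head both lie in N'."""
    keep = set(sub_nodes)
    return keep, {(u, v) for u, v in arcs if u in keep and v in keep}

# Small example graph (hypothetical data).
A = {(1, 2), (2, 3), (1, 3), (3, 4)}
```

Adding the arc (4, 1) to this example closes a directed cycle, so `is_dag` then returns `False`.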

3 Blockchain

This section provides the minimum necessary modeling of the blockchain. This is a greatly simplified version of the blockchain used in Bitcoin and other virtual currencies. First, we describe a blockchain whose shape looks like a chain, used in Bitcoin and other currencies, followed by a blockchain whose shape looks like a DAG. The former is called a chain-type blockchain (called line-type in the introduction) and the latter a DAG-type blockchain.

3.1 Chain-Type Blockchain

A blockchain operates as a distributed database that holds data generated by multiple participants in a consistent manner. There are multiple participants on the network that generate data such as money transfer transactions. The generated data is called a piece. A piece can be considered as a string of arbitrary length on \(\{0,1\}\), but we will not be concerned with the contents of the piece here. There are several miners on the network, apart from the participants. A participant who generates the data may also play the role of a miner. Participants broadcast the pieces they wish to record in the blockchain on the network, i.e., they send data to the miners on the network. When a miner receives a piece from a participant, the miner temporarily stores it in the miner’s own memory, which is called a pool.

A miner attempts to create a block. A block is defined by a triple (h, P, c). The first element h is a natural number as described below. The second element P is a set of pieces. The miner takes pieces from the pool when creating the block and stores them in the block as P. The third element c, called the nonce, is a natural number at least 0 and less than \(U_{\textrm{N}}\), whose value must be appropriately determined by the block creator in the manner described below, where \(U_{\textrm{N}}\) is a fixed value determined by the protocol, e.g., \(U_{\textrm{N}}= 2^{32}\) in the Bitcoin blockchain.

Before we describe in detail how to create a block, we introduce the notion that, for two blocks \(B_1 = (h_1, P_1, c_1)\) and \(B_2 = (h_2, P_2, c_2)\), \(B_2\) refers to \(B_1\). Consider the (cryptographic) hash function \(h^{*}(s): {\{0, 1\}}^{*} \longrightarrow \{0, 1,\ldots , U_{\textrm{H}}- 1\}\) whose domain is the set of strings on \(\{0, 1\}\) and whose range is the set of natural numbers at least 0 and less than \(U_{\textrm{H}}\), where \({\{0, 1\}}^{*}\) is the set of all strings on \({\{0, 1\}}\) and \(U_{\textrm{H}}\) is a fixed value determined by the protocol, e.g., \(U_{\textrm{H}}= 2^{256}\) in the Bitcoin blockchain. The hash function \(h^{*}(s)\) is defined by the protocol. For example, in the Bitcoin blockchain, a function based on SHA-256 is used. Consider obtaining the hash value of the block’s data \(h_1, P_1, c_1\). \(P_1\) is a set of strings (pieces), whose elements are concatenated using delimiters. The values \(h_1\) and \(c_1\) are natural numbers and are encoded as strings. Finally, \(h_1\), \(P_1\), and \(c_1\) are concatenated using delimiters. We omit the specific encoding method. Giving the resulting string as input to the hash function \(h^{*}\) yields a natural number at least 0 and less than \(U_{\textrm{H}}\) as output. The output value is written as \(h^{*}(B_1)\) or \(h^{*}((h_1, P_1, c_1))\), with a slight abuse of notation. We say that \(B_2\) refers to \(B_1\) if \(h_2 = h^{*}((h_1, P_1, c_1))\) and \(h_2 < T\) hold, where T is defined later. For blocks \(B_i = (h_i, P_i, c_i)\), \(i = 1,2,\ldots \), a chain in a chain-type blockchain is a sequence of blocks \(B_1,B_2,\ldots \) such that \(B_{i+1}\) refers to \(B_i\). We assume that \(h_1 = 0\) and \(c_1 = 0\). We call \(B_1\) the genesis block.

A miner creates a block and broadcasts it to the other miners. A conscientious miner will make every effort to ensure that the block reaches as many miners as possible. However, the arrival of the block may be delayed due to network congestion or other reasons. The set of blocks recognized by a miner m (i.e., generated by itself or received from other miners) is denoted by \(\mathcal {C}_m\). Blocks are data, not objects, and data are copied, not moved. That is, when a miner m sends a block B to a miner \(m'\), B is contained in both \(\mathcal {C}_m\) and \(\mathcal {C}_{m'}\).

Block creation is performed as follows. A miner m obtains the longest chain from \(\mathcal {C}_m\). That is, we consider the block sequence \(B_1,\ldots ,B_{\ell } \in \mathcal {C}_m\) with the maximum length \(\ell \) such that \(B_{i + 1}\) refers to \(B_{i}\) for \(i = 1,2,\ldots ,\ell - 1\) and \(B_1\) is the genesis block. The miner m attempts to create a block \(B_{\ell + 1} = (h_{\ell + 1}, P_{\ell + 1}, c_{\ell + 1})\). Let \(h_{\ell + 1} = h^{*}(B_{\ell })\). The set \(P_{\ell + 1}\) is created by taking several pieces from the pool of m. The value \(c_{\ell + 1}\) must be determined so that \(h^{*}((h_{\ell + 1}, P_{\ell + 1}, c_{\ell + 1})) < T\). Here, it is not computationally easy to determine \(c_{\ell + 1}\) so that the value of \(h^{*}((h_{\ell + 1}, P_{\ell + 1}, c_{\ell + 1}))\) is less than T. (The hash function \(h^{*}\) must be designed to have this property.) The miner determines the value of \(c_{\ell + 1}\) by repeatedly and randomly choosing \(c_{\ell + 1}\) and calculating \(h^{*}((h_{\ell + 1}, P_{\ell + 1}, c_{\ell + 1}))\) until it is less than T. If the miner m succeeds in finding such \(c_{\ell + 1}\), i.e., generating \(B_{\ell +1}\) that refers to \(B_{\ell }\), the miner m removes all pieces of \(P_{\ell + 1}\) from the pool, adds \(B_{\ell +1}\) to \(\mathcal {C}_m\), and broadcasts \(B_{\ell +1}\). Another miner \(m'\) that receives \(B_{\ell +1}\) adds \(B_{\ell +1}\) to \(\mathcal {C}_{m'}\). The miners m and \(m'\) attempt to create the next block in a similar process.
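The nonce search described above can be sketched as follows. This is a toy model, not the Bitcoin implementation: the JSON encoding of a block, the helper names `h_star`, `mine`, and `refers_to`, the sample pieces, and the very easy target are all assumptions made for illustration, and the nonce is scanned sequentially instead of being drawn at random.

```python
import hashlib
import json

U_H = 2 ** 256   # size of the hash range (value quoted in the text for Bitcoin)
U_N = 2 ** 32    # size of the nonce range (value quoted in the text for Bitcoin)

def h_star(block):
    """Map a block (h, P, c) to a natural number below U_H.
    The JSON encoding is an assumption; the text leaves the encoding open."""
    h, pieces, c = block
    data = json.dumps([h, sorted(pieces), c]).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def mine(prev_block, pieces, target):
    """Search for a nonce c with h*((h, P, c)) < target (sequential scan)."""
    h = h_star(prev_block)
    for c in range(U_N):
        block = (h, frozenset(pieces), c)
        if h_star(block) < target:
            return block
    return None  # no nonce works for this P; a miner would then change P

def refers_to(b2, b1, target):
    """B_2 refers to B_1 if h_2 = h*(B_1) and h_2 < T."""
    return b2[0] == h_star(b1) and b2[0] < target

genesis = (0, frozenset(), 0)     # B_1 with h_1 = 0 and c_1 = 0
T = U_H >> 12                     # toy target: about 1 hash in 4096 succeeds
b2 = mine(genesis, {"alice pays bob 5"}, T)
```

Lowering `T` makes the search proportionally longer, which is the lever the protocol uses to control the block creation rate.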

Miners are given an incentive to generate blocks. In the Bitcoin blockchain, bitcoins are awarded to a miner who creates a block B (that information is written to B). Thus, miners compete to create a block. The process of generating blocks is called mining. The value of T mentioned above is called the target. In Bitcoin, T is adjusted so that the average interval at which some miner successfully creates a block (i.e., finds a nonce that satisfies the condition) is ten minutes. The target T at the creation of the block \(B_{\ell +1}\) following the chain \(B_1,\ldots ,B_{\ell }\) depends on the information contained in \(B_1,\ldots ,B_{\ell }\).

If many miners are conscientious, many miners are expected to have the same chain \(B_1,\ldots ,B_{\ell }\). However, blocks near the end of the chain may not have arrived due to network congestion. Because miners are rewarded for creating blocks, they have an incentive to maintain the blockchain network. Therefore, many miners can be expected to be conscientious. The data stored in the blockchain is interpreted as \(P_1,\ldots ,P_{\ell }\) (this is not a mathematical definition).

When creating a new block \(B_{\ell +1} = (h_{\ell +1}, P_{\ell +1}, c_{\ell +1})\), a miner must check whether each piece in \(P_{\ell +1}\) is consistent with \(P_1,\ldots ,P_{\ell }\), and must not include an inconsistent piece in \(P_{\ell +1}\). A block \(B_{\ell +1}\) whose \(P_{\ell +1}\) is inconsistent with \(P_1,\ldots ,P_{\ell }\) will be ignored by the other miners. Therefore, the miner tries to create \(B_{\ell +1}\) correctly in order to gain the reward. This mechanism ensures the consistency of data in the blockchain. Pieces in a block that lies outside the longest chain are considered invalid. Note that \(P_1,\ldots ,P_{\ell }\) will never be fixed, as longer chains may be created later. In practice, however, for a sufficiently large \(\ell \), overwriting all of \(B_1,\ldots ,B_{\ell }\) would require the creation of a chain longer than \(\ell \) and would require a large amount of computing power to compute the hash function. Miners with the ability to create chains longer than \(\ell \) would have little incentive to overwrite all of \(B_1,\ldots ,B_{\ell }\) because they are rewarded for extending the chain after \(B_{\ell }\) according to the legitimate rules.

Two blocks may be generated simultaneously by different miners. Suppose that two blocks, \(B_{\ell +1}\) and \(B'_{\ell +1}\), are generated after \(B_{\ell }\). Many miners receive both \(B_{\ell +1}\) and \(B'_{\ell +1}\), but arbitrarily choose one of them (usually the block that arrived first) and attempt to create the block following the one they chose. If another miner successfully creates and broadcasts a block \(B_{\ell +2}\) after \(B_{\ell +1}\), for example, \(B_1,\ldots ,B_{\ell },B_{\ell +1},B_{\ell +2}\) is longer than \(B_1,\ldots ,B_{\ell },B'_{\ell +1}\). Then, most miners try to create a block following \(B_{\ell +2}\). Even if a temporary chain split (called a fork) occurs, this mechanism usually resolves it. (Of course, it is not guaranteed that forks will be resolved, and the continued simultaneous creation of blocks will create chaos in the chain.)
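The fork resolution rule can be illustrated with a minimal sketch in which blocks are abstracted to identifiers. The dictionary `parent` (block identifier to the identifier of the block it refers to), the function name `longest_chain`, and the block labels are hypothetical; hash values and the target are left out. Ties are broken in favour of the block that was inserted (i.e., arrived) first.

```python
def longest_chain(parent, genesis):
    """Return the longest chain from the genesis block among known blocks.
    `parent` maps a block id to the id of the block it refers to; since
    references are hash-based, they cannot form a cycle."""
    def chain_to(block):
        chain = [block]
        while chain[-1] != genesis:
            nxt = parent.get(chain[-1])
            if nxt is None:            # not connected to the genesis block
                return []
            chain.append(nxt)
        return chain[::-1]             # genesis first
    candidates = [chain_to(b) for b in list(parent) + [genesis]]
    return max(candidates, key=len)    # first maximum wins on ties

# A fork after B2: B3 arrived first, then B3'.
parent = {"B2": "B1", "B3": "B2", "B3'": "B2"}
before = longest_chain(parent, "B1")

# A block extending B3' arrives, so the other branch becomes the longest.
parent["B4'"] = "B3'"
after = longest_chain(parent, "B1")
```

In this example, `before` follows the branch through B3, and `after` switches to the branch through B3' once that branch becomes longer.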

3.2 DAG-Type Blockchain

Chain-type blockchains have the disadvantage that blocks that are out of the longest chain are discarded. In order to increase the amount of data stored, DAG-type blockchains were considered [8]. A DAG-type blockchain is a type of blockchain in which the directed graph composed of block references is a DAG rather than a directed path. We allow a block to refer to multiple blocks. We extend the definition of a block to \(B = (H, P, c)\), where P is a set of pieces, c is a nonce, and H is a sequence of integer values at least 0 and less than \(U_{\textrm{H}}\). For blocks \(B^1,\ldots ,B^{a}\), we say that a block B refers to blocks \(B^1,\ldots ,B^{a}\) if \(h^i = h^{*}(B^i)\), \(i = 1,\ldots ,a\) and \(H = (h^1,\ldots ,h^a)\).

For simplicity, let \(B_{\textrm{gen}} = (\emptyset , \emptyset , 0)\) denote the genesis block in a DAG-type blockchain, and assume that all miners hold \(B_{\textrm{gen}}\). The concept of a chain in a chain-type blockchain corresponds to a connected DAG containing the genesis block in a DAG-type blockchain. For two blocks \(B,B' \in \mathcal {C}_m\) recognized by a miner m, if there exist blocks \(B_1,B_2,\ldots ,B_{a}\), where \(B_1 = B\), \(B_a = B'\), and \(B_{i+1}\) refers to \(B_{i}\) for \(i = 1,\ldots ,a-1\), we say that B is reachable from \(B'\). Strictly speaking, B is said to be reachable from \(B'\) if \(B_{i} = (H_{i}, P_{i}, c_{i})\) and for \(i = 1,\ldots ,a-1\), the value \(h^{*}(B_{i})\) is contained in the sequence \(H_{i+1}\). Consider the set of all blocks in \(\mathcal {C}_m\) from which \(B_{\textrm{gen}}\) is reachable. We write it as \(\mathcal {B}(\mathcal {C}_m)\). We also write \(D = (N, A)\) for the DAG composed of \(\mathcal {B}(\mathcal {C}_m)\) (as a graph structure). The elements of N are called nodes, and N has a one-to-one correspondence with \(\mathcal {B}(\mathcal {C}_m)\). For an arc \((u, v) \in A\), let \(B_{u}\) and \(B_{v}\) denote the blocks corresponding to u and v, respectively. Then, \(B_{u}\) refers to \(B_{v}\). D has no directed cycle because when \(B_{u}\) refers to \(B_{v}\), \(B_{v}\) is generated before \(B_{u}\). From the way D is created, there is exactly one node whose out-degree is 0, which corresponds to \(B_{\textrm{gen}}\). We write this node as \(v_{\textrm{gen}}\).
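As a rough illustration of how \(\mathcal {B}(\mathcal {C}_m)\) and the DAG D could be extracted from the blocks a miner knows, the sketch below abstracts each block to an identifier and represents its sequence H by a list of referred identifiers. The name `known_dag`, the identifier `"gen"`, and the sample data are assumptions for this example only.

```python
def known_dag(refs, genesis="gen"):
    """Return (N, A) for the blocks from which the genesis block is reachable.
    `refs` maps a block id to the list of block ids it refers to."""
    # Reverse the references and search backwards from the genesis block.
    referrers = {}
    for block, referred in refs.items():
        for r in referred:
            referrers.setdefault(r, []).append(block)
    nodes, stack = {genesis}, [genesis]
    while stack:
        v = stack.pop()
        for u in referrers.get(v, []):
            if u not in nodes:
                nodes.add(u)
                stack.append(u)
    # An arc (u, v) means that block u refers to block v.
    arcs = {(u, v) for u in nodes for v in refs.get(u, []) if v in nodes}
    return nodes, arcs

refs = {
    "gen": [],
    "a": ["gen"], "b": ["gen"], "c": ["a", "b"],
    "x": ["missing"], "y": ["x"],   # "missing" has not arrived yet
}
N, A = known_dag(refs)
sinks = {v for v in N if all(u != v for u, _ in A)}   # out-degree 0
```

Blocks `x` and `y` are excluded because the genesis block is not reachable from them, and the only node of out-degree 0 is the genesis node.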

3.3 PHANTOM Protocol

When a block B refers to a block \(B'\), the creator of the block B believes that the pieces held in the block \(B'\) are correct (consistent with the other pieces). Similarly, when a block \(B''\) is reachable from a block B, the creator of the block B believes that the pieces held in the block \(B''\) are correct. Thus, receiving references from many blocks increases the reliability of a block. If many conscientious miners properly select and refer to blocks at the tail of the DAG, the DAG will become closer and closer to a chain, and blocks closer to the beginning of the chain will accumulate trust. This makes the entire DAG-type blockchain more resistant to tampering and the system more stable.

However, accepting every block in the DAG as a legitimate block causes the following problem. A malicious miner \(m_{\textrm{e}}\) creates a block B containing a piece beneficial to him/her, but does not broadcast B and keeps it secret from others. He/she generates a new piece beneficial to himself/herself, creates a block, and makes it refer to B. In this way, \(m_{\textrm{e}}\) generates blocks containing pieces beneficial to himself/herself one after another as successors to B. When enough blocks have been created, \(m_{\textrm{e}}\) broadcasts B and the subsequent blocks simultaneously. In a mechanism that recognizes all blocks in the DAG as legitimate blocks, blocks created in this way are recognized as legitimate blocks. As another example, in an extreme case, \(B_{\textrm{gen}}\) could be referred to from all blocks created. Blocks that have not earned the trust of other miners are treated as legitimate blocks without restriction.

The PHANTOM protocol [13] of the DAG-type blockchain solves this problem in the following way. Before describing it, we define some terms. In the following, consider a DAG \(D = (N, A)\) corresponding to \(\mathcal {B}(\mathcal {C}_m)\) of some miner m. For a node \(v \in N\) of \(D = (N, A)\), let \(\textsf{past}(v, D)\) be the set of nodes reachable from v on D. That is, \(\textsf{past}(v, D) = \{ u \in N \mid D = (N, A), v \rightsquigarrow _D u \}\). Also, let \(\textsf{future}(v, D)\) be the set of nodes from which v is reachable on D. That is, \(\textsf{future}(v, D) = \{ u \in N \mid D = (N, A), u \rightsquigarrow _D v \}\). Let \(\textsf{anticone}(v, D)\) be the set of nodes in N that are contained in neither \(\textsf{past}(v, D)\) nor \(\textsf{future}(v, D)\). That is, \(\textsf{anticone}(v, D) = N \setminus \textsf{past}(v, D) \setminus \textsf{future}(v, D)\).
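The three sets can be computed by two graph searches, as in the following sketch. The names `closure` and `cones` and the sample DAG are hypothetical. We assume here that v counts as reachable from itself (a path with a single node), so that v belongs to both \(\textsf{past}(v, D)\) and \(\textsf{future}(v, D)\) and never to its own anticone.

```python
def closure(start, step):
    """All nodes reachable from `start` by following `step` (start included)."""
    seen, stack = {start}, [start]
    while stack:
        for y in step.get(stack.pop(), ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def cones(nodes, arcs, v):
    """Return (past, future, anticone) of node v in D = (nodes, arcs)."""
    succ, pred = {}, {}
    for a, b in arcs:
        succ.setdefault(a, set()).add(b)
        pred.setdefault(b, set()).add(a)
    past = closure(v, succ)      # nodes reachable from v
    future = closure(v, pred)    # nodes from which v is reachable
    return past, future, set(nodes) - past - future

# Sample DAG: an arc (u, v) means that block u refers to block v.
N = {"gen", "a", "b", "c", "d"}
A = {("a", "gen"), ("b", "gen"), ("c", "a"), ("c", "b"), ("d", "a")}
past_a, future_a, anticone_a = cones(N, A, "a")
```

In this sample, block `b` is the only block that neither reaches `a` nor is reached from it.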

The PHANTOM protocol limits the number of blocks that have not earned the trust of other miners. Fix a natural number parameter k. A subset \(N' \subseteq N\) of nodes is called a discord k-independent set if for any node \(v \in N'\), \(\textsf{discord}(v, N', D) := | N' \cap \textsf{anticone}(v, D)| \le k\). That is, for any node v in \(N'\), we restrict the number of nodes in \(N'\) that cannot be reached from v and cannot reach v to at most k. We consider the block corresponding to each node of \(N'\) as a valid block (for m), and consider the pieces contained in a valid block as valid. Blocks not contained in \(N'\) are considered invalid blocks, and pieces contained in invalid blocks are ignored. It is expected that this mechanism will prevent miners from creating blocks that are not referred to by blocks generated by other miners.

The PHANTOM protocol regards the discord k-independent set of D with the maximum number of nodes as the set of legitimate blocks. We need an efficient algorithm that computes the maximum discord k-independent set of D, which is discussed in Sect. 7.5.
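To make the definition tangible, the following sketch checks the discord condition and finds a maximum discord k-independent set by exhaustive search. It is deliberately naive (exponential in |N|) and is not the algorithm of this chapter; the function names and the sample DAG are assumptions, and a node is treated as reachable from itself so that it is never in its own anticone.

```python
from itertools import combinations

def anticones(nodes, arcs):
    """Map each node v to anticone(v, D), computed from reachability."""
    reach = {v: {v} for v in nodes}       # reach[u]: nodes reachable from u
    changed = True
    while changed:                        # simple fixed-point closure
        changed = False
        for u, v in arcs:
            extra = reach[v] - reach[u]
            if extra:
                reach[u] |= extra
                changed = True
    return {v: {u for u in nodes
                if u not in reach[v] and v not in reach[u]}
            for v in nodes}

def is_discord_k_independent(subset, anti, k):
    chosen = set(subset)
    return all(len(chosen & anti[v]) <= k for v in chosen)

def max_discord_set(nodes, arcs, k):
    """Brute force: try candidate subsets from largest to smallest."""
    anti = anticones(nodes, arcs)
    for size in range(len(nodes), -1, -1):
        for cand in combinations(sorted(nodes), size):
            if is_discord_k_independent(cand, anti, k):
                return set(cand)

N = {"gen", "a", "b", "c", "d"}
A = {("a", "gen"), ("b", "gen"), ("c", "a"), ("c", "b"), ("d", "a")}
```

For k = 0 the selected blocks must be pairwise comparable, i.e., they lie on a single directed path; larger k admits more of the DAG.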

4 Pathwidth

4.1 Pathwidth and Directed Pathwidth

Many fast methods exist for solving optimization problems. Although integer programming is a promising method, the problem we wish to solve must be solved by all miners, and we cannot assume that every miner has access to a fast commercial integer programming solver. Since the problem must be solved exactly, approximation algorithms, heuristics, and genetic algorithms are not suitable. Here, we use techniques from parameterized algorithms. Well-known techniques include the notions of pathwidth and treewidth for undirected graphs, as described in the introduction. These are natural numbers that represent the closeness of a graph to a path or a tree. The smaller the value is, the closer the graph is to a path or a tree. When the pathwidth or treewidth is small, it is possible to decompose the graph into smaller pieces that do not interfere with each other, and faster algorithm design based on the decomposition is expected. Since a newly generated block refers to blocks at the tail of the DAG, the shape of the DAG is expected to be close to a straight line. (Since the shape is not expected to be close to a tree, treewidth is not used.) The concept of directed pathwidth exists for directed graphs [10], but as will be explained later, it provides no useful information for DAGs. We will introduce a parameter that can be applied to DAGs.

We describe path decomposition and pathwidth of an undirected graph. The path decomposition of an undirected graph \(G = (V, E)\) is a sequence \((X_1,X_2,\ldots )\) of subsets of V satisfying the following conditions, where a subset \(X_i\) of V is called a bag:

  (P1) \(\bigcup _{i=1,2,\ldots }X_i = V\).

  (P2) For every edge \((u, v) \in E\), there is a bag \(X_i\) such that \(u, v \in X_i\).

  (P3) For all i, j, k with \(i < j < k\), \(X_i \cap X_k \subseteq X_j\) holds.

(P1) is the condition that any element of V is contained in at least one of the bags \(X_1,X_2,\ldots \). (P2) is the condition that there exists a bag containing both endpoints of any edge in E. (P3) means that for any node \(v \in V\), the bags containing v have consecutive subscripts, as in \(X_i,X_{i+1},\ldots ,X_{k-1},X_{k}\).

The pathwidth for a path decomposition \((X_1,X_2,\ldots )\) of G is defined by \(\max _{i}\{|X_i|-1\}\). Among the path decompositions of G, consider the path decomposition with the smallest pathwidth, and define the pathwidth of G as the pathwidth of the path decomposition. The \(-1\) appearing in the definition of pathwidth is for adjusting the pathwidth of a path to be 1.
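Conditions (P1) to (P3) and the width of a decomposition are easy to check mechanically. The sketch below is a direct transcription under the assumption that bags are Python sets given in order; the names `is_path_decomposition` and `width` and the sample path graph are ours.

```python
def is_path_decomposition(vertices, edges, bags):
    """Check (P1), (P2), and (P3) for a sequence of bags."""
    # (P1) every vertex appears in some bag (and nothing else does)
    if set().union(*bags) != set(vertices):
        return False
    # (P2) both endpoints of every edge share a bag
    if not all(any(u in X and v in X for X in bags) for u, v in edges):
        return False
    # (P3) the bags containing a vertex have consecutive indices
    for v in vertices:
        idx = [i for i, X in enumerate(bags) if v in X]
        if idx != list(range(idx[0], idx[-1] + 1)):
            return False
    return True

def width(bags):
    """Pathwidth of the decomposition: largest bag size minus one."""
    return max(len(X) for X in bags) - 1

# A path on four vertices and two candidate bag sequences.
V = {1, 2, 3, 4}
E = [(1, 2), (2, 3), (3, 4)]
good = [{1, 2}, {2, 3}, {3, 4}]
bad = [{1, 2}, {3, 4}, {2, 3}]   # vertex 2 occurs in non-consecutive bags
```

The sequence `good` has width 1, in line with the remark that the \(-1\) makes the pathwidth of a path equal to 1.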

An important property of path decomposition is the following. For any \(i = 2,3,\ldots \), let \(X_{\textrm{L}}\) be the set of nodes in the bags before \(X_i\) that are not included in \(X_i\), and let \(X_{\textrm{R}}\) be the set of nodes in the bags after \(X_i\) that are not included in \(X_i\). That is, \(X_{\textrm{L}} = \bigcup _{j < i} X_j \setminus X_i\) and \(X_{\textrm{R}} = \bigcup _{j > i} X_j \setminus X_i\). Given any nodes \(x \in X_{\textrm{L}}\) and \(y \in X_{\textrm{R}}\), the edge (x, y) is not contained in E. That is, \(X_{\textrm{L}}\) and \(X_{\textrm{R}}\) are separated by \(X_i\). This property is derived from (P1), (P2), and (P3).

Based on path decomposition, various problems can be solved efficiently. As an example, consider solving the maximum independent set problem. An independent set \(V' \subseteq V\) of an undirected graph \(G = (V, E)\) is a subset of V satisfying the condition that for any edge \((u, v) \in E\), at least one of u and v is not contained in \(V'\). The maximum independent set problem is the problem of finding an independent set of G with the maximum number of elements.

Suppose that a path decomposition \((X_1,X_2,\ldots )\) of G is obtained, and let w be the pathwidth of the path decomposition. For any \(i = 1,2,3,\ldots \), consider the following. Let B be an arbitrary node set such that \(B \subseteq X_i\). On the subgraph \(G[\bigcup _{j \le i} X_j]\) of G induced by \(\bigcup _{j \le i} X_j\), consider an independent set with the maximum number of nodes that contains all the nodes in B, and let \(p_i(B)\) be the number of its elements (if such an independent set does not exist, \(p_i(B) = - \infty \)). For any node \(v \in X_{i+1} \setminus X_{i}\), there is no edge of E connecting v and any node in \(X_{\textrm{L}} (= \bigcup _{j < i} X_j \setminus X_i)\), by the above property. Therefore, we only need to look at the nodes of B (contained in \(X_i\)) to determine whether to add v as an element of the independent set. Although we omit the details, from the values of \(p_i(B)\) for all node sets \(B \subseteq X_i\), we can calculate the value of \(p_{i+1}(B')\) for any node set \(B' \subseteq X_{i+1}\), and by performing this calculation for \(i = 1,2,\ldots \) in that order, the maximum independent set problem can be solved. The information to be stored in each step is the value of \(p_i(B)\) for every node set \(B \subseteq X_i\), the number of which is at most \(2^{w+1}\). This algorithm runs in \(2^{O(w)} \textrm{poly}(|V|)\) time, where \(\textrm{poly}(|V|)\) is a polynomial in |V|. If w can be regarded as a sufficiently small constant, the running time is polynomial in the input size.
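One possible way to fill in the omitted details is sketched below. It is a variant chosen for simplicity rather than a transcription of the definition above: the table entry for B stores the size of a largest independent set of \(G[\bigcup _{j \le i} X_j]\) whose intersection with \(X_i\) is exactly B (rather than a superset of B). The function names and the two test graphs are hypothetical, and the bags are assumed to form a valid path decomposition.

```python
from itertools import combinations

def subsets(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    for r in range(len(items) + 1):
        for c in combinations(items, r):
            yield frozenset(c)

def max_independent_set_size(edges, bags):
    """Dynamic programming over the bags of a path decomposition."""
    adjacent = {frozenset(e) for e in edges}

    def independent(nodes):
        return not any(frozenset(p) in adjacent for p in combinations(nodes, 2))

    prev_bag, table = set(), {frozenset(): 0}
    for X in bags:
        new_table = {}
        for B2 in subsets(X):
            if not independent(B2):
                continue
            # Compatible old states agree with B2 on the shared nodes.
            best = max((val for B, val in table.items()
                        if B & X == B2 & prev_bag), default=None)
            if best is not None:
                # Nodes of B2 that are new in this bag are counted now.
                new_table[B2] = best + len(B2 - prev_bag)
        prev_bag, table = set(X), new_table
    return max(table.values())

path_edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
path_bags = [{1, 2}, {2, 3}, {3, 4}, {4, 5}]
cycle_edges = path_edges + [(5, 1)]
cycle_bags = [{1, 2, 3}, {1, 3, 4}, {1, 4, 5}]
```

Each table holds at most \(2^{w+1}\) entries, which is where the \(2^{O(w)}\) factor in the running time comes from.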

We describe path decomposition and pathwidth for directed graphs. For a directed path decomposition of a directed graph \(D = (N, A)\), condition (P2) is changed to the following condition (P2\('\)).

  (P2\('\)) For any \((u, v) \in A\), there are i, j with \(i \le j\) such that \(u \in X_i\) and \(v \in X_j\).

Note that we replace V in (P1) with N for directed path decomposition.

Directed pathwidth for directed graphs can be defined in the same way as pathwidth for undirected graphs.

The directed pathwidth of any DAG is zero. This can be shown as follows. The nodes of a DAG \(D = (N, A)\) can be arranged in a topological order. Here, a topological order is an ordering \(v_1,\ldots ,v_{|N|}\) of the nodes in N such that \(i < j\) for any \((v_i, v_j) \in A\). If we construct the bag \(X_{i} = \{v_{i}\}\) for \(i = 1,\ldots ,|N|\), we can confirm that \((X_1,\ldots ,X_{|N|})\) satisfies (P1), (P2\('\)), and (P3): every arc \((v_i, v_j) \in A\) has its tail in \(X_i\) and its head in \(X_j\) with \(i < j\). Since every bag contains exactly one node, the directed pathwidth of this directed path decomposition is 0. Therefore, the directed pathwidth provides no information on the DAG.
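The argument can be replayed on a small example. The sketch below takes singleton bags in a topological order with tails before heads, which is the orientation that (P2\('\)) as stated above requires, and checks the condition directly. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the function names and the sample DAG are assumptions.

```python
from graphlib import TopologicalSorter

def singleton_decomposition(nodes, arcs):
    """One bag per node, ordered so that tails come before heads."""
    preds = {v: set() for v in nodes}
    for u, v in arcs:
        preds[v].add(u)              # u must appear before v
    order = list(TopologicalSorter(preds).static_order())
    return [{v} for v in order]

def satisfies_p2_prime(arcs, bags):
    """(P2'): for every arc (u, v) there are i <= j with u in X_i, v in X_j."""
    def first(x):
        return min(i for i, X in enumerate(bags) if x in X)
    def last(x):
        return max(i for i, X in enumerate(bags) if x in X)
    return all(first(u) <= last(v) for u, v in arcs)

N = {"a", "b", "c", "d"}
A = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
bags = singleton_decomposition(N, A)
directed_width = max(len(X) for X in bags) - 1
```

Reversing the bag sequence breaks (P2\('\)), which shows that the orientation of the order matters even though the width stays 0.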

4.2 DAG-Pathwidth

We explain DAG-path-decomposition (DAG-PD) and DAG-pathwidth [7]. The DAG-PD is defined for a general directed graph \(D = (N, A)\). The DAG-PD of \(D = (N, A)\) is a sequence \((X_1,X_2,\ldots )\) of subsets of N satisfying the following conditions ((D1) is equivalent to (P1) and (D3) to (P3)).

  (D1) \(\bigcup _{i=1,2,\ldots }X_i = N\).

  (D2) For every arc \((u, v) \in A\), if \(u \in X_1\), \(v \in X_1\) holds. Otherwise (\(u \notin X_1\)), there is an integer \(i \ge 2\) such that \(u \in X_i\), \(u \notin X_{i-1}\) and \(v \in X_i\) hold.

  (D3) For all i, j, k with \(i < j < k\), \(X_i \cap X_k \subseteq X_j\) holds.

The condition (D2) is explained below. When \(u \notin X_1\), from condition (D1), u must be contained in one of the bags. Therefore, there is always an integer i such that \(u \in X_i\) and \(u \notin X_{i-1}\). Also, from condition (D3), there exists only one such i. (If there exist i and \(i'\) (\(i < i'\)) such that \(u \in X_i\), \(u \notin X_{i-1}\), \(u \in X_{i'}\), \(u \notin X_{i'-1}\), then \(i' - 1 > i\) because \(u \in X_i\), and from (D3) with \(j = i'-1\), \(X_i \cap X_{i'} \subseteq X_{i'-1}\), but \(u \in X_i \cap X_{i'}\) and \(u \notin X_{i'-1}\), a contradiction.) For such i, (D2) requires that the head v of the arc (u, v) is contained in \(X_i\). For a node v, we write the index of the first bag containing v, i.e., \(\min \{i \mid v \in X_i\}\), as \(\textsf{fb}_X(v)\) or simply \(\textsf{fb}(v)\). \(\textsf{fb}\) stands for the First Bag. Intuitively, condition (D2) implies that the head of every arc whose tail is u must be in \(X_{\textsf{fb}(u)}\). This direction-dependent condition differs from condition (P2) for undirected graphs.
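The conditions (D1) to (D3) and the function \(\textsf{fb}\) can be checked with a few lines of code. The following is a sketch under the assumption that bags are Python sets listed in order and that bag indices start at 1 as in the text; the names `fb` and `is_dag_pd` and the sample DAG are hypothetical.

```python
def fb(bags, v):
    """Index (1-based) of the first bag that contains v."""
    return min(i for i, X in enumerate(bags, start=1) if v in X)

def is_dag_pd(nodes, arcs, bags):
    """Check (D1), (D2), and (D3) for a sequence of bags."""
    # (D1) the bags cover exactly the node set
    if set().union(*bags) != set(nodes):
        return False
    # (D3) X_i ∩ X_k ⊆ X_j for all i < j < k
    r = len(bags)
    if not all(bags[i] & bags[k] <= bags[j]
               for i in range(r)
               for j in range(i + 1, r)
               for k in range(j + 1, r)):
        return False
    # (D2) the head of each arc lies in the first bag of its tail
    return all(v in bags[fb(bags, u) - 1] for u, v in arcs)

# Sample DAG: an arc (u, v) means that block u refers to block v.
N = {"gen", "a", "b", "c"}
A = {("a", "gen"), ("b", "gen"), ("c", "a"), ("c", "b")}
X = [{"gen"}, {"gen", "a"}, {"gen", "a", "b"}, {"a", "b", "c"}]
X_bad = [{"c"}, {"a", "b", "c"}, {"gen", "a", "b"}]
```

In `X`, older blocks enter the bags first, so heads always precede tails; `X_bad` introduces `c` before the blocks it refers to and violates (D2).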

We have the following lemma on \(\textsf{fb}\).

Lemma 7.1

([7]) Let \(D = (N, A)\) be a directed graph and \(X = (X_1,X_2,\ldots )\) be a DAG-PD of D. Then, for any nodes \(u, v \in N\), if there is a directed path from u to v, \(\textsf{fb}(v) \le \textsf{fb}(u)\) holds.

The following lemma can be obtained from Lemma 7.1.

Lemma 7.2

([7]) Let \(D = (N, A)\) be a directed graph and \(X = (X_1,X_2,\ldots )\) be a DAG-PD of D. If two nodes \(u, v \in N\) are included in the same strongly connected component, then \(\textsf{fb}(u) = \textsf{fb}(v)\) holds.

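This lemma holds because u and v are in the same strongly connected component, and thus there exist directed paths from u to v and from v to u; applying Lemma 7.1 in both directions gives \(\textsf{fb}(v) \le \textsf{fb}(u)\) and \(\textsf{fb}(u) \le \textsf{fb}(v)\).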

4.3 Nice DAG-PD

Undirected path decomposition has the concept of a nice path decomposition that is useful in dynamic programming. A similar concept can be defined for DAG-pathwidth.

Let \(D = (N, A)\) be a directed graph and \(X = (X_1,X_2,\ldots )\) be a DAG-PD of D. X is called a nice DAG-PD if the following conditions hold:

  1. \(X_1 = \emptyset .\)

  2. For \(i = 1,2,\ldots \), exactly one of the following holds:

    (2-a) (Introduce) There is a strongly connected component S such that \(S \cap X_i = \emptyset \) and \(X_{i+1} = X_{i} \cup S\).

    (2-b) (Forget) There is a node \(v \in X_i\) such that \(X_{i+1} = X_i \setminus \{v\}\).

A nice DAG-PD starts with \(X_1 = \emptyset \). For \(i = 1,2,\ldots \), we consider constructing \(X_{i+1}\) from \(X_{i}\) by adding nodes to \(X_{i}\) or deleting nodes from \(X_{i}\). If we try to add a node \(v \in N\) to \(X_{i}\), all the nodes in the strongly connected component containing v must be added to \(X_{i}\) at the same time because the \(\textsf{fb}\) values of all these nodes are equal by Lemma 7.2. This operation is called “Introduce.” Therefore, (2-a) is defined as above. On the other hand, the bag obtained by deleting any single node from \(X_{i}\) does not violate any conditions of DAG-PD. This operation is called “Forget.” Thus, (2-b) is defined as above. In this way, the change between two consecutive bags is kept minimal to facilitate the design of dynamic programming.

For any DAG-PD, we describe how to change the DAG-PD to a nice DAG-PD without changing its DAG-pathwidth. Let X be the original DAG-PD. Let \(X_i\) be the i-th bag of X. First, if the first bag of X is not \(\emptyset \), add \(\emptyset \) to the beginning of X. In the following, the sequence after the modification is also written as X. Next, when there are two bags such that \(X_i = X_{i+1}\), \(X_{i+1}\) is removed from X. Finally, repeat the following operations (i), (ii), and (iii) for X as many times as possible.

  (i) If there exist bags \(X_{i}, X_{i+1}\) such that \(X_i \not \subseteq X_{i+1}\) and \(X_{i+1} \not \subseteq X_{i}\), then insert \(X_{i} \cap X_{i+1}\) between \(X_{i}\) and \(X_{i+1}\). (This results in \(X_{i} \supseteq X_{i} \cap X_{i+1} \subseteq X_{i+1}\).)

  (ii) If there exist bags \(X_{i}, X_{i+1}\) such that \(X_{i+1} \subseteq X_{i}\) and \(|X_{i}| - |X_{i+1}| \ge 2\), insert \(X_{i} \setminus \{v\}\) for some \(v \in X_{i} \setminus X_{i+1}\) between \(X_i\) and \(X_{i+1}\).

  (iii) If there exist bags \(X_{i}, X_{i+1}\) such that \(X_{i} \subseteq X_{i+1}\) holds, and \(X_{i+1} \setminus X_{i}\) is not a strongly connected component of D, insert \(X_{i+1} \setminus S\) between \(X_i\) and \(X_{i+1}\), where S is a strongly connected component of \(D[X_{i+1} \setminus X_{i}]\) (subgraph induced by \(X_{i+1} \setminus X_{i}\)) such that there is no directed path from the nodes in \((X_{i+1} \setminus X_{i} \setminus S)\) to the nodes in S.

It is easy to check that the sequence of bags constructed in this way satisfies the conditions for a DAG-PD and a nice DAG-PD. Also, since there are a finite number of nodes in N, the procedure always halts (we omit the proof).
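For the special case of a DAG, where every strongly connected component is a single node, the conversion above can be sketched in a few lines. The following is a minimal sketch under stated assumptions, not the procedure of [7]: the function name `make_nice` and the representation of bags as Python sets and arcs as `(tail, head)` pairs are hypothetical, and node labels are assumed to be sortable. Between consecutive bags it first forgets surplus nodes one at a time and then introduces the new nodes one at a time, ordered so that a node with no incoming path from the other new nodes is introduced last, as required by operation (iii).

```python
from graphlib import TopologicalSorter

def make_nice(bags, arcs):
    """Turn a DAG-PD (list of sets) of a DAG into a nice one (list of frozensets)."""
    nice = [frozenset()]                      # condition 1: X_1 is empty
    for bag in bags:
        bag = frozenset(bag)
        cur = nice[-1]
        # Forget: drop nodes absent from the next bag, one per step (operation (ii)).
        for v in sorted(cur - bag):
            cur = cur - {v}
            nice.append(cur)
        # Introduce: add new nodes one per step (operation (iii)).
        new = bag - cur
        # Map each new node to its successors among the new nodes. Feeding
        # successors as "predecessors" makes static_order() emit successors
        # first, so sources of D[new] are introduced last.
        succ = {v: {h for (t, h) in arcs if t == v and h in new} for v in new}
        for v in TopologicalSorter(succ).static_order():
            cur = cur | {v}
            nice.append(cur)
    return nice

# Arc 1 -> 2: node 2 is introduced before node 1.
print(make_nice([{1, 2}], [(1, 2)]))
```

Because nodes are forgotten before new ones are introduced, no intermediate bag is larger than the larger of the two original bags it sits between, so the width is unchanged.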

The problem of finding a DAG-PD with minimum DAG-pathwidth has been proved to be NP-hard [7]. The proof is based on a reduction from the minimum cut linear arrangement problem, which is NP-hard.

5 Discord k-Independent Set Problem

In this section, we describe an efficient algorithm for solving the maximum discord k-independent set problem based on DAG-PD [7]. Assume that we are given a DAG \(D = (N, A)\) and a nice DAG-PD \(X = (X_1,X_2,\ldots ,X_r)\) of D as the input of the problem, where r is a positive integer. We design a dynamic programming-based algorithm using X. Recall that when some strongly connected component S is introduced in the DAG-PD, \(X_{i + 1} = X_i \cup S\) holds. Since D is a DAG, every strongly connected component of D consists of exactly one node. Therefore, in the nice DAG-PD, if \(X_{i+1} = X_{i} \cup S\) for some i and some strongly connected component S, \(S = \{v\}\) for some \(v \in N\). In this case, we call \(X_{i+1}\) an introduce bag. If \(X_{i+1} = X_{i} \setminus \{w\}\) for some \(w \in N\), we call \(X_{i+1}\) a forget bag. In those cases, we say that v (resp. w) is a node of the bag \(X_{i+1}\).

Our algorithm constructs a discord k-independent set by deciding, one node at a time, whether to add each node to the set, in the order in which the nodes are introduced by the introduce bags of the DAG-PD. Figure 7.2 shows an example of an introduce bag \(X_{i+1} = X_{i} \cup \{v\}\). Suppose that we are constructing a discord k-independent set, denoted by Y, and that we now add the node v to Y. Let \(Y' = Y \cup \{v\}\) and \(Z_{i'} = \bigcup _{j = 1,\ldots ,i'} X_{j}\). We need to decide whether the discord condition \(\textsf{discord}(u, Y', D[Z_{i+1}]) = | Y' \cap \textsf{anticone}(u, D[Z_{i+1}])| \le k\) holds for all \(u \in X_{i+1}\) after adding v to Y. (Recall that \(D[Z_{i+1}]\) is the DAG induced by \(Z_{i+1}\).)

Fig. 7.2
A network diagram of an example of an introduce bag. A node in Y branches into three nodes, further branching into four nodes labeled V 1, V 2, V 3, V 4, V 5, V prime, V 7, V 8, and V 9. The last column of nodes denotes X i + 1, X i, and T v. Nodes V 8 and V 9 combine to form V.

Example of an introduce bag. The DAG shown in the figure is \(D[Z_{i+1}] = D[\bigcup _{j = 1,\ldots ,i+1} X_{j}]\)

First, consider \(\textsf{discord}\) of the newly introduced node v. There is no arc whose tail belongs to \(Z_{i+1}\) and whose head is v: for every arc \((w, v) \in A\), the node w first appears in some \(X_{j}\) with \(j \ge i+2\) by conditions (D2) and (D3) (note that v first appears in \(X_{i+1}\)). Therefore, there is no node \(\bar{w} \in Z_{i+1}\) such that \(\bar{w} \rightsquigarrow _{D[Z_{i+1}]} v\). Thus, we obtain \(\textsf{discord}(v, Y', D[Z_{i+1}]) = | Y' \cap \textsf{anticone}(v, D[Z_{i+1}])|= |\{w \in Y' \mid v \not\rightsquigarrow _{D[Z_{i+1}]} w \}|\); i.e., the number of nodes in \(Y'\) that are not reachable from v on \(D[Z_{i+1}]\). In the example, \(v_1,v_2,v_3,v_5\) and \(v_6\) are in \(Y'\) and are not reachable from v on \(D[Z_{i+1}]\), which implies \(\textsf{discord}(v, Y', D[Z_{i+1}]) = 5\).

Next, consider \(\textsf{discord}\) of the nodes in \(Z_{i}\). Let us observe how \(\textsf{discord}\) of a node in \(Z_{i}\) increases when v is added to Y. As mentioned above, there is no \(\bar{w} \in Z_{i+1}\) such that \(\bar{w} \rightsquigarrow _{D[Z_{i+1}]} v\). Therefore, if a node, say \(v'\), in \(Z_{i}\) is reachable from v on \(D[Z_{i+1}]\), \(\textsf{discord}\) of \(v'\) does not increase (see \(v'\) in the figure). On the other hand, if a node (e.g., \(v_1\)) is not reachable from v on \(D[Z_{i+1}]\), \(\textsf{discord}\) of the node increases by one. In the example, the \(\textsf{discord}\) values of \(v_1,v_2,v_3,v_5\) and \(v_6\) increase by one and those of the other nodes in Y do not increase.

In our dynamic programming-based algorithm, we do not maintain the \(\textsf{discord}\) values of all the nodes. Instead, we maintain the following information on the nodes in \(X_{i}\) when we have processed \(X_1,\ldots ,X_{i}\). First, for each subset \(S \subseteq X_{i}\), we consider the set of nodes in Y that are reachable from all the nodes in S and reachable from none of the nodes in \(X_{i} \setminus S\), which we denote by R(S). We define P(S) by

$$\begin{aligned} P(S) = |R(S)| = |\{ v \in Y \mid \forall u \in S,\ u \rightsquigarrow _{D[Z_{i}]} v,\ \forall w \in X_i \setminus S,\ w \not\rightsquigarrow _{D[Z_{i}]} v \}|. \end{aligned}$$
(7.1)

We maintain P(S) for each subset \(S \subseteq X_{i}\). In the example, \(P(\{v_7\}) = | \{v_3, v_5\} | = 2\) because \(v_3\) and \(v_5\) are reachable from \(v_7\) and not reachable from \(v_6, v_8, v_9\ (\in X_{i} \setminus \{v_7\})\). Also, \(P(\{v_6, v_7\}) = | \{v_1\} | = 1\). By using P(S) for all \(S \subseteq X_{i}\), we can compute \(\textsf{discord}\) of v when we add v to Y as follows:

$$ \textsf{discord}(v, Y', D[Z_{i+1}]) = \sum _{S: S \subseteq X_i, S \cap T_v = \emptyset } P(S), $$

where \(T_v\) is the set of nodes in \(X_{i}\) that are reachable from v. This is because \(\textsf{discord}\) of v is the number of nodes in \(Y'\) that are not reachable from v, and because any node in R(S) is reachable from v if \(S \cap T_v \ne \emptyset \) and is not reachable from v otherwise. For example, \(\textsf{discord}(v, Y', D[Z_{i+1}]) = P(\{v_6, v_7\}) + P(\{v_6\}) + P(\{v_7\}) + P(\emptyset ) = |\{v_1\}| + |\{v_2, v_6\}| + |\{v_3, v_5\}| + 0 = 5\).
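The sum over subsets can be checked mechanically. The sketch below is illustrative only: the helper names `subsets` and `discord_of_new_node` are hypothetical, P is stored as a dictionary keyed by frozensets, and \(T_v = \{v_8, v_9\}\) is inferred from the worked sum in the text (which ranges over the subsets of \(\{v_6, v_7\}\)) rather than stated there explicitly. Only the four values of P given in the example are used; entries for subsets that intersect \(T_v\) are never read.

```python
from itertools import combinations

def subsets(nodes):
    """Yield every subset of `nodes` as a frozenset."""
    items = sorted(nodes)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def discord_of_new_node(P, bag, T_v):
    """Sum P(S) over all S within `bag` that are disjoint from T_v."""
    return sum(P.get(S, 0) for S in subsets(bag) if not (S & T_v))

bag = {"v6", "v7", "v8", "v9"}        # X_i in the example
T_v = {"v8", "v9"}                    # assumption inferred from the text
P = {
    frozenset(): 0,
    frozenset({"v6"}): 2,             # |{v2, v6}|
    frozenset({"v7"}): 2,             # |{v3, v5}|
    frozenset({"v6", "v7"}): 1,       # |{v1}|
}
print(discord_of_new_node(P, bag, T_v))  # 5
```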

Secondly, to compute \(\textsf{discord}\) of the nodes in Y, for each subset \(S \subseteq X_{i}\), we maintain the maximum value of \(\textsf{discord}\) among the nodes in R(S), which we denote by Q(S). We have

$$\begin{aligned} Q(S) = \max _{u \in R(S)} \{\textsf{discord}(u, Y, D[Z_{i}])\}. \end{aligned}$$
(7.2)

In the example, \(Q(\{v_7\}) = \max \{\textsf{discord}(v_3, Y, D[Z_{i}]), \textsf{discord}(v_5, Y, D[Z_{i}])\} = \max \{8, 8\} = 8\) and \(Q(\{v_6, v_7\}) = \max \{\textsf{discord}(v_1, Y, D[Z_{i}])\} = 7\). By using Q(S) for all \(S \subseteq X_{i}\), we can decide whether the \(\textsf{discord}\) values of all the nodes in Y are at most k after we add v to Y: we check that \(Q(S') + 1 \le k\) for all subsets \(S' \subseteq X_{i}\) such that \(S' \cap T_v = \emptyset \), and that \(Q(S'') \le k\) for all subsets \(S'' \subseteq X_{i}\) such that \(S'' \cap T_v \ne \emptyset \) (the latter always holds if we have conducted the check for the previous bags \(X_1,\ldots ,X_{i-1}\)). This is because \(\textsf{discord}\) of a node in Y increases by one exactly when the node is not reachable from v, and \(S \cap T_v \ne \emptyset \) (resp. \(S \cap T_v = \emptyset \)) means that every node in R(S) is (resp. is not) reachable from v.
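The feasibility test just described can be sketched as a small predicate. This is a hedged illustration, not the authors' code: the name `can_add` and the dictionary representation of P and Q are assumptions, the toy values below are hypothetical and not taken from Fig. 7.2, and classes with \(P(S) = 0\) are skipped on the assumption that Q(S) carries no constraint when R(S) is empty. The check on the new node itself uses the sum of P(S) discussed earlier.

```python
def can_add(P, Q, T_v, k):
    """Return True if v may join Y without any discord value exceeding k.

    P, Q: dicts mapping each subset S of X_i (a frozenset) to an integer.
    T_v:  set of nodes of X_i reachable from v.
    """
    # discord of v itself: members of Y not reachable from v.
    disc_v = sum(p for S, p in P.items() if not (S & T_v))
    if disc_v > k:
        return False
    for S, q in Q.items():
        if S & T_v:
            continue              # nodes in R(S) are reachable from v: unchanged
        if P[S] > 0 and q + 1 > k:
            return False          # some node in R(S) would exceed k
    return True

# Hypothetical toy state: X_i = {a}, Y = {a}.
P = {frozenset(): 0, frozenset({"a"}): 1}
Q = {frozenset(): 0, frozenset({"a"}): 0}
print(can_add(P, Q, set(), 0), can_add(P, Q, set(), 1), can_add(P, Q, {"a"}, 0))
```

In the toy state, v can be added with \(k = 0\) only when it reaches a; otherwise a and v are mutually unreachable and each contributes one to the other's \(\textsf{discord}\).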

A state of our dynamic programming-based algorithm is defined as a triple \((i, P, Q)\). The first element i of the state means that the bags \(X_{1},\ldots ,X_{i}\) have been processed. P and Q, described above, are maps from the subsets of \(X_{i}\) to integers. A state \((i, P, Q)\) is updated as follows. First, consider the case where \(X_{i+1}\) is an introduce bag; that is, \(X_{i+1} = X_i \cup \{v\}\) for some \(v \in N\). We divide this case into two subcases: (i) v is added to the discord k-independent set; (ii) v is not added to the discord k-independent set. For (i), let \((i + 1, P', Q')\) be the updated state. Then, \(P'\) and \(Q'\) are given as follows:

$$\begin{aligned} P'(S) = {\left\{ \begin{array}{ll} 1 &{} (S=\{v\}),\\ P(S) &{} (S\cap T_v=\emptyset ),\\ P(S \setminus \{v\}) &{} (S \cap T_v \supsetneq \{v\}), \\ 0 &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(7.3)
$$\begin{aligned} Q'(S) = {\left\{ \begin{array}{ll} \sum _{S': S'\cap T_v=\emptyset } P(S') &{} (S=\{v\}),\\ Q(S)+1 &{} (S\cap T_v=\emptyset \text { and } P(S) > 0),\\ 0 &{} (S\cap T_v=\emptyset \text { and } P(S) = 0),\\ Q(S \setminus \{v\}) &{} (S \cap T_v \supsetneq \{v\}), \\ 0 &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(7.4)

for a subset S of \(X_{i+1}\). Details are given in [7]. For (ii), let \((i + 1, P'', Q'')\) be the updated state. Then, \(P''\) and \(Q''\) are given as follows [7]:

$$\begin{aligned} P''(S) = {\left\{ \begin{array}{ll} 0 &{} (S=\{v\}),\\ P(S) &{} (S\cap T_v=\emptyset ),\\ P(S \setminus \{v\}) &{} (S \cap T_v \supsetneq \{v\}), \\ 0 &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(7.5)
$$\begin{aligned} Q''(S) = {\left\{ \begin{array}{ll} 0 &{} (S=\{v\}),\\ Q(S) &{} (S\cap T_v=\emptyset ),\\ Q(S \setminus \{v\}) &{} (S \cap T_v \supsetneq \{v\}), \\ 0 &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(7.6)

for a subset S of \(X_{i+1}\).
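As a small sanity check of Eqs. (7.3) and (7.4), consider a hypothetical toy instance (not the one in Fig. 7.2): \(X_i = \{a\}\) and \(Y = \{a\}\), so that \(P(\emptyset ) = 0\), \(P(\{a\}) = 1\), and \(Q(\emptyset ) = Q(\{a\}) = 0\). Suppose that the introduced node b has no arc to a. We read \(T_b\) in the update rules as also containing b itself, which is what the case \(S \cap T_v \supsetneq \{v\}\) seems to require; hence \(T_b = \{b\}\) here. Adding b to Y (subcase (i)) then gives

$$\begin{aligned} P'(\emptyset ) &= P(\emptyset ) = 0, & P'(\{a\}) &= P(\{a\}) = 1, & P'(\{b\}) &= 1, & P'(\{a,b\}) &= 0,\\ Q'(\emptyset ) &= 0, & Q'(\{a\}) &= Q(\{a\}) + 1 = 1, & Q'(\{b\}) &= P(\emptyset ) + P(\{a\}) = 1, & Q'(\{a,b\}) &= 0. \end{aligned}$$

Under this reading, b can be added exactly when \(k \ge 1\), which agrees with the fact that a and b are mutually unreachable and therefore each counts toward the other's \(\textsf{discord}\).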

Secondly, we consider (iii) \(X_{i+1}\) is a forget bag; that is, \(X_{i+1} = X_{i} \setminus \{v\}\) for some \(v \in N\). In this case, let \((i + 1, P''', Q''')\) be the updated state. Then, we obtain the following [7]:

$$\begin{aligned} P'''(S) = & {} P(S) + P(S \cup \{v\}), \end{aligned}$$
(7.7)
$$\begin{aligned} Q'''(S) = & {} \max \{Q(S), Q(S \cup \{v\})\}, \end{aligned}$$
(7.8)

for a subset S of \(X_{i+1}\).
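To illustrate Eqs. (7.7) and (7.8) on a hypothetical toy state (not taken from Fig. 7.2), let \(X_i = \{a, b\}\) with \(P(\emptyset ) = 0\), \(P(\{a\}) = P(\{b\}) = 1\), \(P(\{a,b\}) = 0\), and \(Q(\emptyset ) = 0\), \(Q(\{a\}) = Q(\{b\}) = 1\), \(Q(\{a,b\}) = 0\). Forgetting a gives \(X_{i+1} = \{b\}\) and

$$\begin{aligned} P'''(\emptyset ) &= P(\emptyset ) + P(\{a\}) = 1, & P'''(\{b\}) &= P(\{b\}) + P(\{a,b\}) = 1,\\ Q'''(\emptyset ) &= \max \{Q(\emptyset ), Q(\{a\})\} = 1, & Q'''(\{b\}) &= \max \{Q(\{b\}), Q(\{a,b\})\} = 1. \end{aligned}$$

The node that was counted in \(R(\{a\})\) is now counted in \(R(\emptyset )\): it is no longer reachable from any node of the current bag, but its \(\textsf{discord}\) is still tracked through \(Q'''(\emptyset )\).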

Our algorithm maintains a table \(\texttt{table}[i][P][Q]\), where \((i, P, Q)\) is a state. The value of \(\texttt{table}[i][P][Q]\) is the maximum size |Y| of a discord k-independent set Y such that P, Q, and Y satisfy Eqs. (7.1) and (7.2). Let \(P_0\) and \(Q_0\) be maps such that \(P_0(\emptyset ) = Q_0(\emptyset ) = 0\) (note that \(X_1 = \emptyset \) in the nice DAG-PD, and thus P and Q for \(X_1\) discussed above are defined only on \(\emptyset \)). The algorithm is given as follows [7]:

  1. \(\texttt{table}[1][P_0][Q_0] \leftarrow 0\) and \(\texttt{table}[i][P][Q] \leftarrow -\infty \) for all \(i = 1,\ldots ,r\) and all possible P and Q except for \(i = 1\), \(P = P_0\) and \(Q = Q_0\).

  2. For \(i = 1, 2, \ldots , r - 1\), if \(X_{i+1}\) is an introduce bag, do the following:

    (i) For all possible P and Q, let \(P'\) and \(Q'\) be given by Eqs. (7.3) and (7.4). If \(Q'(S) \le k\) for all \(S \subseteq X_{i+1}\), then \(\texttt{table}[i+1][P'][Q'] \leftarrow \max \{\texttt{table}[i+1][P'][Q'], \texttt{table}[i][P][Q] + 1\}\). Otherwise, \(\texttt{table}[i+1][P'][Q'] \leftarrow -\infty \).

    (ii) For all possible P and Q, let \(P''\) and \(Q''\) be given by Eqs. (7.5) and (7.6). Then, \(\texttt{table}[i+1][P''][Q''] \leftarrow \max \{\texttt{table}[i+1][P''][Q''], \texttt{table}[i][P][Q]\}\).

    Otherwise (if \(X_{i+1}\) is a forget bag), do the following:

    (iii) For all possible P and Q, let \(P'''\) and \(Q'''\) be given by Eqs. (7.7) and (7.8). Then, \(\texttt{table}[i+1][P'''][Q'''] \leftarrow \max \{\texttt{table}[i+1][P'''][Q'''], \texttt{table}[i][P][Q]\}\).

  3. Output the maximum value of \(\texttt{table}[r][P][Q]\) among all possible P and Q.
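The three steps above can be put together in a compact sketch. It is a minimal illustration under stated assumptions rather than the implementation of [7]: all function names are hypothetical; the input is assumed to be a valid nice DAG-PD of a DAG given as a list of sets with an empty first bag; node labels are assumed to be sortable; \(T_v\) is read as also containing v itself; and states with value \(-\infty \) are simply left out of a dictionary instead of being stored. The code has only been sanity-checked by hand on tiny inputs, and its running time is exponential in the bag size.

```python
from itertools import combinations

def subsets(bag):
    """All subsets of `bag` as frozensets, in a fixed order."""
    items = sorted(bag)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def reachable_from(v, arcs, allowed):
    """Nodes of `allowed` reachable from v (v included) via arcs inside `allowed`."""
    seen, stack = {v}, [v]
    while stack:
        tail = stack.pop()
        for t, h in arcs:
            if t == tail and h in allowed and h not in seen:
                seen.add(h)
                stack.append(h)
    return seen

def introduce(P, Q, new_subsets, v, T_v, take):
    """Eqs. (7.3)-(7.4) if take, else (7.5)-(7.6); T_v contains v (assumption)."""
    disc_v = sum(p for S, p in P.items() if not (S & T_v))
    P2, Q2 = {}, {}
    for S in new_subsets:
        hit = S & T_v
        if S == {v}:
            P2[S], Q2[S] = (1, disc_v) if take else (0, 0)
        elif not hit:
            P2[S] = P[S]
            Q2[S] = Q[S] if not take else (Q[S] + 1 if P[S] > 0 else 0)
        elif v in hit and len(hit) > 1:
            P2[S], Q2[S] = P[S - {v}], Q[S - {v}]
        else:
            P2[S] = Q2[S] = 0
    return P2, Q2

def forget(P, Q, new_subsets, v):
    """Eqs. (7.7)-(7.8)."""
    P3 = {S: P[S] + P[S | {v}] for S in new_subsets}
    Q3 = {S: max(Q[S], Q[S | {v}]) for S in new_subsets}
    return P3, Q3

def max_discord_set_size(bags, arcs, k):
    """Return the maximum |Y| found by the DP over a nice DAG-PD `bags`."""
    table = {((0,), (0,)): 0}          # step 1: state (P_0, Q_0) for X_1, |Y| = 0
    processed = set()                  # Z_i, the union of the bags seen so far
    for prev, cur in zip(bags, bags[1:]):
        prev, cur = frozenset(prev), frozenset(cur)
        old, new = subsets(prev), subsets(cur)
        nxt = {}                       # absent keys play the role of -infinity
        for (Pk, Qk), size in table.items():
            P, Q = dict(zip(old, Pk)), dict(zip(old, Qk))
            if len(cur) > len(prev):   # step 2 (i), (ii): introduce bag
                (v,) = cur - prev
                T_v = reachable_from(v, arcs, processed | {v}) & cur
                options = [introduce(P, Q, new, v, T_v, take) + (size + int(take),)
                           for take in (True, False)]
            else:                      # step 2 (iii): forget bag
                (v,) = prev - cur
                options = [forget(P, Q, new, v) + (size,)]
            for P2, Q2, s in options:
                if all(q <= k for q in Q2.values()):
                    key = (tuple(P2[S] for S in new), tuple(Q2[S] for S in new))
                    nxt[key] = max(nxt.get(key, s), s)
        processed |= cur
        table = nxt
    return max(table.values())         # step 3

# Two isolated nodes: both can be chosen only if k >= 1.
print(max_discord_set_size([set(), {"a"}, {"a", "b"}], [], 0))
print(max_discord_set_size([set(), {"a"}, {"a", "b"}], [], 1))
```

States are keyed by the tuples of P and Q values listed in the fixed subset order returned by `subsets`, which keeps them hashable without having to order frozensets.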

We can prove the following theorem using the above algorithm.

Theorem 7.1

([7]) Given a DAG \(D = (N, A)\) and a nice DAG-PD of D with DAG-pathwidth W, we can solve the maximum discord k-independent set problem in \(O((k+1)^{2^{W+1}}(k+2)^{2^{W+1}}2^WW^2|N|)\) time.

If we can regard k and the DAG-pathwidth W of D as (small) constants, our algorithm solves the maximum discord k-independent set problem in time linear in |N|.

6 Conclusion

In this chapter, we provided an overview of the blockchain and introduced pathwidth and directed pathwidth. We introduced DAG-path-decomposition and DAG-pathwidth and described the characterization of DAGs by DAG-pathwidth. We used DAG-path-decomposition to design an algorithm that runs faster when the DAG-pathwidth of the input DAG is small. The concept of DAG-path-decomposition is applicable not only to blockchain DAGs but also to many DAG structures that appear in the real world; exploring such applications is a subject for future work. Just as we defined DAG-pathwidth as an extension of (undirected) pathwidth, we can probably define DAG-treewidth as an extension of (undirected) treewidth, which is also left for future work.