1 Introduction

Great breakthroughs in science have a way of perforating traditional disciplinary boundaries. Ned Seeman’s development of structural DNA nanotechnology [1, 2] is a case in point. Over the past forty years, researchers from most disciplines of science and engineering have been drawn into the thrilling creativity of this new field. This influx is most obviously motivated by opportunities to apply methods from many disciplines to the development of DNA nanotechnology. Often, however, the benefits of such applications also accrue back to the disciplines from which they come, as the novel aspects of DNA nanotechnology force improvements in these methods and in our understanding of them.

In this note, we discuss an example of this phenomenon, the use of a method from distributed computing theory [3] in the theory of DNA tile self-assembly [4,5,6], perhaps the earliest abstract theory to emerge from DNA nanotechnology. Both of these subjects deal with systems that evolve non-deterministically over time and are so complex that it is not feasible to explore all the individual trajectories along which they might evolve. In the case of distributed computing, the example of interest here consists of a large number of processors that asynchronously and non-deterministically pass messages to one another. In the case of tile self-assembly, the example of interest consists of a large two-dimensional structure that grows via the asynchronous, non-deterministic adsorption of various types of tiles (abstractions of DNA double-crossover molecules [7]) onto various locations along the boundary of the structure. In both examples, the population (number of processors or number of tiles) is assumed to be very large, and the number of trajectories of the system is far larger than its population.

The method that we discuss, which we call reasoning as if, has not been precisely formalized, and we will not succeed in formalizing it here. Our goal is simply to discuss two striking examples of the method, one in distributed computing and one in tile self-assembly, and to suggest directions for future research on the method.

The rest of this note is organized as follows. In Sect. 2 we review the snapshot algorithm of Chandy and Lamport [8], emphasizing the reasoning-as-if nature of this algorithm. In Sect. 3 we review the local determinism method of Soloveichik and Winfree [9], again emphasizing the reasoning-as-if nature of the method. In Sect. 4 we sketch research directions that we hope will lead to the further development of as-if reasoning as a useful method.

2 The Snapshot Algorithm

The snapshot algorithm was designed by Chandy and Lamport [8], and a colorful description of it, which we follow here, was provided by Dijkstra [10]. The algorithm assembles a “snapshot” of a possible but unlikely global system state in order to reason about properties of the system. What makes this algorithm relevant to our focus here is that the system snapshot that is assembled is almost certainly not real. It is an imaginary state that probably never occurred in the system’s execution. Its power lies in its uncanny and provable mirroring of the global properties of the system as if the snapshot were real.

Chandy and Lamport use the analogy of photographers watching a sky filled with migrating birds to explain the algorithm [8]. The scene is vast, and the birds are in constant motion; no single photo suffices. Only a composite photo from snapshots taken by different photographers at different times and then thoughtfully pieced together can capture the whole scene.

Similarly, the snapshot algorithm uses as-if reasoning to enable us to determine properties of the global state of a distributed system. As the algorithm executes, it assembles a description of a snapshot state. When it terminates, we can query the snapshot state: any stable predicate that holds in the assembled snapshot state holds for the system. A stable predicate is one that, once it holds, holds forever. An example of a stable predicate, from [11], is “the number of tokens traveling the network equals 7.”

A network is represented as a directed, finite, strongly connected graph in which each machine is a node and each edge is a first-in first-out buffer. A computation is specified by a sequence of atomic actions by individual machines. Each action updates the machine’s state, accepts at most one message on each of its input buffers, and sends at most one message on each of its output buffers. A global system state is determined by the state of each machine and the messages in each buffer.

The snapshot state is assembled from a snapshot of each machine, taken locally and at different times, and the assembled snapshot state is a fiction. However, any stable predicate that is true for the snapshot state is true for the system’s final state.

Coloring. Initially, all nodes (machines) and edges (messages) are white. At termination, all machines and messages are red. Each machine turns red once, which Dijkstra calls “blushing,” sending white messages before turning red and red messages after turning red [11]. No red message is ever accepted by a white machine.

Assembling the Snapshot State. The snapshot state consists of the state of each machine as it turns red and the white messages accepted by the machine after it turns red. We designate one machine as the seed. It turns itself red and sends a special message, called a Marker, on each of its output buffers. It then records any white messages it receives while red. It knows that its local snapshot is complete when it has received a Marker on each of its input buffers. Each subsequent machine, when it receives a Marker on an input buffer, similarly turns red and sends a Marker on each of its output buffers. Since each machine is reachable, all machines turn red in finite time.

Rewriting History. The algorithm rewrites history—“by magic,” according to Dijkstra—so that the snapshot state is the point at which all machines turn red. To do this, it observes that whenever a red action is immediately followed by a white action, the two actions must belong to different machines (a machine, once red, stays red), and interchanging them does not change either machine’s behavior, since no red message is ever accepted by a white machine. Iterating this interchange until all white actions precede all red actions yields the rewritten history. The snapshot state is the cut in this rewritten history, that is, the point at which all subsequent actions are red and all prior actions are white. The snapshot algorithm thus uses local snapshots taken at each machine to construct an imaginary but possible global state that enables us to determine system properties as if the snapshot state were real.
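The coloring discipline above can be made concrete in a few dozen lines. The following is a minimal sketch, not Chandy and Lamport’s presentation: a hypothetical three-machine ring passing tokens, with Marker messages implementing the blushing rule. All names (`Net`, `Machine`, `blush`, `send_token`) are our own illustrative choices.

```python
from collections import deque

class Machine:
    def __init__(self, tokens):
        self.tokens = tokens      # local state: tokens currently held
        self.red = False
        self.snap = None          # state recorded at the moment of blushing
        self.recorded = {}        # in-channel -> white tokens recorded while red
        self.closed = set()       # in-channels on which a Marker has arrived

class Net:
    def __init__(self, tokens, edges):
        self.m = {name: Machine(t) for name, t in tokens.items()}
        self.chan = {e: deque() for e in edges}   # FIFO message buffers

    def send_token(self, src, dst):
        self.m[src].tokens -= 1
        self.chan[(src, dst)].append("TOKEN")

    def blush(self, name):
        """Turn a machine red: record its state, send a Marker on each out-edge."""
        mach = self.m[name]
        mach.red = True
        mach.snap = mach.tokens
        for (s, d) in self.chan:
            if s == name:
                self.chan[(s, d)].append("MARKER")

    def deliver(self, src, dst):
        """Deliver the oldest message on channel (src, dst)."""
        msg = self.chan[(src, dst)].popleft()
        mach = self.m[dst]
        if msg == "MARKER":
            if not mach.red:
                self.blush(dst)           # first Marker: blush; this channel's snapshot is empty
            mach.closed.add((src, dst))   # stop recording this channel
        else:
            if mach.red and (src, dst) not in mach.closed:
                mach.recorded.setdefault((src, dst), []).append(msg)
            mach.tokens += 1

# A ring A -> B -> C -> A holding 7 tokens in total.
net = Net({"A": 3, "B": 2, "C": 2}, [("A", "B"), ("B", "C"), ("C", "A")])
net.send_token("A", "B")                       # a white token in flight
net.blush("A")                                 # A is the seed
net.send_token("C", "A")                       # C is still white
net.deliver("A", "B"); net.deliver("A", "B")   # white token, then Marker: B blushes
net.send_token("B", "C")                       # a red token
net.deliver("B", "C"); net.deliver("B", "C")   # Marker first: C blushes; red token not recorded
net.deliver("C", "A"); net.deliver("C", "A")   # white token recorded at A, then Marker

snap_total = (sum(mach.snap for mach in net.m.values())
              + sum(len(msgs) for mach in net.m.values()
                    for msgs in mach.recorded.values()))
print(snap_total)   # 7 -- the stable predicate "tokens == 7" holds in the snapshot
```

The local snapshots here are taken at three different moments of the run, yet the assembled state satisfies the stable predicate, exactly as the algorithm guarantees.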

Uses. In the nearly 30 years since Dijkstra saluted the snapshot algorithm as “beautiful,” it has been widely used to check global properties in distributed systems, e.g., whether a computation has terminated, as well as for monitoring and debugging [8, 12]. For example, checkpointing and rollback recovery are key mechanisms of fault-tolerant distributed systems [13]. They enable systems that crash to recover and resume execution from a stored state rather than needing to restart from scratch. In checkpointing, each node in the system stores its local state, so that in rollback recovery after a failure, the node’s state can be restored. Nakamura et al. point out that checkpoint–rollback recovery “inherently contains a snapshot algorithm to record the nodes’ checkpoints in such a way that they compose a consistent global state” [14].

3 Local Determinism

The abstract Tile Assembly Model (aTAM) is an idealized model of molecular self-assembly on a two-dimensional surface. Winfree et al. [15] introduced the first form of the model, which was based on DNA double-crossover molecules [1] and was already Turing universal (i.e., able to simulate arbitrary computations). Winfree [4] developed the model further, and Rothemund and Winfree [5, 6] refined it to its present-day form. The aTAM has been an active area of research in the molecular programming community ever since.

A tile type in the aTAM is a unit square that cannot be rotated, so that it has well-defined “north,” “west,” “south,” and “east” sides. Each of these sides is labeled with a glue type that we may take to be a non-negative integer. Each glue type has a strength that is 0, 1, or 2.

A tile assembly system (TAS) is an ordered pair \(\mathcal {T} = (T, s)\), where T is a finite set of tile types and \(s \in T\) is the seed tile type. An assembly sequence of \(\mathcal {T}\) is a finite or infinite sequence

$$\begin{aligned} \mathbf {\alpha } = (\alpha _0, \alpha _1, \alpha _2, \dots ) \end{aligned}$$

of assemblies \(\alpha _i\) satisfying the following conditions.

  1. \(\alpha _0\) is the assembly consisting of a single tile of type s.

  2. For each i, if the assembly sequence \(\mathbf {\alpha }\) is long enough for \(\alpha _{i+1}\) to exist, then \(\alpha _{i+1}\) is obtained by adding a single tile of some type \(t \in T\) to the assembly \(\alpha _i\). Moreover, this tile is added by abutting it to one or more tiles of \(\alpha _i\) in such a way that the sum of the strengths of its glues that match the abutting glues of \(\alpha _i\) is at least 2.

  3. If \(\mathbf {\alpha } = (\alpha _0,\dots ,\alpha _i)\) is finite, then the assembly \(\alpha _i\) is terminal in the sense that no tile of any type \(t \in T\) can be added to \(\alpha _i\) as in condition 2.

  4. If \(\mathbf {\alpha } = (\alpha _0, \alpha _1, \alpha _2, \dots )\) is infinite, then it is (strongly) fair in the sense of distributed computing [16, 17], which means that, if a tile can be added to an assembly \(\alpha _i\) at some location l, then there is some \(j \ge i\) such that \(\alpha _{j+1}\) is obtained from \(\alpha _j\) by adding a tile at location l.

Several things should be noted about the above definition. First, in any assembly sequence \(\mathbf {\alpha } = (\alpha _0, \alpha _1, \alpha _2, \dots )\), each assembly \(\alpha _i\) consists of exactly \(i+1\) tiles. Second, condition 3 requires an assembly sequence \(\mathbf {\alpha }\) to “go on for as long as it can.” Third, once tiles are added to an assembly, they do not move. This ensures that every assembly sequence \(\mathbf {\alpha }\) has a well-defined result, which is an assembly denoted \(\textrm{res}(\mathbf {\alpha })\). If \(\mathbf {\alpha } = (\alpha _0, \dots , \alpha _i)\) is finite, then \(\textrm{res}(\mathbf {\alpha })\) is the finite assembly \(\alpha _i\). If \(\mathbf {\alpha } = (\alpha _0, \alpha _1, \alpha _2, \dots )\) is infinite, then \(\textrm{res}(\mathbf {\alpha })\) is the minimal infinite assembly that has each \(\alpha _i\) as a subassembly in the obvious sense. Fourth, the fairness condition 4 ensures that the result of an infinite assembly sequence must, like the result of a finite assembly sequence, be terminal in the sense defined in condition 3.
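The attachment rule in condition 2 can be made concrete with a small sketch of temperature-2 growth. The four tile types, glue labels, and function names below are our own illustrative choices, not part of the formal model: two arm tiles attach to the seed via strength-2 glues, and a corner tile attaches only cooperatively, via two strength-1 glues.

```python
# Directions on the unit square and their opposites
OPP = {"N": "S", "S": "N", "E": "W", "W": "E"}
STEP = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def binding_strength(assembly, pos, tile):
    """Total strength with which `tile` would bind at empty position `pos`."""
    total = 0
    for d, (dx, dy) in STEP.items():
        nbr = assembly.get((pos[0] + dx, pos[1] + dy))
        glue = tile.get(d)                    # glue = (label, strength)
        if nbr is not None and glue is not None and nbr.get(OPP[d]) == glue:
            total += glue[1]
    return total

def frontier(assembly, tile_set):
    """All (pos, tile) attachments allowed at temperature 2."""
    empty = {(x + dx, y + dy) for (x, y) in assembly
             for (dx, dy) in STEP.values()} - set(assembly)
    return [(pos, tile) for pos in empty for tile in tile_set
            if binding_strength(assembly, pos, tile) >= 2]

# Hypothetical tile types: seed, two arms, and a cooperatively binding corner
S = {"E": ("se", 2), "N": ("sn", 2)}   # seed
R = {"W": ("se", 2), "N": ("c1", 1)}   # attaches east of the seed
U = {"S": ("sn", 2), "E": ("c2", 1)}   # attaches north of the seed
D = {"S": ("c1", 1), "W": ("c2", 1)}   # needs BOTH strength-1 glues to reach 2

asm = {(0, 0): S}                      # alpha_0: the seed alone
while True:
    moves = frontier(asm, [R, U, D])
    if not moves:                      # terminal: no tile can be added
        break
    pos, tile = moves[0]               # any choice; this system is order-independent
    asm[pos] = tile

print(sorted(asm))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Note that the corner tile D cannot attach at (1, 1) until both R and U are present; this cooperative binding is what makes temperature-2 systems computationally powerful.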

Many aTAM investigations involve design problems. In such a problem, there is a target assembly \(\alpha ^*\), and the task is to design a tile assembly system \(\mathcal {T} = (T, s)\) such that every assembly sequence \(\mathbf {\alpha }\) of \(\mathcal {T}\) has \(\alpha ^*\) as its result. Moreover, \(\alpha ^*\) is often very large, and it is typically desirable to (i) have the number |T| of tile types in \(\mathcal {T}\) be much smaller than the number of tiles in \(\alpha ^*\) and (ii) exploit the inherent parallelism of chemistry by having it be typical for an assembly \(\alpha _{i+1}\) in an assembly sequence \(\mathbf {\alpha }\) for \(\alpha ^*\) to be just one of many assemblies that could have been obtained by adding a tile to \(\alpha _i\). These two things together imply that a solution \(\mathcal {T} = (T, s)\) for the design problem of causing \(\alpha ^*\) to reliably self-assemble typically spawns an exceedingly large number of assembly sequences \(\mathbf {\alpha }\), all with result \(\alpha ^*\).

In fact, anyone who has designed non-trivial tile assembly systems knows all too well how easy it is to design a tile assembly system \(\mathcal {T}\) in which the target assembly \(\alpha ^*\) results from many or most, but not all, assembly sequences \(\mathbf {\alpha }\) of \(\mathcal {T}\). When erroneous designs of this type occur, it is typically because the designer envisions a particular assembly sequence \(\mathbf {\alpha }\) with result \(\alpha ^*\), but some alternative assembly sequence \(\mathbf {\alpha }^\prime \) with a different result can occur (often because of a “race condition” in which growth in \(\mathbf {\alpha }^\prime \) can block envisioned growth in \(\mathbf {\alpha }\)).

The local determinism method of Soloveichik and Winfree [9] is a verification method that is also a beautiful solution to the above design problem. The key to this method is a definition of what it means for an assembly sequence \(\mathbf {\alpha }\) of a tile assembly system \(\mathcal {T}\) to be locally deterministic. The details of this definition need not concern us here except to note its key properties. The definition is subtle, but it is simple enough that it is typically routine to verify that a given assembly sequence \(\mathbf {\alpha }\) is locally deterministic (if it is). The definition does not require the underlying tile assembly system \(\mathcal {T}\) to be deterministic: A locally deterministic assembly sequence \(\mathbf {\alpha }\) is typically just one of a huge number of assembly sequences that can occur in \(\mathcal {T}\).

The remarkable main theorem about local determinism is that, if \(\mathbf {\alpha }\) is a locally deterministic assembly sequence of a tile assembly system \(\mathcal {T}\), then every assembly sequence \(\mathbf {\alpha }^\prime \) of \(\mathcal {T}\) has the same result, i.e., satisfies \(\textrm{res}(\mathbf {\alpha }^\prime ) = \textrm{res}(\mathbf {\alpha })\) [9].

This local determinism theorem says that a designer of a tile assembly system \(\mathcal {T}\) for a target assembly \(\alpha ^*\) who envisions a particular tile assembly sequence \(\mathbf {\alpha }\) of \(\mathcal {T}\) with result \(\alpha ^*\) may safely reason as if \(\mathbf {\alpha }\) is the assembly sequence that occurs, even if the likelihood of \(\mathbf {\alpha }\) occurring is vanishingly small, provided only that the designer also verifies that \(\mathbf {\alpha }\) is locally deterministic. Since it is natural for designers to think in terms of an envisioned assembly sequence (“First this substructure will assemble, then this one ....”), local determinism is a very designer-friendly method. Moreover, this friendliness arises directly from the fact that the method entitles the designer to reason as if a convenient, envisioned, and unlikely assembly sequence occurs.

To put the associated verification problem fancifully, if one wants to prove that all roads (assembly sequences) lead to Rome (the target assembly), it suffices to exhibit a fancy (locally deterministic) road and show that this leads to Rome.

But even more is true. Soloveichik and Winfree’s proof also shows that, if there is a fancy road, then all roads are fancy. That is, if a tile assembly system \(\mathcal {T}\) has a locally deterministic sequence \(\mathbf {\alpha }\), then all assembly sequences of \(\mathcal {T}\) are locally deterministic. The method is thus even more designer friendly than indicated above, since any assembly sequence that the designer chooses will be locally deterministic (if some assembly sequence is).
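For a tiny system, this conclusion can be sanity-checked by brute force: enumerate every assembly sequence and confirm that all of them have the same result. The sketch below does exactly that for a hypothetical four-tile, temperature-2 system (the tile names and glue labels are our own); of course, the point of local determinism is to make such exhaustive enumeration, which is infeasible for realistic systems, unnecessary.

```python
OPP = {"N": "S", "S": "N", "E": "W", "W": "E"}
STEP = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

# Hypothetical system: seed, two arms, and a cooperatively binding corner tile
TILES = {
    "s": {"E": ("a", 2), "N": ("b", 2)},
    "r": {"W": ("a", 2), "N": ("c", 1)},
    "u": {"S": ("b", 2), "E": ("d", 1)},
    "k": {"S": ("c", 1), "W": ("d", 1)},   # needs both strength-1 glues
}

def attachable(asm):
    """All legal single-tile additions (temperature 2) to assembly `asm`,
    where `asm` is a frozenset of (position, tile-name) pairs."""
    placed = dict(asm)
    empties = {(x + dx, y + dy) for (x, y) in placed
               for dx, dy in STEP.values()} - set(placed)
    out = set()
    for pos in empties:
        for name, tile in TILES.items():
            strength = 0
            for d, (dx, dy) in STEP.items():
                nbr = placed.get((pos[0] + dx, pos[1] + dy))
                g = tile.get(d)
                if nbr is not None and g is not None and TILES[nbr].get(OPP[d]) == g:
                    strength += g[1]
            if strength >= 2:
                out.add((pos, name))
    return out

def results(asm):
    """Recursively collect the result of every assembly sequence from `asm`."""
    moves = attachable(asm)
    if not moves:
        return {asm}          # terminal: this is res(alpha)
    found = set()
    for pos, name in moves:
        found |= results(asm | {(pos, name)})
    return found

seed = frozenset({((0, 0), "s")})
print(len(results(seed)))     # 1: every assembly sequence has the same result
```

Here the enumeration visits every interleaving of the two arms and the corner tile and finds a single terminal assembly, which is what the local determinism theorem predicts from the local determinism of any one sequence.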

4 The Future of As If

Structural DNA nanotechnology opened up the fascinating field of molecular programming. The challenges of this new world have pushed participating computer scientists to develop new techniques for dealing with complex interactions of geometry and non-determinism at scales surpassing prior computing experience and rivaling those of the coming Internet of Things. One of these techniques, local determinism, is an instance of reasoning as if, a method rarely used in science and still not well understood. Here, we have reviewed local determinism together with the snapshot algorithm, an earlier instance of as-if reasoning, in the hope of promoting investigation of this method.

We briefly suggest future directions for research on as-if reasoning in molecular programming. First, can local determinism be usefully adapted from the abstract Tile Assembly Model to the other models of molecular programming? Obvious candidates here include other models of tile self-assembly such as the kinetic Tile Assembly Model (kTAM) [4] and the Two-Handed Assembly Model (2HAM) [18]. More ambitious possibilities include chemical reaction networks (CRNs) [19], the CRN-directed Tile Assembly Model (CRN-TAM) [20, 21], and thermodynamic binding networks [22].

More general questions also present themselves. Without delving too deeply into philosophy [23], can we more clearly formalize as-if reasoning? A concrete start would be to formalize what it is that the snapshot algorithm and local determinism have in common.

Finally, local determinism is, roughly speaking, a condition that allows us to reason from the possibility of a target structure self-assembling to the necessity of that target structure self-assembling. Modal logic, originally developed to reason about possibility and necessity [24], has become a powerful tool in distributed computing [25, 26]. Could it provide a means of formalizing—and strengthening—as-if reasoning?

In his book, Ned graciously wrote of molecular programming researchers that “more than any other, this community has contributed valuable ideas and workers to the field of structural DNA nanotechnology,” and that “this parallel field is one of the key drivers of structural DNA nanotechnology” [2]. We are delighted to work in this parallel field, which would not exist without Ned, but we know that he is in the driver’s seat, and that is the way we want it.