1 Past

1.1 Visual DSD Origins

The first paper on what would subsequently become the Visual DSD system was published in 2009 and presented a simple programming language for nucleic acid circuit design [1]. The language was developed by observing the state of the art in nucleic acid circuits, which at the time relied primarily on toehold-mediated DNA strand displacement [2, 3], a powerful technique for implementing enzyme-free molecular computation programmed by sequence-specific DNA hybridization. In this technique, an invading single strand of DNA displaces an incumbent strand hybridized to a template strand, a process mediated by a short, single-stranded region of DNA referred to as a toehold. The Visual DSD system was implemented using the functional programming language F# [4, 5], which facilitated the translation of its theoretical underpinnings to program code, and was released soon after as a web application [6], which simplified user adoption. DNA strand displacement has since been used to implement a broad range of computational circuits in DNA, including digital logic [7], artificial neural networks [8, 9], and distributed algorithms [10], among many others [2, 3].

A key aspect of the origins and subsequent development of the Visual DSD system was the application of fundamental concepts from computer science in general, and programming language theory in particular, to formally represent corresponding concepts from dynamic DNA nanotechnology. We contend that dynamic DNA nanotechnology is therefore an embodiment of computer science in the truest sense, and illustrate this via examples of computer science methods that underpin the design and analysis of dynamic DNA nanotechnology systems.

1.1.1 Formal Syntax and Operational Semantics

Our work on the Visual DSD language was inspired by previous work on the use of process calculi to model biological systems. At the time it was already recognized that theoretical approaches originally developed to model concurrent computer systems could also be applied to model biological systems, given their inherent parallelism. A promising approach was the use of process calculi such as the pi-calculus [11], originally developed to model mobile computing systems such as telecommunications networks. Pi-calculus processes can create fresh names, send and receive them over channels, and spawn new processes. The first work on biological modeling using the stochastic pi-calculus [12] was carried out by Regev et al. [13] and subsequent work developed an operational semantics for a stochastic pi-calculus programming language and its corresponding implementation [14, 15]. Operational semantics formally defines the meaning of programs by specifying what happens when programs are executed, typically using reduction rules that determine a set of transitions from one program to another. The set of valid programs is defined by a formal syntax.

In the case of the pi-calculus, a simplified syntax can be defined as follows, where the options on the right, separated by short vertical bars, represent valid instances of the syntactic category on the left:

$$\begin{aligned} P&{:}{:}= \pi .P \shortmid (P_1 \mid P_2) \shortmid \nu x P \\ \pi&{:}{:}= \overline{x}\langle y \rangle \shortmid x(z) \end{aligned}$$

The definition states that a process P can be an action \(\pi .P\), which runs the prefix \(\pi \) followed by P; a parallel composition \((P_1 \mid P_2)\), which runs \(P_1\) in parallel with \(P_2\); or the creation of a fresh channel \(\nu x P \), which creates a fresh channel x visible only to P. An action \(\pi \) can be a sender \(\overline{x}\langle y \rangle \), which sends the message y over channel x, or a receiver x(z), which receives a message on channel x and binds it to the variable z. As is standard for syntax definitions, the syntax of a process is recursive, meaning that it can be unfolded to represent processes of arbitrary size.

A simplified operational semantics can be defined using a reduction rule (1) to specify the conditions under which two parallel processes can communicate; the term can then be reduced to the result of the computation step. In this rule, if a sender process \(\overline{x}\langle y \rangle .P\) runs in parallel with a receiver process x(z).Q, then the sender and receiver can communicate on channel x, after which P and Q continue running in parallel, with the message y assigned to variable z in Q, written \(Q_{\{y/z\}}\). To allow multiple communicating processes to run concurrently, a rule (2) states that if P can reduce to \(P'\) then the same reduction can still be applied when P runs in parallel with another process Q:

$$\begin{aligned} \overline{x}\langle y \rangle .P \mid x(z).Q&\longrightarrow P \mid Q_{\{y/z\}}\end{aligned} \tag{1}$$

$$\begin{aligned} \textit{ if } P\longrightarrow P' \textit{ then }P\mid Q&\longrightarrow P' \mid Q \end{aligned} \tag{2}$$

Full definitions of pi-calculus syntax and semantics are provided in [11, 15, 16].
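To make rules (1) and (2) concrete, the following Python sketch implements communication for a small fragment of the pi-calculus without channel creation (\(\nu x P\)). It is a minimal illustration under stated assumptions, not the implementation of [14, 15]: the names Nil, Send, Recv, subst and step are hypothetical; Nil is the inactive process (written 0 in full presentations of the calculus, and omitted from the simplified syntax above); a parallel composition is flattened into a list, so that rule (2) amounts to leaving all other list elements unchanged; and substitution assumes that bound names never clash with free names, so no capture-avoiding renaming is performed.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Nil:
    """The inactive process (written 0 in full presentations of the pi-calculus)."""


@dataclass(frozen=True)
class Send:
    """Sender x<y>.P: send name msg on channel chan, then continue as cont."""
    chan: str
    msg: str
    cont: "Proc"


@dataclass(frozen=True)
class Recv:
    """Receiver x(z).Q: receive a name on channel chan and bind it to var in cont."""
    chan: str
    var: str
    cont: "Proc"


Proc = Union[Nil, Send, Recv]


def subst(p: Proc, y: str, z: str) -> Proc:
    """Compute p{y/z}: replace free occurrences of name z by y (no renaming)."""
    def rename(n: str) -> str:
        return y if n == z else n

    if isinstance(p, Nil):
        return p
    if isinstance(p, Send):
        return Send(rename(p.chan), rename(p.msg), subst(p.cont, y, z))
    # A receiver binds p.var, so z is shadowed in the continuation if p.var == z.
    cont = p.cont if p.var == z else subst(p.cont, y, z)
    return Recv(rename(p.chan), p.var, cont)


def step(soup: List[Proc]) -> Optional[List[Proc]]:
    """Apply rule (1) to the first matching sender/receiver pair in a parallel
    composition; all other processes are left unchanged, as in rule (2)."""
    for i, s in enumerate(soup):
        if not isinstance(s, Send):
            continue
        for j, r in enumerate(soup):
            if isinstance(r, Recv) and r.chan == s.chan:
                rest = [p for k, p in enumerate(soup) if k not in (i, j)]
                result = [s.cont, subst(r.cont, s.msg, r.var)]
                return rest + [p for p in result if not isinstance(p, Nil)]
    return None  # no communication is possible


# x<y>.0 | x(z).z<w>.0 reduces to y<w>.0 by rule (1)
soup = [Send("x", "y", Nil()), Recv("x", "z", Send("z", "w", Nil()))]
print(step(soup))  # [Send(chan='y', msg='w', cont=Nil())]
```

After the reduction, the received name y is used as a channel in the continuation, which is the mobility that distinguishes the pi-calculus from simpler process calculi.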

The Visual DSD language was inspired by the pi-calculus syntax and operational semantics. This includes formally defining dynamic DNA nanotechnology species as processes that can be composed in parallel and can interact via shared complementary domains, similar to how processes in the pi-calculus can send and receive messages on channels. To this end, the Visual DSD language used a simple syntax to represent a class of linear heteropolymer structures that mapped to a large fraction of the nucleic acid circuits being designed at the time, both theoretically and experimentally. This included seesaw gates, proposed by Qian and Winfree [17], which fit this paradigm and were used in two seminal papers in 2011 to implement large-scale digital logic circuits [7] and artificial neural networks [8]. In addition, Soloveichik et al. showed that this class of structures could be used to encode arbitrary chemical reaction networks as DNA circuits [18]. The syntax for this class of structures [1, 15, 19] can be summarized as follows:

$$\begin{aligned} \begin{array}{rcll} P &{} {:}{:}= &{} A \shortmid C \shortmid (P_1 \mid P_2) &{} \text {Strand, Complex or Parallel composition}\\ A &{} {:}{:}= &{} \langle S\rangle \shortmid \{S\} &{} \text {Upper or Lower strand} \\ C &{} {:}{:}= &{} \{S_L\}\langle S'_L\rangle [S]\langle S'_R\rangle \{S_R\} &{} \text {Segment with left and right overhangs} \\ &{} \shortmid &{} C_1:C_2 &{} \text {Complexes joined on Lower strand} \\ &{} \shortmid &{} C_1{:}{:}C_2 &{} \text {Complexes joined on Upper strand} \\ S &{} {:}{:}= &{} D_1 \ldots D_N &{} \text {Sequence of Domains} \\ D &{} {:}{:}= &{} X \shortmid {X}{}^\wedge \shortmid {X}^{*} \shortmid {{X}{}^\wedge }^{*} &{} \text {Long or short Domain, or its complement} \\ \end{array} \end{aligned}$$

The definition states that a Process P can either be a Strand A, a Complex C, or a parallel composition of Processes \((P_1\mid P_2)\). Strands are either Upper \(\langle S \rangle \) or Lower \(\{S\}\) and contain a Sequence \(D_1 \ldots D_N\) of Domains, where a Domain D may be a long domain (\( X \)), a short toehold domain (\({X}{}^\wedge \,\)), or a complement of one of these, written \({ X }^{*}\) and \({{X}{}^\wedge }^{*}\), respectively. Complexes consist of double-stranded segments [S], which may have Upper and Lower strand overhangs on either side and can be concatenated along their Upper or Lower strands. This syntax allows complexes to be written compositionally, resulting in a close correspondence between syntax and structure that facilitates reading and writing of programs by the user. Note that here we use Upper and Lower to refer to the position of strands in a 2D graphical representation, which is common practice and aids in the visualization of complexes.
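The domain and strand categories of this syntax map naturally onto a small data structure. The Python sketch below is illustrative only; the names complement, is_toehold, Strand, rotate and same_strand are hypothetical and not part of Visual DSD. It encodes the complement of a domain by appending or removing *, so that the complement of tb^ is tb^* and vice versa, and it checks that an Upper strand and a Lower strand denote the same strand when one domain sequence is the reverse of the other, for example <tb^ b> and {b tb^}.

```python
from dataclasses import dataclass
from typing import Tuple


def complement(d: str) -> str:
    """Complement of a domain: X <-> X* and X^ <-> X^*."""
    return d[:-1] if d.endswith("*") else d + "*"


def is_toehold(d: str) -> bool:
    """Short (toehold) domains carry a ^ marker, possibly followed by *."""
    return d.rstrip("*").endswith("^")


@dataclass(frozen=True)
class Strand:
    upper: bool                # True for <S> (3' end on the right), False for {S} (3' end on the left)
    domains: Tuple[str, ...]

    def rotate(self) -> "Strand":
        """Rotate by 180 degrees: <S> becomes {reverse S} and vice versa."""
        return Strand(not self.upper, tuple(reversed(self.domains)))

    def __str__(self) -> str:
        body = " ".join(self.domains)
        return f"<{body}>" if self.upper else "{" + body + "}"


def same_strand(a: Strand, b: Strand) -> bool:
    """Strands are identical up to rotation."""
    return a == b or a.rotate() == b


input1 = Strand(True, ("tb^", "b"))
print(input1, input1.rotate())                             # <tb^ b> {b tb^}
print(same_strand(input1, Strand(False, ("b", "tb^"))))   # True
print(complement("tb^"), complement("x*"), is_toehold("tb^*"))  # tb^* x True
```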

Using this syntax, we can program a simple Join circuit, which takes two inputs and produces one output as shown below, where the graphical representation is equivalent to the program code:

Figure: graphical representation of the Join circuit, showing two input strands, the Join complex and the Reporter complex. Program code: <tb^ b> | <tx^ x> | {tb^*}[b tx^]:[x to^] | <fl^>[x]{to^*}

This circuit will be used as a running example throughout the remainder of the paper. It illustrates the syntax of strands and complexes, which are the two main types of DNA species, and their parallel composition. As stated in the above formal definition, a strand is represented as a sequence of domains enclosed in angle brackets, where the 3’ end of the strand is assumed to be on the right, represented graphically by an arrowhead, and a toehold domain is represented by appending the (^) character to the domain name. For example, \(\mathtt{<}\!\) tb^ b \(\mathtt{>}\) represents a strand consisting of the toehold domain tb^ followed by the domain b. Note that a DNA strand can also be represented as a sequence of domains enclosed in curly brackets, where the 3’ end of the strand is instead assumed to be on the left. For example, the strand \(\mathtt{<}\!\) tb^ b \(\mathtt{>}\) can also be written as {b tb^}. This is because strands are identical up to rotation symmetry, such that we can write the same strand either from left to right or from right to left. A complex is represented as a sequence of segments, where each segment is a double-stranded duplex with overhanging upper or lower strands to the left or right. Complementary domains are represented by appending the (\(\mathtt{*}\)) character to the domain name. For example, the code {tb^*}[b tx^]:[x to^] represents a complex consisting of two segments. The first segment {tb^*}[b tx^] represents a double-stranded duplex [b tx^] consisting of the strand \(\mathtt{<}\!\) b tx^ \({} \mathtt{>}\) bound to its complementary strand {b* tx^*}, with an additional single-stranded lower overhang {tb^*} to the left of the duplex. The second segment [x to^] represents a double-stranded duplex consisting of the strand \(\mathtt{<}\) x to^ \(\mathtt{>}\) bound to its complementary strand {x* to^*}. These two segments are joined together along the lower strand by the operator (\(\mathtt{:}\)).
Although the textual representation of a complex is defined as a concatenation of segments, the joined segments in fact share a single continuous lower strand, as can be seen in the graphical representation. Similarly, complexes joined along the upper strand with the operator (\(\mathtt{::}\)) share a single continuous upper strand, with the potential for multiple disconnected lower strands.
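The observation that segments joined by : share one continuous lower strand can be checked mechanically. The Python sketch below reads off the individual strands of a complex written as a sequence of segments; the names Segment and strands_of are hypothetical, only joins along the lower strand are handled, and lower overhangs are assumed not to occur at an internal join, as in {tb^*}[b tx^]:[x to^]. Each segment {L'}<L>[S]<R>{R'} contributes its own upper strand <L S R>, while the lower-strand pieces L', S* and R' of all segments are concatenated into a single lower strand.

```python
from dataclasses import dataclass
from typing import List, Tuple

Domains = Tuple[str, ...]


def complement(d: str) -> str:
    return d[:-1] if d.endswith("*") else d + "*"


@dataclass(frozen=True)
class Segment:
    """{lo_l}<up_l>[ds]<up_r>{lo_r}; empty tuples stand for absent overhangs."""
    ds: Domains
    lo_l: Domains = ()
    up_l: Domains = ()
    up_r: Domains = ()
    lo_r: Domains = ()


def strands_of(segments: List[Segment]) -> Tuple[List[Domains], Domains]:
    """Strands of segments joined with ':' (shared lower strand).
    Returns the upper strands (written <...>) and the single lower strand ({...})."""
    uppers, lower = [], []
    for i, seg in enumerate(segments):
        if (i > 0 and seg.lo_l) or (i < len(segments) - 1 and seg.lo_r):
            raise ValueError("lower overhang at an internal ':' join")
        uppers.append(seg.up_l + seg.ds + seg.up_r)
        # Lower strands are written with the 3' end on the left, so the
        # complement of each duplex domain appears in the same left-to-right order.
        lower.extend(seg.lo_l + tuple(complement(d) for d in seg.ds) + seg.lo_r)
    return uppers, tuple(lower)


# The Join complex {tb^*}[b tx^]:[x to^] from the running example.
join = [Segment(ds=("b", "tx^"), lo_l=("tb^*",)), Segment(ds=("x", "to^"))]
uppers, lower = strands_of(join)
print(uppers)  # [('b', 'tx^'), ('x', 'to^')]
print(lower)   # ('tb^*', 'b*', 'tx^*', 'x*', 'to^*')
```

The result lists two upper strands, <b tx^> and <x to^>, and one lower strand {tb^* b* tx^* x* to^*}, matching the graphical representation of the Join complex.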

We also defined an operational semantics for the language, which formalizes how DNA species can interact. Using the syntax introduced above, we present two simplified rules: a Bind rule for toehold-mediated binding of strands, and a Migrate rule for rightward branch migration of an invading strand displacing an incumbent:

Figure: graphical representation of the Bind rule.
$$\begin{aligned} \langle L~{N}{}^\wedge ~R\rangle \mid \{L'~{{N}{}^\wedge }^*~R'\}&\xrightarrow {\text {Bind,N}} \{L'\} \langle L \rangle [{N}{}^\wedge ]\langle R \rangle \{R'\} \end{aligned}$$
Figure: graphical representation of the Migrate rule.
$$\begin{aligned} \{L'\}\langle L \rangle [S_1] \langle S~R_2 \rangle \text {:} \langle L_1 \rangle [S~S_2] \langle R \rangle \{R'\}&\xrightarrow {\text {Migrate,S}} \{L'\}\langle L \rangle [S_1~S] \langle R_2 \rangle \text {:} \langle L_1~S\rangle [S_2] \langle R \rangle \{R'\} \end{aligned}$$

In the rules above, the symbols \(L,L',L_1\) and \(R,R',R_2\) match a (potentially empty) sequence of domains. In the Bind rule, the upper strand \(\langle L~{N}{}^\wedge ~R\rangle \) binds to the lower strand \(\{L'~{{N}{}^\wedge }^{*}~R'\}\) via the complementary toehold domains \({N}{}^\wedge \) and \({{N}{}^\wedge }^{*}\), producing a single complex \(\{L'\} \langle L \rangle [{N}{}^\wedge ]\langle R \rangle \{R'\}\). A corresponding Unbind rule (not shown) is similar, except that the direction of the reaction is reversed. In the Migrate rule, the invading domains S that are unbound on the left-hand side of the rule become bound on the right-hand side of the rule, displacing the incumbent domains S that were previously bound. Because the incumbent strand is still bound through the \(S_2\) domains, it remains attached to the complex. A corresponding Migrate Left rule is also needed for migration in the other direction (not shown). Full definitions of the Visual DSD syntax and semantics are provided in [1, 15, 19], including additional rules that allow reductions to take place inside joined complexes and parallel compositions.
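The two rules can be prototyped as functions on segments. The following Python sketch is a simplification, not the Visual DSD implementation: the names Segment, bind and migrate_right are hypothetical; bind only handles two free strands; and migrate_right moves one domain per step, so that repeated application migrates a longer sequence S, and it requires the incumbent to stay attached through at least one remaining domain of \(S_2\). The example reproduces the Join circuit instantiation with \(S_1={tb}{}^\wedge \), \(S= b \) and \(S_2={tx}{}^\wedge \).

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

Domains = Tuple[str, ...]


@dataclass(frozen=True)
class Segment:
    """{lo_l}<up_l>[ds]<up_r>{lo_r}; empty tuples stand for absent overhangs."""
    ds: Domains
    lo_l: Domains = ()
    up_l: Domains = ()
    up_r: Domains = ()
    lo_r: Domains = ()

    def __str__(self) -> str:
        def wrap(open_: str, doms: Domains, close: str) -> str:
            return open_ + " ".join(doms) + close if doms else ""
        return (wrap("{", self.lo_l, "}") + wrap("<", self.up_l, ">")
                + "[" + " ".join(self.ds) + "]"
                + wrap("<", self.up_r, ">") + wrap("{", self.lo_r, "}"))


def bind(upper: Domains, lower: Domains) -> List[Segment]:
    """Bind rule: <L N^ R> | {L' N^* R'} -> {L'}<L>[N^]<R>{R'}, for every toehold
    N^ in the upper strand whose complement N^* occurs in the lower strand."""
    results = []
    for i, n in enumerate(upper):
        if not n.endswith("^"):
            continue
        for j, m in enumerate(lower):
            if m == n + "*":
                results.append(Segment(ds=(n,), lo_l=lower[:j], up_l=upper[:i],
                                       up_r=upper[i + 1:], lo_r=lower[j + 1:]))
    return results


def migrate_right(a: Segment, b: Segment) -> Optional[Tuple[Segment, Segment]]:
    """One step of the Migrate rule on a:b (joined on the lower strand):
    {L'}<L>[S1]<S R2>:<L1>[S S2]<R>{R'}, with S a single domain and S2 non-empty."""
    if a.lo_r or b.lo_l or not a.up_r or len(b.ds) < 2 or a.up_r[0] != b.ds[0]:
        return None
    s = a.up_r[0]
    return (replace(a, ds=a.ds + (s,), up_r=a.up_r[1:]),
            replace(b, up_l=b.up_l + (s,), ds=b.ds[1:]))


# Input1 <tb^ b> binds the toehold {tb^*} of the Join complex ...
(bound,) = bind(("tb^", "b"), ("tb^*",))
print(bound)  # [tb^]<b>
# ... and, since that toehold segment is joined to [b tx^] on the lower
# strand, its b domain can then migrate into the neighbouring segment.
a, b = migrate_right(bound, Segment(ds=("b", "tx^")))
print(f"{a}:{b}")  # [tb^ b]:<b>[tx^]
```

After the migration step, the incumbent strand <b tx^> is held only by its tx^ toehold, which is the configuration in which the Unbind rule can release it.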

1.1.2 Chemical Reaction Networks

An additional contribution of the Visual DSD system was to formalize the compilation of a program, representing an initial set of DNA species, into a computational model describing how these species can interact with each other. We use the term compilation from computer science, where it denotes the translation of a higher-level language into a lower-level one, typically executable machine code. In the case of Visual DSD, the output of compilation is an executable kinetic model, formalized as a chemical reaction network (CRN) [20], which is defined as a set of reactions. Each reaction consists of a multiset of reactant species, a reaction rate and a multiset of product species. CRNs bear many similarities to Petri nets, a graph-based formalism for distributed systems that takes the form of a bipartite graph containing places (analogous to chemical species) and transitions (analogous to chemical reactions), and which has also been used to model biological systems [21, 22]. Here, the species of the CRN are the DNA species that can arise from the Visual DSD program, and the compilation rules are defined by the operational semantics of the Visual DSD language. Because these rules are general, they can compile any of the infinitely many programs that can be written in the language. In this way, Visual DSD mirrors the edit-compile-run cycle of traditional computer programming. Essentially, the programmer uses their mental model of how DNA species interact to carefully design a program consisting of an initial set of species with an intended behavior. Compilation applies the rules of the Visual DSD language to the program to generate a CRN representing all possible interactions between species. The CRN is then executed using a chosen simulation algorithm for a particular set of initial conditions, resulting in an execution trace of the behavior of the species over time. The programmer can then revise the Visual DSD program if the observed behavior differs from the intended behavior.

Compilation of DNA species to CRNs is achieved by applying the reduction rules repeatedly in a loop to enumerate all possible reactions. Briefly, the compilation algorithm [1, 19] works by starting with an empty set of processed species and a set of unprocessed species corresponding to the species initially present in the Visual DSD program. At each step of the loop, the algorithm removes a species from the unprocessed set and computes all possible unimolecular reactions together with all possible bimolecular reactions involving the existing processed species. The resulting reactions can in turn generate new species, which are added to the unprocessed set. This loop is repeated until the set of unprocessed species is empty. In this way, the algorithm enumerates all species and reactions that can be generated from the initial species. We also generalized this approach to compile and simulate programs expressed in different languages [15], including the stochastic pi-calculus. In general, this approach allows for potentially unbounded numbers of species, meaning that compilation of Visual DSD programs may not terminate. As with most programming languages, checking for termination is not possible in general, so it is up to the programmer to develop programs that terminate. To help mitigate this, Visual DSD also provides a just-in-time mode that interleaves compilation with simulation (see Sect. 1.2.1).
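The worklist loop described above can be sketched in Python as follows. This is a minimal sketch rather than the algorithm of [1, 19] verbatim: species are plain strings, the functions uni and bi stand in for the reduction rules and return reactions as (reactants, products, rate) triples, each newly processed species is also paired with itself to allow homodimer reactions, and the limit parameter is a hypothetical safeguard against non-termination. The toy rule table uses the species names of the running Join circuit, treats all reactions as irreversible, and names the unnamed by-products Waste1 and Waste2 purely for illustration.

```python
from typing import Callable, Iterable, List, Set, Tuple

Reaction = Tuple[Tuple[str, ...], Tuple[str, ...], str]  # (reactants, products, rate)


def enumerate_crn(initial: Iterable[str],
                  uni: Callable[[str], Iterable[Reaction]],
                  bi: Callable[[str, str], Iterable[Reaction]],
                  limit: int = 10_000) -> Tuple[List[str], Set[Reaction]]:
    """Worklist enumeration of all species and reactions reachable from `initial`."""
    processed: List[str] = []
    unprocessed: List[str] = list(initial)
    reactions: Set[Reaction] = set()
    while unprocessed:
        s = unprocessed.pop(0)
        if s in processed:
            continue
        if len(processed) >= limit:
            raise RuntimeError("species limit reached: the CRN may be infinite")
        new = list(uni(s))
        # Bimolecular reactions with every processed species, and with s itself.
        for other in processed + [s]:
            new.extend(bi(s, other))
        processed.append(s)
        for reaction in new:
            reactions.add(reaction)
            for product in reaction[1]:
                if product not in processed and product not in unprocessed:
                    unprocessed.append(product)
    return processed, reactions


# Toy rules for the Join circuit (forward reactions only; by-product names are hypothetical).
RULES = {
    frozenset({"Input1", "Join"}): ("Join2", "Reverse"),
    frozenset({"Input2", "Join2"}): ("Output", "Waste1"),
    frozenset({"Output", "Reporter"}): ("Signal", "Waste2"),
}


def bi(x: str, y: str):
    key = frozenset({x, y})
    if len(key) == 2 and key in RULES:
        yield (tuple(sorted(key)), RULES[key], "k")


species, reactions = enumerate_crn(["Input1", "Input2", "Join", "Reporter"],
                                   uni=lambda s: [], bi=bi)
print(len(species), len(reactions))  # 10 3
```

Starting from four initial species, the loop discovers six more (Join2, Reverse, Output, Waste1, Signal, Waste2) and three reactions, after which the unprocessed set is empty and enumeration terminates.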

The compiled CRN for our running Join circuit example is as follows:

Figure: graphical representation of the compiled CRN for the Join circuit.

The first reaction in the forward direction is derived from the Bind rule, with \(N=tb\), \(R=b\), and \(L,L',R'=\emptyset \). This is followed by an application of the Migrate rule, which displaces the \( b \) domain, with \(S_1={tb}{}^\wedge \), \(S= b \), \(S_2={tx}{}^\wedge \), and \(L,L',L_1,R_2,R,R'=\emptyset \). This is followed by a reaction to unbind the distal \(\,{tx}{}^\wedge \) toehold, which is an application of the Unbind rule with \(N=tx\), \(L=b\) and \(L',R,R'=\emptyset \). In the figure above, these three reactions are merged into a single forward reaction, and similarly for the corresponding reverse reaction, resulting in the first reversible reaction shown above. The merge assumes that displacement and toehold unbinding reactions are much faster than toehold binding reactions, as defined by an Infinite DSD semantics [19], which we discuss further below. The other two reactions are derived similarly, where the final reaction is irreversible.

The behavior of the CRN can be summarized as follows, where strands and complexes are named for convenience. In the first reversible reaction, the \( \texttt {Input1}\) strand \(\mathtt{<}\!\) tb^ b \(\mathtt{>}\) binds to the \(\texttt{Join}\) complex {tb^*}[b tx^]:[x to^] to produce a \(\texttt{Reverse}\) strand \(\mathtt{<}\) b tx^ \(\mathtt{>}\) and an intermediate \(\texttt{Join2}\) complex. In the second reversible reaction, the \(\texttt{Input2}\) strand \(\mathtt{<}\) tx^ x \(\mathtt{>}\) binds to the \(\texttt{Join2}\) complex to produce an \(\texttt{Output}\) strand \(\mathtt{<}\) x to^ \(\mathtt{>}\). In the third reaction, the \(\texttt{Output}\) strand binds to the \(\texttt{Reporter}\) complex \(\mathtt{<}\) fl^ \(\mathtt{>}\) [x]{to^*} to produce a \(\texttt{Signal}\) strand \(\mathtt{<}\) fl^ x \(\mathtt{>}\), whose fluorophore emits light that can be measured. Note that \(\texttt{Output}\) is produced only if both inputs are present, which corresponds to the desired Join circuit behavior. This CRN can then be simulated and analyzed using a range of computational methods that facilitate nucleic acid circuit design, as outlined below.
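One standard way to execute such a CRN stochastically is Gillespie's direct method. The Python sketch below is an illustration rather than Visual DSD's own simulator: it treats all three Join circuit reactions as irreversible, uses a single hypothetical stochastic rate constant of 0.001 per molecule pair per second, and names the unnamed by-products Waste1 and Waste2 for illustration only. The simulation runs until no reaction can fire, so with ten molecules of each input and gate every Reporter is eventually triggered, whereas omitting Input2 yields no Signal, as expected for a Join.

```python
import math
import random

# Join circuit CRN (irreversible simplification; hypothetical rate constants).
REACTIONS = [
    ({"Input1": 1, "Join": 1}, {"Reverse": 1, "Join2": 1}, 1e-3),
    ({"Input2": 1, "Join2": 1}, {"Output": 1, "Waste1": 1}, 1e-3),
    ({"Output": 1, "Reporter": 1}, {"Signal": 1, "Waste2": 1}, 1e-3),
]


def propensity(state, reactants, rate):
    """Mass-action propensity for distinct reactant species, each with stoichiometry 1."""
    a = rate
    for species in reactants:
        a *= state.get(species, 0)
    return a


def gillespie(initial, reactions, t_max=math.inf, seed=0):
    """Gillespie's direct method; returns a list of (time, state) pairs."""
    rng = random.Random(seed)
    state, t = dict(initial), 0.0
    trace = [(t, dict(state))]
    while True:
        props = [propensity(state, reac, k) for reac, _, k in reactions]
        total = sum(props)
        if total == 0:
            break  # no reaction can fire
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_max:
            break
        # Choose the next reaction with probability proportional to its propensity.
        reac, prod, _ = rng.choices(reactions, weights=props)[0]
        for species, n in reac.items():
            state[species] -= n
        for species, n in prod.items():
            state[species] = state.get(species, 0) + n
        trace.append((t, dict(state)))
    return trace


final = gillespie({"Input1": 10, "Input2": 10, "Join": 10, "Reporter": 10}, REACTIONS)[-1][1]
print(final["Signal"])  # 10
no_input2 = gillespie({"Input1": 10, "Join": 10, "Reporter": 10}, REACTIONS)[-1][1]
print(no_input2.get("Signal", 0))  # 0
```

The timing of individual events depends on the random seed, but the final counts above do not, because each irreversible reaction fires exactly until one of its reactants is exhausted.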

1.2 Visual DSD Evolution

Building on our initial work, over time we sought to generalize the set of nucleic acid circuits that could be designed and analyzed with the Visual DSD system. This was motivated by corresponding experimental advances that implemented new types of nucleic acid structures and interactions not yet supported by Visual DSD. In addition, even when the Visual DSD system was first created, a number of published experimental systems were not supported, including hairpin assembly systems [23], hybridization chain reaction systems [24], and the original DNA strand displacement tweezer system of Yurke et al. [25]. Furthermore, these syntactic limitations meant that certain structures, such as branching structures, could not be represented even though they could arise from interactions between supported linear structures. This further motivated the subsequent evolution and generalization of the Visual DSD syntax and semantics, which we outline below.

1.2.1 Polymer Structures (2011)

Some of our earliest work on expanding the set of nucleic acid structures supported by Visual DSD was to include unbounded polymer structures created by connecting multiple nucleic acid complexes. This was achieved by adding new semantic rules to enable multi-stranded complexes to bind via complementary overhanging sequences on the ends of the complexes (but not partway along, which would result in the formation of tree-like structures). Such polymer systems had previously been shown to enable the direct representation of a Turing-complete stack machine in nucleic acids [26].

A practical issue with these types of structures is the possibility of generating an infinite CRN, since polymers can grow without bound during reaction enumeration. To address this, we introduced a just-in-time (JIT) enumeration algorithm for stochastic simulation [15], which compiled reactions on-the-fly as they became possible during simulation. By interleaving reaction enumeration and simulation steps, the algorithm only enumerates the finite set of reactions that occur in a single stochastic simulation, rather than attempting to enumerate the infinite set of possible reactions.

This approach draws on the notion of just-in-time compilation from computer science. Programs written in high-level languages are typically compiled into a lower-level language, such as machine code, before they can be executed. This compilation can be done ahead of time, as is usual for the C programming language, or at runtime while the program is being executed, as in typical implementations of the Java programming language, where Java Virtual Machine bytecode is compiled into machine code as the program runs. The latter approach is called just-in-time compilation, and it inspired our algorithm for stochastic simulation of potentially infinite CRNs [15].

We illustrate our approach by modifying our running Join circuit example as shown below, including replacing the double-stranded to^ duplex with a single-stranded hairpin:

Figure: graphical and textual representation of the modified Join circuit, in which the double-stranded to^ duplex is replaced with a single-stranded hairpin.

This results in a hairpin-based Join circuit capable of forming unbounded polymers. Below are some of the initial reactions from the resulting CRN, as enumerated by Visual DSD using JIT compilation:

Figure: initial reactions of the hairpin-based Join circuit, including the binding of two complexes that opens the tb^ hairpin.

The last reaction demonstrates the binding of two complexes to open up the tb^ hairpin, resulting in an exposed \(\mathtt{<}\) tb^ x \(\mathtt{>}\) strand that can in turn interact with the Join complex to continue growing the polymer indefinitely. Note that we omit the hairpin closing reactions here; however, these can be enabled as a type of leak reaction (see Sect. 1.2.3 and the Visual DSD manual for details).

Using this approach, we were able to simulate and analyze a Turing-complete stack machine in Visual DSD, based on an efficient encoding of stacks as DNA nanostructures [27].

1.2.2 Semantic Abstractions (2012)

We also sought to enhance the flexibility of Visual DSD by encoding a number of distinct assumptions about DNA strand displacement kinetics as semantic rules. This produced an abstraction hierarchy of modeling assumptions [19], ranging from a Detailed semantics that explicitly models all toehold binding, branch migration, and toehold unbinding steps as distinct reactions, to an Infinite semantics in which branch migration and toehold unbinding are assumed to be instantaneous. These assumptions were implemented as options to the reaction enumeration algorithm. Infinite mode tends to be a good approximation at low concentrations, where unbinding and branch migration are substantially faster than binding and can be effectively modeled as infinitely fast. Detailed mode tends to be a better approximation at higher concentrations, though this comes with a higher computational cost.

An example of a strand displacement step compiled in Detailed mode is as follows, based on our running Join circuit example:

Figure: a strand displacement step of the Join circuit compiled in Detailed mode, as reversible binding, migration, and unbinding reactions.

Here, binding, unbinding, and migration reactions are assumed to have finite rates k, u, and m, respectively. In contrast, when the circuit is compiled in Infinite mode the above 3 reversible reactions are merged into a single reversible reaction:

Figure: the same strand displacement step compiled in Infinite mode, as a single reversible reaction that merges binding, migration, and unbinding.

Here, the concentration of species is assumed to be sufficiently low that unbinding and migration reactions are effectively infinitely fast compared to binding reactions. As a result, strand displacement is assumed to take place in a single step that merges binding, migration, and unbinding. Since branch migration is assumed to be infinitely fast, complexes are also considered equal up to branch migration. More generally, this approach allows a circuit to be analyzed at varying levels of detail without needing to modify the Visual DSD program, facilitating a trade-off between the accuracy of the model and the computational cost of the analysis [19].
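The merging of reactions under the Infinite semantics can be illustrated as a small CRN transformation. In the Python sketch below, which is a simplified illustration rather than the Visual DSD algorithm, each intermediate species is assumed to have exactly one fast follow-up reaction (branch migration or toehold unbinding, treated here as irreversible and instantaneous), listed in the hypothetical dictionary FAST, and the fast reactions are assumed to form no cycles. A slow binding reaction is then replaced by a single reaction from its reactants to the stable species reached by following fast reactions; keeping the binding rate k for the merged reaction is an assumption of this sketch. The intermediate names C1 and C2 are hypothetical.

```python
from typing import Dict, List, Tuple

Reaction = Tuple[Tuple[str, ...], Tuple[str, ...], str]  # (reactants, products, rate)


def merge_fast(slow: Reaction, fast: Dict[str, Tuple[str, ...]]) -> Reaction:
    """Replace the products of a slow reaction by the stable species reached
    after all (acyclic) fast unimolecular reactions have fired."""
    reactants, products, rate = slow
    pending: List[str] = list(products)
    stable: List[str] = []
    while pending:
        s = pending.pop()
        if s in fast:
            pending.extend(fast[s])  # fire the fast reaction instantly
        else:
            stable.append(s)         # no fast reaction applies: s is stable
    return reactants, tuple(sorted(stable)), rate


# Detailed mode: Input1 + Join -> C1 (binding, rate k), C1 -> C2 (migration, rate m),
# C2 -> Reverse + Join2 (unbinding of the distal toehold, rate u).
FAST = {"C1": ("C2",), "C2": ("Reverse", "Join2")}
print(merge_fast((("Input1", "Join"), ("C1",), "k"), FAST))
# (('Input1', 'Join'), ('Join2', 'Reverse'), 'k')
```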

1.2.3 Leaks (2012)

Experimental results were increasingly demonstrating that DNA strand displacement circuits often did not function as intended, in many cases due to the presence of leak reactions, in which an invader strand displaces an incumbent at a low rate in the absence of a complementary toehold domain. Such leaks do not follow the intended toehold-mediated reaction pathway and can be a major source of unwanted signal in experimental implementations of DNA circuits. In response to the growing body of experimental data demonstrating the occurrence of leaks, we extended the Visual DSD system to model these types of reactions [19]. We achieved this by extending the Visual DSD semantics with additional leak rules, which essentially correspond to versions of the branch migration rules in which no toehold is present and the invading strand is not part of the same complex as the incumbent.

A leak reaction occurs when the nucleotides at one extremity of a bound strand spontaneously unbind, creating a short toehold that facilitates a strand displacement reaction. In our running Join circuit example, the Input2 strand can displace a bound Output strand from the Join complex, even in the absence of the Input1 strand. This happens when one or two nucleotides at the 5’ end of the bound Output strand spontaneously unbind, creating a short toehold that allows the x domain of the Input2 strand to displace the Output, as follows:

Figure: leak reaction in which the Input2 strand displaces the bound Output strand from the Join complex.

While leak rates are typically several orders of magnitude lower than toehold-mediated strand displacement rates (here, \(10^{-9}\) nM\(^{-1}\)s\(^{-1}\) for the leak compared with the toehold-mediated rate k), over time leaks can still result in the accumulation of unwanted Signal strand even when only one of the two input strands is present, which is not the intended functionality of the Join circuit. More generally, this work highlights the potential of Visual DSD to model and predict experimental interactions that are not specifically intended by the circuit designer.
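To give a sense of scale, consider a back-of-the-envelope calculation under hypothetical conditions: Input2 and Join are both present at 100 nM, Input1 is absent, and both concentrations remain approximately constant over the time considered. With the leak rate above, the initial rate of leaked Output production is

$$\begin{aligned} \frac{d[\texttt{Output}]}{dt} \approx 10^{-9}\,\text{nM}^{-1}\text{s}^{-1} \times 100\,\text{nM} \times 100\,\text{nM} = 10^{-5}\,\text{nM}\,\text{s}^{-1}, \end{aligned}$$

which amounts to roughly 0.036 nM of leaked Output per hour (3,600 s) and about 0.86 nM per day (86,400 s). Each leaked Output strand can then trigger the Reporter, so Signal accumulates steadily even though individual leak events are rare.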

1.2.4 Localized Components (2014)

Another important development in the field of dynamic DNA nanotechnology was the design of spatially localized circuits on DNA origami tiles, first computationally [28, 29] and later experimentally [30, 31]. This approach combines aspects of dynamic and structural DNA nanotechnology to improve both the speed and scalability of circuit designs. To incorporate this new development, we generalized the Visual DSD system by introducing syntax to represent nanostructures tethered to a tile surface [32]. Our initial approach was to tag tethered components with labels that indicate which components are tethered close enough to interact, and to model the effects of locality on reaction kinetics by associating a local concentration with each tag. This concentration was either estimated by the user or inferred from experimental data. We separately showed that SMT-based constraint solving can be used to determine satisfiability of the geometric constraints inherent in tethered molecular structures [33]. More recently, we used simple biophysical models to computationally estimate the rate constants for such localized reactions [34].

We illustrate our approach with a modified version of our running Join circuit example:

Figure: the modified Join and Reporter species, tethered in close proximity to each other.

The Join species is modified by replacing the double-stranded to^ duplex with a single-stranded to^ hairpin and by adding a tether with location tag l on the 3’ end of the tb^* overhang. In addition, the Reporter is modified so that it contains a tether with the same location tag l on the 5’ end of the to^* overhang. This models the assumption that the two species are tethered close to each other, such that their effective concentration is given by l, here assumed to be 10,000 nM. In practice, these local concentrations can be estimated from data using parameter inference methods [30]. In the resulting CRN, the freely diffusing Input1 strand \(\mathtt{<}\!\) tb^ b \(\mathtt{>}\) binds to the Join complex tethered to the tile and displaces the \(\mathtt{<}\) b tx^ \(\mathtt{>}\) strand. The freely diffusing Input2 strand \(\mathtt{<}\) tx^ x \(\mathtt{>}\) then opens the hairpin, exposing the to^ toehold. The exposed to^ toehold then binds to the to^* toehold of the tethered Reporter complex, which displaces the Signal strand \(\mathtt{<}\) fl^ x \(\mathtt{>}\). The interaction is scaled by the local concentration l, since the Reporter and Join complexes are tethered in close proximity to each other. Importantly, the resulting scaled rate 10000*k of this reaction is unimolecular with units s\(^{-1}\), since it involves two complexes tethered to the same origami at fixed locations, and therefore the interaction between these two tethered complexes is not affected by the concentration of strands in solution. This approach was used to model the kinetics of localized logic circuits by inferring local concentrations from experimental data [30].
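The change of units can be made explicit with a short calculation. Writing k for the numerical value of the bimolecular rate constant of the to^ toehold in nM\(^{-1}\)s\(^{-1}\), the localized reaction between the tethered Join and Reporter complexes has the first-order rate constant

$$\begin{aligned} k_{\text{local}} = l \cdot k = 10^{4}\,\text{nM} \times k\,\text{nM}^{-1}\text{s}^{-1} = 10^{4}\,k\ \text{s}^{-1}. \end{aligned}$$

For a hypothetical value of \(k = 10^{-3}\) nM\(^{-1}\)s\(^{-1}\), this gives \(k_{\text{local}} = 10\) s\(^{-1}\), corresponding to a half-life of \(\ln 2 / 10 \approx 0.07\) s for the tethered interaction, independent of how dilute the tiles are in solution. By contrast, the same two complexes freely diffusing at 10 nM each would initially react with a pseudo-first-order rate of only \(10^{-3} \times 10 = 10^{-2}\) s\(^{-1}\).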

1.2.5 Custom Reactions (2014)

Throughout this period, a range of architectures for dynamic DNA nanotechnology were continually being developed, including some that made use not only of DNA strands but also of enzymes. Two prominent examples are the PEN DNA toolbox, developed in the Rondelez lab [35, 36], and the Genelet system, developed in the Winfree lab [37] and subsequently refined in the Schulman lab [38]. Other, related enzyme-driven architectures have also been developed [39, 40]. These architectures rely on enzymes that act on DNA or RNA, such as polymerases, exonucleases, and nickases, to implement computational circuits. The PEN DNA toolbox, in particular, is supported by a suite of software tools for circuit design and analysis [41,42,43]. Since the enzymatic components of these systems were not supported in the Visual DSD system, which solely modeled nucleic acid interactions, we developed an extension to enable the insertion of custom, user-specified chemical reactions into the compiled CRN model.

A simple example of a custom reaction is shown below, by modifying our running Join circuit example.

Figure: custom reactions for the production (rate 0.1) and consumption (rate 0.01) of the Input1 strand.

This models the assumption that the Input1 strand \(\mathtt{<}\!\) tb^ b \(\mathtt{>}\) is produced at rate 0.1 and consumed at rate 0.01 by enzymatic synthesis and degradation, respectively. Note that production is modeled here with a constant rate, hence no reactant is specified. Since Visual DSD does not support an explicit representation of enzymes, their effects are modeled at an abstract level using custom reactions. Furthermore, in the Join circuit example the default rate of the strand displacement reaction is equal to the rate k associated with the toehold to^ that mediates the reaction. However, in practice we may wish to allow different reactions mediated by the same toehold to take place at different rates, in this case to model the fact that the presence of the fluorophore on the reporter complex gives rise to a different strand displacement rate r. We used custom reactions such as these to demonstrate the modeling of feedback control circuits in Visual DSD with three different architectures: strand displacement reactions, Genelets, and the PEN DNA toolbox [44].
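To make the effect of these two custom reactions concrete, the sketch below solves the mass-action ODE they induce for Input1 in isolation, \(d[\mathrm{Input1}]/dt = p - d\,[\mathrm{Input1}]\), with \(p = 0.1\) and \(d = 0.01\). It deliberately ignores consumption of Input1 by the Join complex, and the units (nM/s for production, 1/s for degradation) are assumptions. The function names are hypothetical and not part of Visual DSD.

```python
import math

# Rates of the two custom reactions described above (units assumed: nM/s, 1/s).
PRODUCTION_RATE = 0.1    # "-> Input1": constant-rate synthesis, no reactant
DEGRADATION_RATE = 0.01  # "Input1 ->": first-order degradation


def input1_exact(t: float, x0: float = 0.0) -> float:
    """Closed-form solution of d[Input1]/dt = p - d*[Input1] from [Input1](0) = x0."""
    steady = PRODUCTION_RATE / DEGRADATION_RATE
    return steady + (x0 - steady) * math.exp(-DEGRADATION_RATE * t)


def input1_euler(t_end: float, dt: float = 0.1, x0: float = 0.0) -> float:
    """Forward-Euler integration of the same ODE, as a numerical cross-check."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * (PRODUCTION_RATE - DEGRADATION_RATE * x)
    return x


if __name__ == "__main__":
    print("steady state:", PRODUCTION_RATE / DEGRADATION_RATE)
    print("t = 100 s:", input1_exact(100.0), input1_euler(100.0))
```

The steady state is simply the ratio of production to degradation, \(0.1/0.01 = 10\), approached with time constant \(1/d = 100\) s.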

1.2.6 Complex Topologies (2016)

The above versions of Visual DSD were all based on the original underlying syntax of linear polymers. By this time, however, experimental techniques using complex branching structures were becoming increasingly common, and such structures could not be expressed in this syntax. To keep pace with these experimental techniques, we developed a new version of Visual DSD that supported arbitrary graph structures, based on the notion of strand graphs [45]. Graphs consist of nodes connected by edges and have found application in many areas of computer science, such as modeling the structure of the Internet. We note that other work had previously used graphs to model DNA nanostructures [46,47,48]. Site graphs are a generalization of graphs in which each node has multiple named sites and each edge connects two sites. Site graphs were already being used to model complex protein structures and their interactions, for instance by the Kappa language [49]. We proposed strand graphs [45] as a variant of site graphs in which the sites are ordered, where each node represents a strand and each site represents a domain, the order of the sites corresponds to the order of the domains in the strand, and an edge represents a bond between two complementary domains. The strand graph data structure is highly general, enabling arbitrary DNA nanostructures to be represented, including pseudoknots (structures with non-nested bonding patterns). We defined semantic rules that generalized the interactions between DNA strands to support these complex topologies, while also preserving all of the rules from previous versions of Visual DSD based on linear polymers [45]. For the remainder of this paper we refer to the Visual DSD syntax based on linear polymers as the classic syntax.

The complexes from our running Join circuit example can be expressed in strand graph syntax as follows, where the graphical representation above corresponds to the textual representation below:

A graphical representation of the species of the Join circuit, including the single stranded Input1 \(\mathtt{<}\) tb^ b \(\mathtt{>}\) and Input2 \(\mathtt{<}\) tx^ x \(\mathtt{>}\) species.
A textual representation of the complexes, written from the 5’ end on the left to the 3’ end on the right.

The program consists of a multiset of strands, separated by the parallel composition operator. As with the classic syntax there are two types of species, single strands and complexes. The single stranded Input1 species \(\mathtt{<}\!\) tb^ b \(\mathtt{>}\) and Input2 species \(\mathtt{<}\) tx^ x \(\mathtt{>}\) have the same textual representation as in the classic syntax, while the Join complex consists of two shorter strands \(\mathtt{<}\) x!2 to^!1 \(\mathtt{>}\) and \(\mathtt{<}\) b!4 tx^!3 \(\mathtt{>}\) bound to a longer strand \(\mathtt{<}\) to^*!1 x*!2 tx^*!3 b*!4 tb^* \(\mathtt{>}\), and the Reporter complex consists of a Signal strand \(\mathtt{<}\) fl^ x!4 \(\mathtt{>}\) bound to a Quencher strand \(\mathtt{<}\) to^* x*!4 \(\mathtt{>}\). Complexes are enclosed in square brackets, and the strands within the square brackets are assumed to form a connected component, meaning they are all connected to each other via named bonds, with autogenerated numbers used for new bonds generated during reaction enumeration. Complexes are considered equal up to renaming of bonds, meaning that individual bond names do not matter provided they are distinct. Note that the textual syntax assumes that all strands are written from the 5’ end on the left to the 3’ end on the right, whereas in the graphical representation the longer strand is rotated to align with the two shorter strands. This reflects the fact that complementary strands bind anti-parallel to each other, where the 3’ end of one strand aligns with the 5’ end of the other. In practice the bonds can be omitted from the graphical representation for conciseness and improved readability, since they are only used to indicate connectivity. While the Join example shown can also be represented in the classic syntax, the strand graph syntax can be used to represent much more complex structures, including branching structures and pseudoknots.
Side-conditions on the inference rules of the strand graph semantics serve to limit applications of the rules that could generate physically implausible structures, such as binding within tight hairpin loops [45]. To illustrate the expressiveness of this approach, we used the strand graph version of Visual DSD to analyze nucleic acid circuits with branching topologies that had previously been implemented experimentally. We refer the reader to Petersen et al. [45] for details of the strand graph semantics that supports arbitrary structures, and to Spaccasassi et al. [50] for a more general semantics in terms of logic programming, which we discuss in Sect. 2.1.
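The two well-formedness assumptions on complexes described above (strands in square brackets form a connected component, and complexes are equal up to renaming of bonds) can be sketched in a few lines. This is a minimal illustration, not the Visual DSD implementation: the helper names are hypothetical, a strand is parsed into (domain, bond) sites, and equality is decided by brute force over strand orderings, which is only practical for small complexes.

```python
from collections import defaultdict
from itertools import permutations


def parse_strand(text):
    """Parse strand graph text such as '<x!2 to^!1>' into (domain, bond) sites.
    Unbound domains get bond None; domains are read 5' (left) to 3' (right)."""
    body = text.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError(f"not a strand: {text!r}")
    sites = []
    for token in body[1:-1].split():
        domain, _, bond = token.partition("!")
        sites.append((domain, bond or None))
    return sites


def complement(domain):
    return domain[:-1] if domain.endswith("*") else domain + "*"


def is_valid_complex(strands):
    """Each bond must join exactly two complementary sites, and the strands
    must form a single connected component."""
    ends = defaultdict(list)  # bond name -> [(strand index, domain), ...]
    for i, strand in enumerate(strands):
        for domain, bond in strand:
            if bond is not None:
                ends[bond].append((i, domain))
    adjacency = defaultdict(set)
    for pair in ends.values():
        if len(pair) != 2 or complement(pair[0][1]) != pair[1][1]:
            return False
        (i, _), (j, _) = pair
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for j in adjacency[stack.pop()] - seen:
            seen.add(j)
            stack.append(j)
    return len(seen) == len(strands)


def canonical_form(strands):
    """Representative of a complex up to bond renaming and strand order:
    try every strand order, number bonds by first occurrence, keep the minimum."""
    best = None
    for order in permutations(strands):
        names, rows = {}, []
        for strand in order:
            row = []
            for domain, bond in strand:
                if bond is None:
                    row.append((domain, 0))
                else:
                    if bond not in names:
                        names[bond] = len(names) + 1
                    row.append((domain, names[bond]))
            rows.append(tuple(row))
        key = tuple(rows)
        if best is None or key < best:
            best = key
    return best


JOIN = [parse_strand(s) for s in
        ("<x!2 to^!1>", "<b!4 tx^!3>", "<to^*!1 x*!2 tx^*!3 b*!4 tb^*>")]
# The same complex with different bond names and strand order.
JOIN_RENAMED = [parse_strand(s) for s in
                ("<b!1 tx^!3>", "<to^*!9 x*!7 tx^*!3 b*!1 tb^*>", "<x!7 to^!9>")]
print(is_valid_complex(JOIN), canonical_form(JOIN) == canonical_form(JOIN_RENAMED))
```

A production implementation would use a proper graph canonicalization rather than trying all permutations, and would also enforce the geometric side-conditions mentioned above, which this sketch omits.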

1.3 Visual DSD Analysis

Over time, a number of model analysis capabilities were added to the Visual DSD system. This was facilitated by the formal underpinnings of Visual DSD including a well-established formal semantics, which allowed a range of computer-aided verification and analysis techniques to be brought to bear [51]. In this section we briefly outline some of these methods and illustrate their application using the running Join circuit example.

1.3.1 Probabilistic Model Checking (2012)

Model checkers can automatically verify whether a state-transition system satisfies a specification expressed in a temporal logic such as computation tree logic (CTL), which is a branching time logic. For example, the following CTL formula

$$\begin{aligned} \textsf{A}\ [\ \textsf{F}\ \texttt {"terminal"}\ ] \end{aligned}$$

states that all possible paths (A) through the state space will finally reach (F) a “terminal” state. We can illustrate the application of this formula to the running Join circuit example, by analyzing the state space that is generated from an initial state containing one copy of each species. This state space can be represented graphically as follows:

A graphical representation illustrates four states, where each transition corresponds to the execution of a single reaction. Initial and terminal states are highlighted.

Each state contains a multiset of species, and each transition between states corresponds to the execution of a single reaction, where the rate of the transition is given by the rate of the reaction multiplied by the number of copies of each species involved in the reaction. Here the state space is relatively simple since only one copy of each species is present initially; however, the number of states can grow exponentially with the number of copies. The “initial” and “terminal” states are labeled and outlined in bold using black and red, respectively, where a terminal state is defined as a state with no outbound edges, meaning that no reactions are possible from this state. Here we can see by inspection that the system satisfies the above formula, since it always converges to the “terminal” state.
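The construction of such a state space, and the check of \(\textsf{A}\ [\ \textsf{F}\ \texttt{"terminal"}\ ]\) on it, can be sketched as follows. This is an illustrative explicit-state sketch, not how PRISM or Visual DSD are implemented. It assumes a hypothetical, simplified CRN for the Join circuit with only irreversible steps and intermediate species names of our choosing (JoinB, JoinDone, WasteB, WasteQ). States are multisets of species, transition rates use mass-action propensities, and \(\textsf{AF}\) is computed as a least fixpoint.

```python
from collections import Counter
from math import comb

# Hypothetical, simplified CRN for the Join circuit (irreversible steps only).
# Each reaction is (reactants, products, rate constant).
REACTIONS = [
    ({"Input1": 1, "Join": 1}, {"JoinB": 1, "WasteB": 1}, 1.0),
    ({"Input2": 1, "JoinB": 1}, {"JoinDone": 1, "Output": 1}, 1.0),
    ({"Output": 1, "Reporter": 1}, {"Signal": 1, "WasteQ": 1}, 1.0),
]


def freeze(state):
    """Hashable form of a multiset of species, dropping zero counts."""
    return tuple(sorted((s, n) for s, n in state.items() if n > 0))


def propensity(state, reactants, rate):
    """Mass-action propensity: rate times the number of ways to pick the reactants."""
    a = rate
    for species, m in reactants.items():
        a *= comb(state.get(species, 0), m)
    return a


def explore(initial):
    """Enumerate the reachable state space as {state: [(next_state, propensity), ...]}."""
    start = freeze(initial)
    graph, todo = {}, [start]
    while todo:
        key = todo.pop()
        if key in graph:
            continue
        state, graph[key] = dict(key), []
        for reactants, products, rate in REACTIONS:
            a = propensity(state, reactants, rate)
            if a > 0:
                nxt = Counter(state)
                nxt.subtract(reactants)
                nxt.update(products)
                graph[key].append((freeze(nxt), a))
                todo.append(freeze(nxt))
    return start, graph


def af_terminal(graph):
    """States satisfying A[F "terminal"], as the least fixpoint of
    Z = terminal states plus every state whose successors all lie in Z."""
    sat = {s for s, out in graph.items() if not out}
    changed = True
    while changed:
        changed = False
        for s, out in graph.items():
            if s not in sat and out and all(t in sat for t, _ in out):
                sat.add(s)
                changed = True
    return sat


initial = {"Input1": 1, "Input2": 1, "Join": 1, "Reporter": 1}
start, graph = explore(initial)
print(len(graph), start in af_terminal(graph))  # prints: 4 True
```

Increasing the initial copy numbers in `initial` shows how quickly the number of enumerated states grows, which is why tools such as PRISM are used for anything beyond toy examples.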

For more complex state spaces we used the PRISM probabilistic model checker in combination with Visual DSD to verify a range of correctness properties expressed in a probabilistic temporal logic [52]. Using this approach, we showed how probabilistic model checking can identify design flaws resulting from circuit cross-talk, validate garbage collection schemes that clean up strands with exposed toeholds, and compute the probability of reaching a consensus state in an approximate majority voting circuit [52]. This approach is tractable for relatively small numbers of molecules, where stochastic effects are important, such as in spatially localized circuits. For large numbers of molecules in solution, system behavior can often be viewed as deterministic, in which case an ODE model is generally a suitable approximation.

1.3.2 Model Simulation and Parameter Inference (2013)

Visual DSD was originally developed with built-in methods to simulate the behavior of a CRN over time and dynamically visualize the simulation output [6]. This immediate visual feedback on circuit behavior helped to accelerate the design of nucleic acid circuits. A standard stochastic simulator was implemented first, followed by a just-in-time stochastic simulator that supported unbounded CRNs [15]. A deterministic solver was also added, which generated ordinary differential equation (ODE) models from the CRN according to standard mass action kinetics. This deterministic solver was then used as the basis for Bayesian parameter inference using Markov Chain Monte Carlo (MCMC), allowing model parameters to be inferred from experimental data. The plots below show fitting of the model parameters to measured datapoints (left) and examples of marginal posterior distributions for parameter values obtained via Bayesian parameter inference (right). The rate \(k=0.003\)/nM/s was inferred from the data points shown using simulations for initial concentrations of 10 nM Input1 and Input2 and 100 nM Join and Reporter.

Two screenshots illustrate the simulation results for initial concentrations of Input1, Input2, Join, and Reporter. Left: a line graph of concentration versus time, with eight curves. Right: a histogram of sampled parameter values.

This parameter inference method was used to help design an experimental implementation of a molecular consensus algorithm using two-domain DNA strand displacement reactions [10], and a spatially localized architecture for fast and modular DNA computing [30].
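The idea behind MCMC parameter inference can be illustrated with a deliberately small example. The sketch below is not the Visual DSD inference engine and does not use its ODE model: it assumes a toy reaction Input1 + Input2 → Signal with equal initial concentrations of 10 nM, whose mass-action solution is available in closed form, generates synthetic noise-free data with \(k = 0.003\)/nM/s, and recovers k with a random-walk Metropolis sampler (one particular MCMC method) under an assumed Gaussian noise model and flat prior on \(\log k\).

```python
import math
import random

C0 = 10.0  # nM, assumed equal initial concentrations of Input1 and Input2


def signal(t, k, c0=C0):
    """Toy model Input1 + Input2 -> Signal with mass-action rate k (/nM/s) and
    equal initial concentrations c0: [Signal](t) = c0 - c0 / (1 + k*c0*t)."""
    return c0 - c0 / (1.0 + k * c0 * t)


def log_likelihood(k, times, data, sigma):
    """Gaussian measurement noise with standard deviation sigma (nM)."""
    return -sum((signal(t, k) - y) ** 2 for t, y in zip(times, data)) / (2.0 * sigma ** 2)


def metropolis(times, data, sigma=0.5, k_init=1e-3, step=0.05,
               n_iter=6000, burn_in=1000, seed=1):
    """Random-walk Metropolis on log k (flat prior on log k); returns samples of k."""
    rng = random.Random(seed)
    log_k = math.log(k_init)
    ll = log_likelihood(k_init, times, data, sigma)
    samples = []
    for i in range(n_iter):
        proposal = log_k + rng.gauss(0.0, step)
        ll_prop = log_likelihood(math.exp(proposal), times, data, sigma)
        # Symmetric proposal and flat prior, so accept with the likelihood ratio.
        if rng.random() < math.exp(min(0.0, ll_prop - ll)):
            log_k, ll = proposal, ll_prop
        if i >= burn_in:
            samples.append(math.exp(log_k))
    return samples


# Synthetic, noise-free "measurements" generated with k = 0.003 /nM/s.
TIMES = [15.0 * i for i in range(1, 21)]
DATA = [signal(t, 0.003) for t in TIMES]
SAMPLES = metropolis(TIMES, DATA)
print(sum(SAMPLES) / len(SAMPLES))
```

A histogram of `SAMPLES` plays the role of the marginal posterior shown in the right-hand plot; with a multi-parameter ODE model the same loop applies, but each likelihood evaluation requires a numerical simulation.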

1.3.3 Satisfiability Modulo Theories Solving (2013)

Satisfiability modulo theories (SMT) solvers generalize Boolean Satisfiability (SAT) solvers by incorporating additional theories, such as theories for integer-valued or real-valued arithmetic. This approach can be used to verify functional properties of nucleic acid circuits, for example to guarantee properties of the terminal states of a circuit as a function of its initial state. One key advantage of using the SMT approach is that the analysis can scale to millions of copies of a circuit in parallel, though this comes at the cost of not being able to analyze temporal or probabilistic properties of the circuit. Another key advantage is that circuits can be verified for arbitrary input conditions, as opposed to a single combination of inputs.

We integrated the Z3 SMT solver [53] with Visual DSD and used this to prove the functional correctness of large-scale DNA strand displacement circuits [54]. We illustrate the method using the running Join circuit example, by verifying that this circuit satisfies the predicate \(P_{\texttt {AND}}(q_0,q)\) defined in [54]. The predicate specifies that the system running from initial state \(q_0\) to final state q produces a final quantity of output q(O) that is the smaller of the two initial input quantities, \(q_0(I_1)\) and \(q_0(I_2)\):

$$\begin{aligned} P_{\texttt {AND}}(q_0,q) \Longleftrightarrow q(O) = min (q_0(I_1),q_0(I_2)) \end{aligned}$$

The method also identifies the precise constraints on the initial conditions that are required for the specification to be satisfied, in this case that the initial number of Reporter and Join complexes must be greater than the initial number of Input1 and Input2 strands. This allows us to prove the correctness of the circuit for arbitrary inputs, provided the constraints are satisfied. The logical behavior of the AND circuit is formalized using a threshold \(\theta \), where an input or output strand represents a logical True if and only if the number of copies of the strand is greater than \(\theta \). We verify that the system satisfies the following formula, which specifies that the final output value q(O) being above a threshold \(\theta \) is equivalent to both initial input values \(q_0(I_1)\) and \(q_0(I_2)\) being above that threshold:

$$\begin{aligned}{}[q(O)> \theta ] \Longleftrightarrow [q_0(I_1)> \theta ] \wedge [q_0(I_2) > \theta ] \end{aligned}$$

Together these formulae specify that the Join circuit will function as an AND logic gate on its two inputs. Using this approach, we verified the behavior of a 4-bit square root circuit, together with the components used for its construction, even when millions of copies of the circuits are interacting with each other in parallel. This method is also applicable to the analysis of chemical reaction networks with large species counts, to provide guarantees for arbitrary initial conditions, though without taking into account kinetic rates. This complements alternative methods such as interactive theorem proving, which has been used to verify coupled phase transitions in population protocols [55]. We have also used SMT solving to check the geometric constraints inherent in localized molecular circuit interactions [33].
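The two properties above can be made concrete with a bounded check. The sketch below is emphatically not the SMT encoding used with Z3, which proves the properties for unbounded counts; it exhaustively enumerates small initial counts instead. It assumes a hypothetical, simplified irreversible CRN for the Join circuit and treats the Signal strand as the output O. Because each intermediate is produced by exactly one reaction and consumed by exactly one other, the terminal state of this CRN can be computed without simulating a particular firing order.

```python
from itertools import product


def terminal_signal(i1, i2, join, reporter):
    """Signal count in the terminal state of a simplified, irreversible Join CRN:
         R1: Input1 + Join     -> JoinB + WasteB
         R2: Input2 + JoinB    -> JoinDone + Output
         R3: Output + Reporter -> Signal + WasteQ
    Each reaction fires until one of its reactants runs out, regardless of order."""
    r1 = min(i1, join)
    r2 = min(i2, r1)
    return min(r2, reporter)


def p_and(i1, i2, join, reporter):
    """P_AND: the final output equals the smaller of the two initial inputs."""
    return terminal_signal(i1, i2, join, reporter) == min(i1, i2)


def threshold_and(i1, i2, join, reporter, theta):
    """[q(O) > theta] <=> [q0(I1) > theta] and [q0(I2) > theta]."""
    out = terminal_signal(i1, i2, join, reporter)
    return (out > theta) == (i1 > theta and i2 > theta)


def find_counterexample(bound):
    """Check both properties for all counts 0..bound, restricted to initial
    states where Join and Reporter exceed both inputs."""
    for i1, i2, join, reporter in product(range(bound + 1), repeat=4):
        if join <= max(i1, i2) or reporter <= max(i1, i2):
            continue
        if not p_and(i1, i2, join, reporter):
            return (i1, i2, join, reporter)
        for theta in range(bound + 1):
            if not threshold_and(i1, i2, join, reporter, theta):
                return (i1, i2, join, reporter, theta)
    return None


print(find_counterexample(8))  # None: no violation within the bound
print(p_and(5, 5, 3, 10))      # False: too few Join complexes
```

The second call shows why the constraint on the initial conditions matters: with fewer Join complexes than inputs, the output saturates below \(\min(q_0(I_1), q_0(I_2))\).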

1.3.4 Spatial Simulation with Partial Differential Equations (2014)

To keep pace with the increasing number of nucleic acid circuits being implemented in a spatial context at the time [56, 57], we added a partial differential equation (PDE) solver to Visual DSD to model diffusive aspects of spatially heterogeneous circuit designs. Visual DSD can solve PDEs of the general form \(\frac{\partial c}{\partial t} = f(c) + D\nabla ^{2}c\), where c denotes the species concentrations, f(c) the reaction kinetics given by the CRN, and D the diffusion coefficients.

We illustrate this spatial simulation method using the running Join circuit example, by specifying the initial placement of the two input species. The Input1 species is initialized with a value of 10 nM in a circular region of width 0.4 located at coordinates (0.3, 0.3), while the Input2 species is initialized in a similar fashion at coordinates (0.7, 0.7), where the coordinates are expressed as fractions of the dimensions of the 2D surface. As in the well-mixed system, we assume 100 nM Join and Reporter complexes in solution throughout. We also include additional spatial directives, which specify zero flux boundary conditions, and that the two input species diffuse at rate 0.5. The 2D spatial domain is a square of edge 50 mm, with a grid resolution of 100 divisions in each dimension and a simulation time step of 1 s. The heat maps below show the quantity of Signal computed using the 1-dimensional (left) and 2-dimensional (right) PDE solver.

Two heat maps of Signal. Left: 1-dimensional simulation, time versus distance. Right: 2-dimensional simulation, distance versus distance.

In both simulations we observe Signal in the region of space where the two inputs overlap due to diffusion, consistent with the desired behavior of a spatial Join circuit. This approach was used to model a number of DSD systems with non-trivial spatial dynamics, including an autocatalytic network, a predator-prey system, and a spatial molecular consensus algorithm [58]. It was later used to assist with the design of spatial DNA-based communication in populations of synthetic protocells [59].
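A minimal 1-dimensional analogue of this setup can be written with explicit finite differences. This sketch is not the Visual DSD PDE solver: it assumes a unit-length domain, dimensionless diffusion coefficient and time step chosen for stability (\(D\,\Delta t/\Delta x^2 \le 0.5\)), and replaces the Join circuit by the toy reaction Input1 + Input2 → Signal. Zero-flux boundaries are imposed with mirrored ghost nodes, and only the inputs diffuse.

```python
def simulate_1d(n=51, d_coef=2e-4, k=0.003, dt=0.5, t_end=100.0, c0=10.0):
    """Explicit finite differences for dc/dt = f(c) + D * d2c/dx2 on the unit
    interval with zero-flux boundaries. Toy reaction: Input1 + Input2 -> Signal;
    only the two inputs diffuse. Returns (Input1, Input2, Signal) profiles."""
    dx = 1.0 / (n - 1)
    if d_coef * dt / dx ** 2 > 0.5:
        raise ValueError("explicit scheme is unstable for this time step")
    center, half = round(0.3 * (n - 1)), round(0.1 * (n - 1))
    # Input1 near x = 0.3, Input2 in the mirror-image region near x = 0.7.
    a = [c0 if abs(i - center) <= half else 0.0 for i in range(n)]
    b = a[::-1]
    s = [0.0] * n

    def laplacian(u, i):
        left = u[i - 1] if i > 0 else u[1]            # ghost node u[-1] = u[1]
        right = u[i + 1] if i < n - 1 else u[n - 2]   # ghost node u[n] = u[n-2]
        return (left - 2.0 * u[i] + right) / dx ** 2

    for _ in range(int(round(t_end / dt))):
        rate = [k * a[i] * b[i] for i in range(n)]
        a = [a[i] + dt * (d_coef * laplacian(a, i) - rate[i]) for i in range(n)]
        b = [b[i] + dt * (d_coef * laplacian(b, i) - rate[i]) for i in range(n)]
        s = [s[i] + dt * rate[i] for i in range(n)]
    return a, b, s


a, b, s = simulate_1d()
mid = len(s) // 2
print(f"Signal at x=0: {s[0]:.4f}, at x=0.5: {s[mid]:.4f}, at x=1: {s[-1]:.4f}")
```

As in the heat maps, Signal accumulates between the two source regions, where the diffusing inputs meet, and remains low near the boundaries.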

1.3.5 Probabilistic Model Checking with the Chemical Master Equation (2015)

While our previous probabilistic model checking approach was highly efficient for analyzing a system at a single point in time [52], we sought to substantially improve the efficiency of analysis over multiple time points. We achieved this through numerical integration of the Chemical Master Equation (CME), which is a system of ordinary differential equations whose solution yields the probability that the system is in a given state over time. For circuits with large numbers of molecules, a standard deterministic simulator based on mass action kinetics can typically be used. However, for circuits with small numbers of molecules, such as most localized circuits, methods such as CME integration are significantly more accurate.

We illustrate our method using the running Join circuit example. The plots below show the results of the analysis assuming 10 copies of the Input1 and Input2 strands and 100 copies of the Join and Reporter complexes. The plots show a timecourse (left) of the mean (solid line) and standard deviation (shaded region) of the Input1 (red), Input2 (green), Output (blue) and Signal (yellow), and a heatmap (right) of the full probability distribution for the Signal strand over time, together with a histogram of the probability distribution at the final time point.

Two graphs. Left: a line graph of the number of individuals versus time, with four curves. Right: a heat map of the number of individuals versus time in seconds.

This method is particularly well-suited to analyzing localized molecular circuits, since each location typically contains a small number of molecules. In the example above, even though only 10 copies of each input are present initially, the state space consists of 286 states, including a single terminal state, and 1100 transitions between these states. We analyzed a localized circuit for computing the square root of a four-bit number, proved its correctness with respect to its functional specification and analyzed the extent to which the localized design improves both speed and scalability in comparison to well-mixed circuits [60].
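The principle of CME integration can be shown on a much smaller state space than the Join circuit. The sketch below assumes the toy reaction Input1 + Input2 → Signal with n copies of each input, so the state is just the number m of Signal molecules and the propensity is \(a(m) = c\,(n-m)^2\) for a stochastic rate constant c (an illustrative value). The CME \(dp_m/dt = a(m-1)\,p_{m-1} - a(m)\,p_m\) is integrated with the classical fourth-order Runge-Kutta method; Visual DSD's own integrator is not assumed to work this way.

```python
def cme_join(n=10, c=0.01, t_end=50.0, dt=0.01):
    """Probability distribution p[m] over the number m of Signal molecules for the
    toy reaction Input1 + Input2 -> Signal, starting from n copies of each input.
    CME: dp_m/dt = a(m-1) p_{m-1} - a(m) p_m, with a(m) = c * (n - m)**2."""
    def deriv(p):
        dp = [0.0] * (n + 1)
        for m in range(n + 1):
            flux = c * (n - m) ** 2 * p[m]  # probability flow from state m to m + 1
            dp[m] -= flux
            if m < n:
                dp[m + 1] += flux
        return dp

    def axpy(p, h, d):
        return [pi + h * di for pi, di in zip(p, d)]

    p = [1.0] + [0.0] * n  # all probability mass on m = 0 initially
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(p)
        k2 = deriv(axpy(p, dt / 2, k1))
        k3 = deriv(axpy(p, dt / 2, k2))
        k4 = deriv(axpy(p, dt, k3))
        p = [pi + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
             for pi, d1, d2, d3, d4 in zip(p, k1, k2, k3, k4)]
    return p


def mean(p):
    return sum(m * pm for m, pm in enumerate(p))


p_final = cme_join()
print(f"mean Signal: {mean(p_final):.2f}, P(all 10 produced): {p_final[-1]:.3f}")
```

Unlike a single stochastic simulation run, each integration step yields the full distribution, so means, variances, and histograms at every time point come from one computation.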

1.3.6 Verification of CRN Equivalence (2015)

A key goal in computer-aided design is to formally verify the correctness of systems in a modular way, such that verified components can be combined to build more complex systems that are correct by construction. We have illustrated how nucleic acid circuits can be modeled as CRNs that can be simulated and analyzed. However, CRNs are also a powerful means of specifying the intended behavior of nucleic acid circuits, where a high-level CRN specification is compiled to its low-level nucleic acid implementation [18]. An important challenge in the field has been to verify that a nucleic acid circuit is a correct implementation of a CRN specification.

We developed a technique for proving correctness of CRN implementations [61] based on the concept of serializability from database theory, which requires that interleaved concurrent updates to a database must be equivalent to some serial schedule of those updates. While this proof technique has not yet been implemented in the Visual DSD system, we can demonstrate its application using our running Join circuit example, by analyzing the state space diagram of the circuit from Sect. 1.3.1. We can prove that the Join circuit is a correct implementation of the CRN specification Input1 + Input2 \(\longrightarrow \) Signal, where the species in this CRN denote the formal species of the Join circuit, and the Join and Reporter species denote the fuels of the circuit, defined as any species that must be present initially in order for the chemical reactions in the circuit to run to completion. The remaining species of the Join circuit are defined as either intermediates or waste. The proof relies on demonstrating that any trace generated by the circuit can be rewritten to produce a serial trace which corresponds to a valid execution of the CRN specification. For this simple example we can show that the state space from Sect. 1.3.1 satisfies the required conditions for correctness from the theorems proved in [61], including that the terminal state is universally reachable and that every trace from the initial state has a commit reaction, which is an irreversible step where all formal reactants have been consumed before any formal products are produced. In this case the rightmost transition denotes a commit reaction. Thus, the Join circuit is a correct implementation of the CRN specification. Furthermore, the correctness is preserved in arbitrary contexts provided certain constraints are satisfied, including that the only shared species are formal species, waste, and certain intermediates where the gate design can tolerate the presence of additional copies of that species. 
This obviates the need to check the correctness of the circuit under different initial conditions or when it is composed with other circuits, which is a key advantage of our approach. We used this method to verify two different nucleic acid implementations of a distributed consensus algorithm, specified as a CRN with multiple reactions, where each reaction implementation was verified separately and composed in a modular fashion.
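Two of the graph-level conditions mentioned above, universal reachability of the terminal state and the presence of a commit transition on every path to it, can be checked mechanically on an explicit state graph. The sketch below only checks these two conditions and does not implement the full serializability theorem of [61]. The four-state graph, its reaction labels, and the choice of which transition is the commit reaction are all hypothetical.

```python
# Hypothetical state graph: state -> [(reaction label, next state), ...].
GRAPH = {
    "s0": [("R1", "s1")],
    "s1": [("R1_rev", "s0"), ("R2", "s2")],
    "s2": [("R3", "s3")],
    "s3": [],
}
COMMIT = {("s1", "R2", "s2")}  # assumed commit transition


def reachable(graph, start, blocked=frozenset()):
    """States reachable from start, ignoring transitions listed in `blocked`."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for label, t in graph.get(s, []):
            if (s, label, t) not in blocked and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def check_conditions(graph, initial, commit):
    """Return (terminal universally reachable, commit on every path to a terminal)."""
    states = reachable(graph, initial)
    terminals = {s for s in states if not graph.get(s)}
    # (1) some terminal state is reachable from every reachable state
    universally_reachable = all(reachable(graph, s) & terminals for s in states)
    # (2) with commit transitions removed, no terminal state is reachable
    commit_on_every_path = not (reachable(graph, initial, commit) & terminals)
    return universally_reachable, commit_on_every_path


print(check_conditions(GRAPH, "s0", COMMIT))
```

Removing the commit transitions and testing reachability is a simple way to express "every path passes through a commit" as a graph cut.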

More recently, the Winfree group has reported a number of proof techniques based on pathway decomposition [62] and bisimulation-based approaches [63], which have been integrated into the Nuskell CRN-to-DSD compiler [64]. This provides an integrated system for designing and verifying CRN translation schemes. We refer the reader to Sect. 2.2 for further discussion of this work.

2 Present

2.1 Logic Programming Framework

As nucleic acid circuits continued to increase in complexity, we sought to develop a unifying framework that could support not only complex nucleic acid topologies but also both DNA and RNA enzymes. In addition, we sought to develop a system that was flexible enough to encode a range of alternative modeling hypotheses and to readily incorporate new dynamic DNA nanotechnology implementation strategies developed in the future. This led us to rebuild the Visual DSD system based on a logic programming framework [50], which we refer to as Logic DSD. Importantly, Logic DSD subsumes and unifies all previous syntactic and semantic extensions to the Visual DSD system, by implementing a single rule-based abstraction. The challenge in this work was to develop a modeling language that not only embodied a specific semantics but was also sufficiently expressive to allow new user-specified semantic rules to be defined within the language itself.

Logic programming is a powerful computational paradigm that originally found favor in the fields of knowledge representation and artificial intelligence. It allows the programmer to implement arbitrary rule sets in a framework that encapsulates the assumptions underlying their particular experimental system. Logic programs consist of collections of facts and clauses, which can be used in a proof search procedure to determine whether user-provided queries are satisfiable given those facts and clauses. As an example, the logic program

human("Socrates").
mortal(X) :- human(X).

specifies that Socrates is human (a fact) and that if X is human then X is also mortal (a clause). These can be used by a resolution engine to prove that Socrates is therefore mortal. In Logic DSD we use logic programming over strand graph representations of nucleic acid nanostructures, by generalizing the strand graph syntax developed previously [45]. This enables models to be compiled by proof search on semantic rules written in a logic programming language, which encodes specific user assumptions about how reactions and their rates are generated [50].
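The resolution step can be sketched in a few lines of Python, as a stand-in for a real logic programming engine. This is a minimal SLD resolution sketch for the Socrates program only: terms are tuples, logic variables are instances of a hypothetical `Var` class, rule variables are renamed apart on each use, and there is no occurs check. It is not the engine used by Logic DSD.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


FACTS = [("human", "Socrates")]                             # human("Socrates").
RULES = [(("mortal", Var("X")), [("human", Var("X"))])]     # mortal(X) :- human(X).
_fresh = count()


def walk(term, subst):
    """Follow variable bindings until reaching a non-variable or unbound variable."""
    while isinstance(term, Var) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return an extended substitution making a and b equal, or None."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, Var):
        return {**subst, a: b}
    if isinstance(b, Var):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


def rename(term, mapping):
    """Give rule variables fresh names so separate uses do not clash."""
    if isinstance(term, Var):
        if term not in mapping:
            mapping[term] = Var(f"{term.name}_{next(_fresh)}")
        return mapping[term]
    if isinstance(term, tuple):
        return tuple(rename(t, mapping) for t in term)
    return term


def solve(goals, subst):
    """Depth-first SLD resolution: yield every substitution proving all goals."""
    if not goals:
        yield subst
        return
    goal, rest = goals[0], goals[1:]
    for fact in FACTS:
        s = unify(goal, fact, subst)
        if s is not None:
            yield from solve(rest, s)
    for head, body in RULES:
        mapping = {}
        s = unify(goal, rename(head, mapping), subst)
        if s is not None:
            yield from solve([rename(b, mapping) for b in body] + rest, s)


print(any(solve([("mortal", "Socrates")], {})))                       # True
print([walk(Var("Y"), s) for s in solve([("mortal", Var("Y"))], {})])  # ['Socrates']
```

The second query illustrates that proof search does more than answer yes or no: it also returns the bindings that make the query true, which is how Logic DSD rules can construct reaction products.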

In Logic DSD, instead of the compilation rules being hard-coded within the language itself, the language includes a very general set of rules for proof search over nucleic acid structures [50]. This required the development of an equational theory of strands, which is a notion of equality between species that is required so that the system can determine the equivalence of candidate structures and those represented in the rules. This means that the rules that specify the desired semantics are provided as part of the program, alongside the circuit design. We define a default set of rules that corresponds to the semantics of previous versions of the Visual DSD system, however these can also be replaced by custom rules written by the user. This allows the user to potentially use different sets of rules for different types of circuits, depending on the modeling assumptions and implementation strategies, such as whether specific enzymes are present.

We illustrate Logic DSD using the running Join circuit example. The program code is the same as the strand graph code from Sect. 1.2.6, copied below for convenience:

A graphical representation of the Join circuit example in Logic DSD.
A textual representation of the Join circuit example in strand graph syntax.

In addition, this code now needs to be accompanied by logical rules that define its semantics. The rules make use of contexts that are matched to the program in order for the rules to be applied. A given process can be matched to a context with N holes, written \(C[]_1 \ldots []_N\), where each hole represents a part of the structure that is matched to a rule and can be subsequently modified when the rule is applied.

Each hole in a rule is filled by a pattern, which can be one of the following: a strand \(\mathtt{<S>}\) containing a sequence S of domains or logical variables that match a domain; the 5’ end of a strand \(\mathtt{<S}\); the 3’ end of a strand \(\mathtt{S>}\); a sequence S of domains or logical variables; or a nick \(\mathtt{S1> | < S2}\) between two strands, which denotes a break. When a context is matched to a process, the patterns in the holes are matched to the corresponding parts of the process and the variable C is matched to the rest of the process. This combination of contexts and patterns allows strand graph rewriting rules to be expressed. In addition, the use of predicates allows highly expressive conditions to be defined, which need to be satisfied in order for the rule to be applied.

We illustrate two of the main rules and their application to the Join circuit example below, and refer the reader to [50] for complete definitions. The following bind rule defines the semantics of binding, and is accompanied by a corresponding graphical representation of the rule and its application to the Join circuit example:

Four lines of rule text define the bind rule for two processes P1 and P2 containing domains D and D’.
A graphical representation of the bind rule and its application to the Join complex, with domains D and D’.

When this rule is applied to the Join circuit example we have the following matches, indicated by boxes with dashed lines, which allows binding to take place on the toehold tb^:

P1 = C1[tb^]
P2 = C2[tb^*]
Q = C1[tb^!5] | C2[tb^*!5]

The rule specifies that two processes P1 and P2 can bind provided P1 contains domain D and P2 contains domain D’ such that D and D’ are complementary, written compl(D,D’). If so, the resulting process Q replaces the matched domains with corresponding bound versions D!i and D’!i, respectively, and places the two processes in parallel, where i does not occur anywhere else in the process, written freshBond(D!i, P1|P2). The built-in reaction predicate is used as input to the built-in reaction enumeration algorithm, which generates a CRN in a similar fashion to the previous algorithm. Here the reaction predicate indicates that the reactants are processes P1 and P2, the product is Q and the rate of the reaction is kb. Note that in this case the rate of binding is fixed; however, the language also allows arbitrary functions to be used to compute the rates, including associating the rate to the toehold and its surrounding context, such as whether it is at the 3’ or 5’ end of a strand [50].
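For readers more familiar with imperative code, the effect of the bind rule can be approximated in Python. This is a hypothetical analogue, not Logic DSD syntax or its proof search: a process is a list of strands, each a list of (domain, bond) sites; the bonds of P2 are first renamed apart so that placing the processes in parallel cannot create clashes; and every pair of unbound complementary domains yields one product \(C_1[D!i] \mid C_2[D'!i]\) with a fresh bond i at the (hypothetical) rate `KB`. Any further side-conditions of the real rule set, such as restricting binding to toeholds, are omitted.

```python
from copy import deepcopy

KB = 1.0  # hypothetical value for the binding rate kb


def compl(d1, d2):
    return d1 + "*" == d2 or d2 + "*" == d1


def bond_names(process):
    return {b for strand in process for _, b in strand if b is not None}


def shift_bonds(process, offset):
    """Rename bonds apart so two processes can be placed in parallel safely."""
    return [[(d, None if b is None else b + offset) for d, b in strand] for strand in process]


def bind(p1, p2, kb=KB):
    """Yield (Q, rate) for each unbound D in p1 and unbound complementary D' in p2,
    with Q = C1[D!i] | C2[D'!i] for a fresh bond i."""
    p2 = shift_bonds(p2, max(bond_names(p1), default=0))
    fresh = max(bond_names(p1) | bond_names(p2), default=0) + 1
    for i, s1 in enumerate(p1):
        for j, (d1, b1) in enumerate(s1):
            for k, s2 in enumerate(p2):
                for m, (d2, b2) in enumerate(s2):
                    if b1 is None and b2 is None and compl(d1, d2):
                        q1, q2 = deepcopy(p1), deepcopy(p2)
                        q1[i][j] = (d1, fresh)
                        q2[k][m] = (d2, fresh)
                        yield q1 + q2, kb


INPUT1 = [[("tb^", None), ("b", None)]]
JOIN = [[("x", 2), ("to^", 1)], [("b", 4), ("tx^", 3)],
        [("to^*", 1), ("x*", 2), ("tx^*", 3), ("b*", 4), ("tb^*", None)]]
products = list(bind(INPUT1, JOIN))
print(len(products), products[0][0][0], products[0][0][-1][-1])
```

With Input1 and the Join complex, the only unbound complementary pair is tb^ and tb^*, so exactly one product is generated, with fresh bond 5 as in the match shown above.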

Similarly, the following displace rule defines the semantics of strand displacement:

Four lines of rule text define the displace rule, with domains D and D’.
A graphical representation of the displace rule, with reaction rate kd.

Here the holes in the graphical representation are ordered from left to right, then top to bottom. When this rule is applied to the Join circuit example we have the following matches:

Two lines of text give the matches for P and Q when the displace rule is applied to the Join circuit example.

This approach essentially replaces the operational semantics described in Sect. 1.1.1 with an executable logic programming encoding that can be directly edited by the user and saved as part of the Visual DSD program.

More generally, we used this method to encode nucleic acid implementation strategies that make use of polymerase, exonuclease, and nickase enzymes. Our approach greatly simplifies the encoding of enzyme-based systems, by avoiding the need to manually encode each enzyme operation for all possible species. We also encoded implementation strategies that rely on localization to a substrate and complex nucleic acid topologies including branches and pseudoknots [50].

The set of compilation rules that can be expressed in this framework is extremely broad, as logic programming systems of this kind are known to be Turing-complete. This means that future experimental developments in dynamic DNA nanotechnology can be accommodated without rewriting the core of the Logic DSD system, simply by writing a new set of rules. In practice, however, expensive computations such as detailed biophysical simulations will likely prove impractical if used in rule definitions. Furthermore, proof search in logic programming is highly parallelizable, making it conceptually straightforward to harness the power of massively parallel computational hardware or cloud services to scale up proof search. In this way, the current version of the Visual DSD system could serve as a foundation for the design of future dynamic DNA nanotechnology systems based on experimental techniques that have yet to be conceived.

2.2 Related Work

Throughout the time that the Visual DSD system was under development, a number of groups were developing computational design tools for DNA circuits. These approaches, and their temporal relationships to each other and to our own work, are summarized in Fig. 1. For a more detailed comparison of domain-specific languages for DNA circuit design we refer the reader to a review article on the topic [51]. Perhaps most notably, the Winfree group developed a range of tools over a number of years that have been integrated into automated pipelines for computational nucleic acid circuit design. For example, the Piperine design pipeline [68] enables the high-level design of CRN-based circuit architectures [72] and their subsequent compilation into lower-level DNA strand displacement implementations and ultimately into nucleotide sequences suitable for experimental testing.

Fig. 1
A timeline of the evolution of the Visual DSD system and related work, from early domain-level simulations (2001) and the Visual DSD software tool (2011) to SMT-based analysis of molecular geometry (2018) and the Peppercorn reaction enumerator (2020).

Timeline of developments of the Visual DSD system and related work for high-level design and analysis of nucleic acid circuits. Red: Evolution of DSD syntax and semantics. Blue: Analysis techniques applied to DSD models. Green: Related work

The Nuskell system [64] is a compiler that converts abstract CRN designs into strand displacement implementations using CRN compilation schemes that can be specified by the user using a built-in domain-specific language for this task. The Nuskell system also incorporates verification capabilities, using approaches based on pathway decomposition [62] or bisimulation [63] to formally check that the generated strand displacement network is a correct implementation of the original input CRN in a rate-independent sense. This required the definition of a semantics for the domain-level representation so that a correspondence between domain-level species and the abstract species of the input CRN can be rigorously stated and verified. Nuskell uses the Peppercorn software [71] to enumerate non-pseudoknotted structures at the domain level. Peppercorn uses a set of rules similar in spirit to those of the Visual DSD system but expresses them in a pattern-based notation similar to “DU+” used by NUPACK [73] for concise specification of secondary structures. Of particular interest in Peppercorn is its explicit support for condensing reaction networks into simpler versions, in which multiple microstates that interconvert only via relatively fast reactions are combined into a single “resting state”. This is a more flexible approach to reaction merging than the system that we initially developed as part of the hierarchy of abstractions used in previous versions of the Visual DSD system [19].
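The resting-state idea can be sketched as a graph computation. Assuming, as a simplification, that the fast reactions are given as a directed graph of unimolecular transitions between microstates, a resting state corresponds to a strongly connected component that no fast reaction leaves. The Python below applies that definition using a simple reachability test rather than a linear-time SCC algorithm; the function names and the example graph `FAST` are hypothetical, and this is not Peppercorn's code.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """All microstates reachable from `start` via fast reactions (including itself)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def resting_states(fast: Dict[str, List[str]]) -> Set[FrozenSet[str]]:
    """Strongly connected components of the fast-reaction graph with no fast exit."""
    nodes = set(fast) | {v for targets in fast.values() for v in targets}
    reach = {n: reachable(fast, n) for n in nodes}
    states: Set[FrozenSet[str]] = set()
    for n in nodes:
        # The SCC of n: states that n reaches and that can also reach n back.
        scc = frozenset(m for m in reach[n] if n in reach[m])
        # It is a resting state only if nothing outside the SCC is reachable from it.
        if reach[n] == scc:
            states.add(scc)
    return states


# Hypothetical example: A <-> B, B -> C, C <-> D, and an isolated E.
FAST = {"A": ["B"], "B": ["A", "C"], "C": ["D"], "D": ["C"], "E": []}
```

Here `resting_states(FAST)` yields the two resting states `{C, D}` and `{E}`; microstates `A` and `B` are transient because a fast reaction drains them into `{C, D}`.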

Other related tools include the Multistrand stochastic simulator that simulates DNA nanostructures at the single-base level [67, 74] and the KinDA [70] system that predicts kinetics and thermodynamics of domain-level designs after specific nucleotide sequences have been assigned to those domains. The Seesaw compiler developed by the Qian group [69] enables digital logic circuits to be compiled into seesaw gate strand displacement circuits [17] and enables circuits to be implemented even with unpurified oligonucleotides. The DyNAMiC Workbench system [66] was developed as a web-based tool to target the “port-based” design abstraction developed for hairpin assembly systems by Yin et al. [23]. Finally, a range of design tools has also been developed specifically for the enzyme-driven PEN toolbox system [41], including several that automate various aspects of network design [42, 43].

Taken together, these examples illustrate the evolution of computational tools, including those based on the Visual DSD system, over this period of rapid growth in the capabilities and complexity of molecular computing circuits that could be implemented experimentally using dynamic DNA nanostructures.

3 Future

We conclude with some thoughts on what the future may hold for the computational design of nucleic acid circuits, looking ahead to the next 40 years of dynamic DNA nanotechnology at the intersection of computational tools, laboratory experiments, and biomedical applications.

Fig. 2

Potential future developments for computational modeling in dynamic DNA nanotechnology, with some key dependencies highlighted as arrows

3.1 Computational Tool Integration

Enhancing the integration of computational tools has the potential to substantially improve the design of dynamic DNA nanotechnology systems. For instance, the Winfree group and others have integrated multiple tools that span a range of activities required to design and implement an experimental system, via the development of pipelines such as Nuskell [64] and Piperine [68]. More generally, this highlights the power of approaches in which a number of distinct categories of tools, such as nanostructure design tools [75], coarse-grained simulation tools [76], sequence-level design and analysis tools [70, 73, 77,78,79], CRN compilers [64, 68], reaction enumerators [6, 50, 71], and verification tools [62, 63] can be used in conjunction to analyze different aspects of system design.

Over time, tool integration will likely move from an ecosystem on a local machine to a cloud-based framework with interoperable services. The Visual DSD system adopted a web-based approach, with the most recent versions taking advantage of JavaScript compilation to allow execution directly in a browser on the client machine, eliminating the need for server compute infrastructure. While convenient for deployment, this limits the user to the computing power of their own machine. To facilitate remote execution on a cluster, a command-line executable of Visual DSD was also developed. While this increased computational performance, it came at the cost of usability: the web-based graphical user interface was replaced with a command-line text interface, and integration with computational infrastructure had to be configured manually for each infrastructure used. To enable both usability and scalability, we envisage a graphical user interface with a built-in option to either run locally or run on the cloud through a simple checkbox, unleashing the power of the cloud on demand without compromising ease of use. As more tools continue to move to the cloud, software integration will likely take place via web services that scale according to the computational resources required (Fig. 2).

To enable tools from multiple groups to interoperate seamlessly, whether through local interfaces or cloud-based web services, common data interchange formats will also be needed. For example, the Synthetic Biology Open Language (SBOL) [80] has been developed for the synthetic biology community to share genetic circuit designs and is part of a family of interchange formats for systems and synthetic biology [81]. The Systems Biology Markup Language (SBML) [82] provides a common interchange format for kinetic models expressed primarily as chemical reaction networks. To date, few interchange formats exist for dynamic DNA nanotechnology. In future, as DNA nanotechnology and synthetic biology continue to converge, there may be an opportunity to extend the SBOL standard, or to create new standards based on existing formats such as the dotparen notation for secondary structure or Cadnano JSON files for structural DNA nanotechnology designs. In addition to ongoing grassroots efforts, existing organizations such as the International Society for Nanoscale Science, Computation, and Engineering (ISNSCE) could play a coordinating role.
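Part of the appeal of dotparen notation as a starting point for an interchange format is that it is simple to parse. The Python below is a minimal sketch assuming the common conventions that `(` and `)` mark paired positions, `.` marks unpaired ones, and `+` separates strands (as in NUPACK); pseudoknots, which need additional bracket types, are not handled, and `parse_dotparen` is a hypothetical name. Each position could equally be read as a domain rather than a nucleotide.

```python
from typing import List, Tuple


def parse_dotparen(structure: str) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Return (sorted base pairs, strand lengths) for a dotparen string.

    Positions are numbered consecutively across strands, skipping '+'.
    """
    stack: List[int] = []              # indices of '(' still waiting for a partner
    pairs: List[Tuple[int, int]] = []
    lengths: List[int] = [0]           # length of each strand seen so far
    pos = 0
    for ch in structure:
        if ch == "+":
            lengths.append(0)          # start a new strand
            continue
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
        lengths[-1] += 1
        pos += 1
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs), lengths
```

For instance, the two-strand duplex `((.+.))` parses to pairs `(0, 5)` and `(1, 4)` with strand lengths `[3, 3]`.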

There is also an opportunity to facilitate the development of computational tools for DNA nanotechnology through better programming language integration. Recent work on the NUPACK system included an application programming interface (API) for the Python programming language [77], thereby enabling programmatic access to tool features using a general purpose programming language. We have previously argued [51] that domain-specific programming languages offer powerful tools for the development of design systems for particular application domains, such as dynamic DNA nanotechnology. This view is not inconsistent with the desire for an integrated toolchain glued together by a general purpose scripting language such as Python. In Visual DSD, improved integration could be achieved by adding the ability to call out to external functions, for instance to determine rate constants and to enumerate reactions based on the output of external solvers such as the Z3 SMT solver [53], nucleic acid design tools such as NUPACK [78, 79], or coarse-grained biophysical modeling tools such as oxDNA [76]. From a language implementation perspective, this would require a foreign function interface (FFI) to retrieve the results of calls to external tools: this could build on previous work on functional logic programming systems [83] that integrate logic programming-style proof search with function calls. From a computational feasibility perspective, oxDNA simulations are often time-consuming and can sometimes take several days, so such relationships would most likely need to be determined in advance, using oxDNA or experiments, and then encoded in the logic programming framework.
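A foreign function interface of this kind can be sketched in Python as a registry of host functions that rules call by name, memoized so that an expensive external query runs at most once per distinct argument. Everything here is hypothetical: `foreign`, `FOREIGN`, `toehold_rate`, and the toy `displacement` rule are illustrative names, and the placeholder rate (toehold length as a float) is a dummy value standing in for a real call to NUPACK, oxDNA, or a table of precomputed results, not a physical model.

```python
from functools import lru_cache
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical foreign-function table: names that rules may call, bound to host functions.
FOREIGN: Dict[str, Callable[[str], float]] = {}
# Records which arguments actually reached the (expensive) external function.
call_log: List[str] = []


def foreign(name: str) -> Callable[[Callable[[str], float]], Callable[[str], float]]:
    """Register a function under `name`, memoized so each distinct query runs once."""
    def register(fn: Callable[[str], float]) -> Callable[[str], float]:
        FOREIGN[name] = lru_cache(maxsize=None)(fn)
        return FOREIGN[name]
    return register


@foreign("toehold_rate")
def toehold_rate(toehold: str) -> float:
    # Placeholder for an external call (NUPACK query, oxDNA run, or lookup in
    # experimentally fitted data). The returned number is a dummy value.
    call_log.append(toehold)
    return float(len(toehold))


def displacement(invader_toehold: str, gate_toehold: str) -> Optional[Tuple[str, float]]:
    """Toy rule: fires only when the toeholds match, taking its rate from the foreign call."""
    if invader_toehold != gate_toehold:
        return None
    return ("displace", FOREIGN["toehold_rate"](gate_toehold))
```

The memoization reflects the feasibility point above: if each external evaluation is expensive, results should be computed once (or precomputed offline) and then reused by every rule application that needs them.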

Finally, the potential for machine learning to be applied to biodesign has been widely acknowledged [84], including for example the opportunity to use machine learning tools to predict DNA interactions from sequences [85, 86], using both feature-based and recurrent neural network-based approaches, and incorporating domain knowledge into the models. The significant computational power of modern GPUs could also be harnessed for computational design and simulation of DNA nanotechnology systems, as demonstrated by recent work applying differentiable programming to design gene regulatory networks [87]. As mentioned above, cloud computing resources could be harnessed for computational design by exploiting the highly parallelizable nature of proof search in logic programming systems such as Logic DSD.

3.2 Experiment Integration

Experimental techniques in DNA nanotechnology have continued to rapidly advance in the decade since Qian and Winfree reported their seminal “four-bit square root” strand displacement circuit [7]. As things stand, sequence design tools such as NUPACK [78, 79] are almost indispensable for designing such circuits. As circuit complexity continues to increase, computational design tools at the domain level will become increasingly valuable for debugging failure modes and tuning system dynamics. When it comes to analyzing error modes of these systems, such as unintended leak interactions between DNA strands, we are already reaching the limits of what is computationally feasible. For instance, leak analysis in Visual DSD can generate thousands of unintended reactions even for very simple circuits, due to the combinatorial explosion of variant structures produced via leaks, which can themselves undergo further leak reactions, recursively. Recent promising developments have included the design of circuits that resist leak [88], which could facilitate the implementation of substantially larger circuits. More generally, the need for higher-level computational design tools will likely only grow as circuit complexity increases due to further improvements in experimental techniques.
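To make the combinatorial explosion concrete, consider a deliberately simplified model, assumed here purely for illustration rather than taken from any particular circuit: every species can undergo $k$ distinct leak reactions, each producing a new variant that can leak in turn, and no two pathways ever produce the same variant. Starting from a single species and following leaks for $d$ rounds, the number of species is

$$
N_d = \sum_{i=0}^{d} k^i = \frac{k^{d+1} - 1}{k - 1}, \qquad k > 1,
$$

and the number of leak reactions is $N_d - 1$, since every species except the first is produced by exactly one reaction. With only $k = 3$ and $d = 6$, this already gives $N_6 = (3^7 - 1)/2 = 1093$ species and 1092 reactions, which suggests why some form of truncation, such as a depth limit, is useful when enumerating leaks.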

Furthermore, recent developments in areas such as enzyme-driven DNA circuits [40] hint at the future convergence of DNA nanotechnology with synthetic biology, which will likely open new areas for modeling the interactions of DNA-based molecular devices with biological and biochemical components, including in living cells. In terms of keeping pace with such developments, one goal of the newest Logic DSD system [50] is to provide headroom by adopting a Turing-powerful language in which to express the semantic rules used for compilation of structural models into kinetic ones. In principle, end-users can reprogram the behavior of this DSD compiler to implement the specific assumptions of their experimental system, which are embodied in the user-supplied rules rather than hard-coded into the semantics of the DSD compiler itself.

In the longer term, there is a need not only to integrate the different modeling methodologies but also to track the evolution of knowledge over time, in the form of computational models. Previous work has investigated training models of nucleic acid circuits through parameter inference methodologies [10, 30]. In future we envisage an approach where computational models are at the center of the Design-Build-Test-Learn (DBTL) cycle, and are continually updated as new experiments are performed.
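As a minimal sketch of a model that is refit whenever new data arrive, the Python below estimates a single rate constant for a bimolecular reaction A + B → C with equal initial concentrations, using the mass-action identity $1/[A](t) = 1/[A]_0 + kt$ and an ordinary least-squares slope. This is a deliberately simple stand-in for the parameter inference methodologies cited above, not a reproduction of them; the `Model` class and its `learn` method are hypothetical names.

```python
from typing import List, Sequence


def fit_rate(times: Sequence[float], conc: Sequence[float]) -> float:
    """Least-squares slope of 1/[A] against t.

    For A + B -> C with [A]0 == [B]0, d[A]/dt = -k[A]^2, so 1/[A] = 1/[A]0 + k t
    and the slope of 1/[A] versus t is the rate constant k.
    """
    ys = [1.0 / c for c in conc]
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(ys) / n
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, ys))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


class Model:
    """Hypothetical model record that accumulates data and refits after each experiment."""

    def __init__(self) -> None:
        self.times: List[float] = []
        self.conc: List[float] = []
        self.k: float = float("nan")   # no estimate until data arrive

    def learn(self, times: Sequence[float], conc: Sequence[float]) -> float:
        """Add one experiment's time course (at least two distinct times overall) and refit."""
        self.times.extend(times)
        self.conc.extend(conc)
        self.k = fit_rate(self.times, self.conc)
        return self.k
```

Each call to `learn` plays the role of the "Learn" step: the estimate is recomputed from all data gathered so far, so the model reflects every experiment performed to date rather than only the latest one.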

We also note the substantial potential for the integration of new and future experimental techniques for high-throughput testing and readout of DNA-based molecular circuits. Most notably, high-throughput sequencing methods and nanopore sequencing may enable large-scale interrogation of the dynamics of DNA circuits, which has previously been limited to the observation of a relatively small number of signals due to spectral overlap issues inherent in the use of fluorescent reporters. Similar techniques could also support larger-scale data collection on the internal states of DNA circuits during execution.

Future directions in experimental DNA nanotechnology could facilitate system design by improving the correspondence between computational modeling and experimental reality. For instance, a key issue in computational modeling of experimental systems is ensuring that the initial conditions of the experiment match the initial conditions of the simulation. In previous work, this has been achieved in a number of ways, including explicitly modeling some fraction of inactive gates to account for incorrectly assembled gates [10] or adjusting the initial concentrations of components in the experimental system so that the concentration of active gates matches that required by the circuit design [69]. Perhaps a future research challenge in experimental DNA computing could be to aim toward “Angstrom resolution” in dynamic DNA nanotechnology, similar to the goal set for the field of structural DNA nanotechnology [89]. This might be a more challenging task, as it would involve precision not just in the placement of atoms in the initial state of the system but also in the programming of the subsequent pathways taken as the system computes over time. A possible initial goal would be to reliably construct just the initial state of a system, for example by filtering out incorrectly synthesized or misfolded structures at the single-molecule level. At present, this is largely done by annealing followed by manual purification via polyacrylamide gel electrophoresis (PAGE). However, this does not guarantee that every gate in the sample will be correctly formed, opening up the possibility of errors. Future advances in single-molecule analysis, such as imaging or nanopore sequencing, could enable structures the size of currently used DNA strand displacement gates to be analyzed, and perhaps even sorted, to produce highly pure samples for use in molecular computing reactions that more faithfully reproduce the starting conditions assumed in the models used for computational design. Alternatively, high-fidelity biological synthesis of both DNA [10] and RNA [90] strand displacement components could improve the purity of experimental systems.
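The two ways of reconciling models with imperfect gate preparations can be contrasted with a small simulation. The Python below is a minimal sketch assuming simple mass-action kinetics for a single hypothetical gate reaction X + G → Y, a unit rate constant, explicit Euler integration, and illustrative concentrations; `simulate_gate` and `compensated_total` are hypothetical names, not functions from any of the cited tools.

```python
def simulate_gate(x0: float, g_total: float, active_fraction: float,
                  k: float = 1.0, t_end: float = 100.0, dt: float = 0.01) -> float:
    """Euler-integrate X + G -> Y where only `active_fraction` of the gates work.

    Returns the output concentration [Y] at time t_end.
    """
    # Inactive gates are modeled simply by never letting them react.
    x, g, y = x0, active_fraction * g_total, 0.0
    for _ in range(int(round(t_end / dt))):
        flux = k * x * g * dt      # amount converted during this time step
        x -= flux
        g -= flux
        y += flux
    return y


def compensated_total(g_required: float, active_fraction: float) -> float:
    """Nominal gate concentration needed so that the active gates equal g_required."""
    return g_required / active_fraction
```

With `x0 = 1.0`, `g_total = 1.0`, and an active fraction of 0.8, the output plateaus near 0.8 rather than 1.0, which is the discrepancy that modeling inactive gates explicitly accounts for. Raising the nominal gate concentration to `compensated_total(1.0, 0.8)`, that is 1.25, restores an active concentration of 1.0, and the output then approaches 1.0, mirroring the alternative strategy of adjusting experimental concentrations.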

3.3 Computational Design for Practical Applications

A key challenge for molecular computing as a field is to successfully demonstrate practical applications for the technologies that have been developed over recent years. One possibility here is the use of dynamic DNA nanotechnology within living organisms to sense and control biological networks [91, 92]. These could be used for cell labeling [93], for diagnostic applications [94], for therapeutic effects (e.g., by silencing specific genes) [95], or even to build theranostic (therapeutic and diagnostic) systems that autonomously detect and treat diseases. Such applications in nanomedicine would exploit the innate biocompatibility of DNA nanotechnology to carry out computation within living cells, where traditional silicon-based microprocessors cannot operate.

Previous work on implementing strand displacement reactions in living cells [92, 96, 97], including more recently using heterochiral DNA nanotechnology [98, 99], brings this possibility closer. However, even with chemical modifications added to the DNA, the interactions of DNA-based molecular computing systems with living cells are complex. This highlights a potentially fruitful avenue of future work in the modeling and design of computational nucleic acid systems: to interface nucleic acid circuit models with whole-cell models of cellular processes [100], including predicting, and designing against, degradation of circuit components by nuclease enzymes and other forms of interference.

Finally, over the next 40 years we anticipate that the fields of DNA nanotechnology and synthetic biology will continue to converge. In particular, CRISPR/Cas systems for RNA-guided gene editing [101] and transcriptional regulation [102] have recently been integrated with strand displacement reactions for programmable control of CRISPR targeting [103,104,105,106]. Similar approaches have used strand displacement reactions to regulate translation via RNA-based “toehold switches” [107, 108] and transcription via small transcription-activating RNAs (STARs) [109]. More generally, the control of biological systems via “output interfaces” such as the CRISPR/Cas system is likely to be a substantial growth area for applications of dynamic DNA (and RNA) nanotechnology. We anticipate further opportunities to apply rule-based modeling tools such as Visual DSD to hybrid systems that span both fields. We also anticipate that integrating computational design tools for DNA circuits with tools specific to the application domain will be critical, paving the way for new and exciting applications of DNA nanotechnology over the next 40 years.