Sequenceable Event Recorders

Cardelli, Luca

doi:10.1007/978-981-19-9891-1_17


Part of the book series: Natural Computing Series (NCS)


Abstract

With recent high-throughput technology, we can synthesize large heterogeneous collections of DNA structures and also read them all out precisely in a single procedure. Can we use these tools, not only to do things faster, but also to devise new techniques and algorithms? In this paper, we examine some DNA algorithms that assume high-throughput synthesis and sequencing. We record the order in which N events occur, using $N^2$ redundant detectors but only N distinct DNA domains, and (after sequencing) reconstruct the order by transitive reduction.


1 Introduction

With recent high-throughput technology, we can synthesize large heterogeneous collections of DNA structures [7, 13] and also read them all out precisely in a single procedure [10]. This contrasts with the older practice of assembling structures one at a time and of reading them out individually (e.g., by fluorescence), or reading them together ambiguously (e.g., by gel electrophoresis). Can we take advantage of these high-throughput and high-precision technologies, not only to do things faster but also to devise new techniques and algorithms? In this paper, we examine some DNA algorithms that assume both high-throughput synthesis and high-throughput sequencing: they would not be very practical otherwise.

A sequence ‘s’ of DNA nucleotides hybridizes (forms a double strand) with its reverse Watson-Crick complement denoted ‘s*’; we write the resulting double strand as ‘$\underline{\text {s}}$’. Subsequences of ‘s’ are called domains provided they are independent of each other, that is, provided that differently identified domains do not hybridize with each other, or with significantly long parts of each other [16]. Under normal laboratory conditions, a domain ‘a’ is called short if it hybridizes reversibly with ‘a*’, and long if it hybridizes irreversibly with it.

A short single-stranded domain ‘t’, called a toehold, followed in the same sequence by a long single-stranded domain ‘a’ can initiate strand displacement. This is the process (later detailed in Fig. 3) where a single-stranded sequence ‘ta’ hybridizes to a double strand composed of ‘t*’ attached to the bottom strand of a double-stranded ‘a’. The invading ‘ta’ can displace and possibly replace the existing ‘a’ domain of the double strand through a random-walk competition between the two ‘a’ domains hybridizing to the same ‘a*’.

A nick is an interruption in one of the two strands of a double strand, at the boundary between two domains. By cascading short and long domains, occasionally separated by nicks, we can achieve multi-step strand displacements where the whole sequence of displacements can itself be reversible or irreversible. This way we can emulate reversible and irreversible chemical reactions [14] and other computational abstractions [9].

The readout of the outcome of such sequences of displacements is often done by fluorescence. Fluorophore/quencher pairs are attached to some domains that participate in the reactions, those in particular whose displacement indicates that a significant event has occurred. The displacement separates the fluorophore from the quencher and hence induces visible fluorescence. This provides a real-time account of the computation, but the readout capability is restricted: a limited number of separate events can be detected by using different fluorescence colors. This is analogous to debugging a program by inserting a limited number of print statements at a time, each one printing a single letter.

Another way of achieving a readout is via gel electrophoresis, to distinguish the sequences in a solution by their length at the end of the experiment (or at predetermined time points). Many different sequences can be identified provided they have different lengths (masses) and provided that we know their length ahead of time. Unexpected lengths can be hard to identify. This is analogous to debugging a program by using control flow counters to tell us how many times each routine is invoked, or each structure is accessed, without any insight about the order of events.

Finally, and especially with more recent high-throughput technology, we can obtain a readout by ligating the nicks and sequencing all the strands in the solution at the end of the experiment (or at predetermined time points). With high-throughput sequencing, we can inspect potentially the entire composition of the solution. The debugging analogy now is that of taking a core dump: analyzing in complete detail the entire state of a computation, but only infrequently or at the end, and again without any obvious insight on the order of events that occurred.

The order of events is usually of great interest: for example, multiple laborious gene knockout experiments are frequently carried out to determine the order of gene activations. What if we could instead take a single core dump that tells us the order of all the events of interest? To that end, we should record the order of events within the state of the system, so that we can inspect that recording at the end as part of the core dump. Assuming high-throughput sequencing, we can embed a large amount of information within the solution. We are going to assume that we can embed $N^2$ pieces of information, where N is the number of events of interest. This seems achievable for reasonably small N while providing a lot of information, encoding for each event whether it happened before, together with, or after any other event. Each one of the $N^2$ event order detectors is a structure that accepts inputs but does not produce outputs: when it detects certain conditions, it locks down in a stable state and waits to be sequenced later.

Our strategy is therefore to embed a preorder (a reflexive and transitive relation) of events within the solution. This is a preorder because we may not be able to detect the precise order of two events if they happen very close to each other, in which case both directions are recorded. With $N^2$ detectors, we can determine the order of any pair of events without needing to coordinate the detectors with each other or with a central structure; hence, each detector can be relatively simple. An alternative is to use only N detectors that sequentially add records to a central tape, but this requires a way of guaranteeing atomic access to the tape [9]. Still, event recorders of the tape variety, readable by sequencing, have been nicely demonstrated using natural DNA and protein mechanisms [12, 15].

A preorder is not the entire history of a computation. We are considering the preorder of first occurrences of events: any subsequent occurrences of the same signal are not recorded. This limited information can still provide support for causality: if an event always precedes another event over a number of runs, then this supports the first event causing the second, or both having a common cause.

In the rest of this paper, we aim to describe the architecture of such a preorder recorder, using DNA strand displacement technology, slowly building up from simpler problems. A property of all the designs in this paper is that (apart from the single-stranded input signals) all DNA structures are nicked double strands with no additional modifications or secondary structure. Therefore, the required and potentially large numbers of components can be fabricated by bacterial cloning as a single or a few long DNA double strands, followed by enzymatic cutting and nicking [4] (see Appendix). Other technologies for high-throughput synthesis of large heterogeneous libraries exist [7, 13]. Thus, we rely on both high-throughput synthesis for producing the $N^2$ detectors and on high-throughput sequencing to read them out.

2 Occurrence Recorder

We begin by investigating the simplest event recorder: recording the occurrence of events at any time during an experiment. By an ‘event’ here, we mean the appearance of a whole population of identical molecules and in fact a specific structure of molecules that can be uniformly identified. Any event that does not fit that description must first be transduced into one of these uniform molecular structures. By a signal, we mean a population of one such molecular species over time, and by an event, we mean the appearance of a signal population (we do not detect the disappearance of a population).

In discussions, we summarize DNA structures by a textual notation. In addition to lowercase letters like ‘a’ for single-stranded long domains, and underlined letters like ‘$\underline{\text {a}}$’ for the corresponding double-stranded long domains, a single short domain is used for all toeholds: ‘–’ is an open (i.e., un-hybridized) toehold on a single strand or on the upper strand of a double strand, ‘$\text {\_}$’ is an open toehold on the lower strand of a double strand, and ‘$\underline{{\textbf {-}}}$’ is a covered (double-stranded) toehold. A sequence of domains on a double strand with an initial open toehold and an intermediate covered toehold looks like ‘$\underline{\,\,\,\text {a}{\textbf {-}}\text {b}}$’. This summary notation omits information about nicks, which are instead detailed in corresponding figures. Note that, before sequencing, all the open domains should be complemented, and all the nicks should be ligated.

Figures instead depict the corresponding single and double strands graphically (e.g., Fig. 1). A domain is a short or long sequence of dashes ‘-’ with domain delimiters ‘$\mathtt{>}$’ and ‘$\mathtt{<}$’ pointing in the 5’-to-3’ direction to indicate either a nick (an interruption in the strand) or the 3’ end, and ‘+’ to indicate the 5’ end or the logical boundary of a domain (not a nick). The name of a domain is a lower-case letter placed on top of the upper strand, with implicitly the reverse complement domain on the lower strand. All toeholds are the same sequence: they have a blank name. Reversible reactions are ‘$\mathtt{<=>}$’ and irreversible reactions are ‘$\mathtt{=>}$’.

2.1 Yes Gate

The events that we want to detect are represented by single-strands ‘–a’ each consisting of a (short) toehold ‘–’ attached to a (long) domain ‘a’. If ‘–a’ is ever present, we want to know about it: this is the purpose of the Yes gate for ‘a’.

First (Fig. 1) let us consider the traditional way of detecting ‘–a’. A double-stranded structure ‘$\underline{\,\,\,\text {a}{\textbf {-}}\text {q}}$’ with an open toehold ‘_’ accepts the single-strand ‘–a’ (reversibly) and opens up another toehold, yielding ‘$\underline{{\textbf {-}}\text {a}\,\,\,\text {q}}$’. That structure then locks down (irreversibly) by combining with an auxiliary single-strand ‘–q’ to produce the fully hybridized ‘$\underline{{\textbf {-}}\text {a}{\textbf {-}}\text {q}}$’ and the toehold-free ‘q’.

If we attach a fluorophore (F) and quencher (Q) pair at the right end of ‘$\underline{\,\,\,\text {a}{\textbf {-}}\text {q}}$’ (and not at the end of ‘–q’), we can detect the occurrence of ‘–a’ because it separates F from Q and causes visible fluorescence. However, if we were to (ligate and) sequence the solution, it would be difficult or impossible to tell the difference between the initial and final state, because they differ only by open toeholds and by the positions of nicks that are erased by ligation.

An illustration represents 3 step reaction of domains. The first 2 are reversible reactions. The reaction from 2 to 3 is irreversible. In total, there are 2 single strands labeled, a and q, and 1 double strand labeled, a q, in each step. The double strands are F and Q in the first 2 reactions. — **Fig. 1**

Let us now consider (Fig. 2) an additional domain ‘r’ that will help us tag the desired outcome. The ‘–q’ single-strand is replaced by a ‘$\underline{{\textbf {-}}\text {qr}}$’ double strand, but with a nick on the bottom between ‘q’ and ‘r’.^{Footnote 1} The first reaction is the same as before, but the second reaction is now a 4-way strand displacement^{Footnote 2} (Fig. 3 right). This detector is non-catalytic: it captures some of the ‘–a’ strands and releases ‘a–’ strands (which are usually harmless).

If this gate is triggered, then the main outcome is ‘$\underline{{\textbf {-}}\text {a}{\textbf {-}}\text {qr}}$’, which is a nicked but fully complemented double strand: it is ready for ligation and sequencing. If the gate is not triggered, then the outcome is the initial ‘$\underline{\,\,\,\text {a}{\textbf {-}}\text {q}}$’ which is distinguishable after sequencing.^{Footnote 3}

Two illustrations. Left: a 3-way strand displacement between strands labeled a and a q. Right: a 4-way strand displacement between strands labeled a q and q r. — **Fig. 3**

For a catalytic version (one that does not sequester the input), consider the design in Fig. 4: we add two more structures to Fig. 2 that absorb the ‘a–’ that was left over and convert it back to a free ‘–a’. Such catalytic irreversible gates avoid sequestering weak signals, while being fully activated by weak signals, leading to robust detection (if the signals are not drained too quickly by downstream processing).

2.2 Occurrence Recorder Algorithm

We can use Yes gates to detect a collection of signals in an experiment via a single high-throughput readout: we prepare a Yes gate detector for each signal, we mix them in at the beginning, and we sequence the entire solution at the end, revealing any detectors that have fired.

3 Coincidence Recorder

We now move to a more interesting task: detecting the simultaneous presence of signals. The idea enabling the sequencing-based readout of gates, and in particular the novel use of 4-way displacement, is due to Chen and Seelig [5] (the Yes gate of Fig. 2 is also a special case of this). Their design was originally meant as an AND gate made of a sequenceable Join part accepting inputs, and a sequenceable Fork part producing outputs. We are going to use just a sequenceable Join half to detect the simultaneous occurrence of any pair of signals in a given set of signals, relying on high-throughput sequencing to inspect all possible combinations.

3.1 Join Gate

The design in Fig. 5 is rooted in a fluorophore-oriented Join gate, along the lines of Fig. 1, which ultimately comes from [11] and [2]. However, again here we want to find a sequencing-friendly version, where the initial structures ‘$\underline{\,\,\,\text {a}{\textbf {-}}\text {b}{\textbf {-}}\text {q}}$’ and ‘$\underline{{\textbf {-}}\text {qr}}$’ with input signals ‘–a’ and ‘–b’ are sequencing-distinguishable from the final structure ‘$\underline{{\textbf {-}}\text {a}{\textbf {-}}\text {b}{\textbf {-}}\text {qr}}$’, which indicates that both signals were present at the same time. The gate locks down when the two signals are received in turn. If one signal appears first and persists until the second arrives, this gives the same result as both signals appearing together. If one signal is removed before the other one appears, the gate reverts and the result indicates no co-occurrence. The gate can be made more kinetically symmetrical by mixing Join(a,b) with Join(b,a).

An illustration represents 4 step reaction of domains. The reactions from 1 to 2 and 2 to 3 are reversible reactions. The reaction from 3 to 4 is irreversible. In total, there are 2 single strands labeled, a and b, and 2 double strands in each step. — **Fig. 5**

As in the previous case, we can add structures to this gate that convert it to a catalytic gate. But we need to handle the two signals together in the additional structures, because ‘a–’ must be able to revert to ‘–a’ when ‘–b’ is not present. Hence, we use the binary structure in Fig. 6 for distinct a,b. This structure cannot coexist with a catalytic Yes(a) as it would lock down Join(a,b) on the first input: a later non-coincident ‘–b’ would give a false positive. Join(a,a) must not have the additional catalytic structures for the same reason: it is best to replace it with a non-catalytic Yes(a).

An illustration represents the first and last reactions in a multi-step reaction of domains. The first reaction and subsequent reactions are reversible reactions. The reaction from the penultimate to the ultimate reaction is irreversible. The reactions contain 2 single strands sequenced as, a and b, and 2 double strands sequenced as v b a and v. — **Fig. 6**

3.2 Coincidence Recorder Algorithm

We can use Join gates to detect the simultaneous occurrence of any pair of distinct signals in a collection: we prepare a Join gate detector for each such pair, we mix them in at the beginning, and we sequence the entire solution at the end, revealing any detectors that have fired. If we detect Join(a,b) and Join(b,c), we can deduce the coincidence of a and c, and we should also detect Join(a,c): that redundancy serves as a crosscheck. We could use fewer Join gates, but if we did not include the transitive Join(a,c) and b never came, we would not detect the coincidence of a and c.
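The crosscheck described above can be sketched in a few lines. This is only an illustration: it assumes that the sequencing readout has already been reduced to a list of fired Join pairs, and the function name `missing_transitive_joins` and its input format are hypothetical, not part of the design.

```python
from itertools import permutations


def missing_transitive_joins(fired):
    """List pairs (a, c) implied by Join(a,b) and Join(b,c) but not observed.

    `fired` is an iterable of pairs of distinct signal names whose Join
    detector was found locked down. Pairs are treated as unordered here
    (an assumption of this sketch).
    """
    pairs = {frozenset(p) for p in fired}
    signals = sorted(set().union(*pairs)) if pairs else []
    missing = set()
    for a, b, c in permutations(signals, 3):
        implied = frozenset((a, b)) in pairs and frozenset((b, c)) in pairs
        if implied and frozenset((a, c)) not in pairs:
            missing.add(tuple(sorted((a, c))))
    return sorted(missing)
```

An empty result means the observed coincidences are mutually consistent; a non-empty one flags readouts worth a second look.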

4 Preorder Recorder

We now aim to build a device to record the order of occurrence of events in an experiment. The question is: given a set of events $a,b,c,d,\ldots $ that occur in some order, in what order did they first occur? If some events can occur together (up to experimental uncertainty), the relationship is a preorder: a reflexive and transitive relation. We want to reconstruct the temporal preorder of events from a single observation at the end of a run, with a single mass sequencing.

Such a preorder recorder would be useful for monitoring a process over time without sampling the system at multiple time points. Our recorder does not record timing and does not record sequence, but it records the first-occurrence preorder, storing it within the system itself. Recording the order, rather than the full timing of events, means that we need not use energy during periods of inactivity, and we need not worry about how often we should sample the system. The energy expenditure is all preloaded: no additional resources are needed no matter how long or complex the events history becomes, and there can be no ‘memory overflow’ of the recording. Repeated preorder experiments can build up evidence for causality, by observing which events always happen in the same order, independently of timing and other conditions.

The algorithm below uses a number of gates that is quadratic in the number of signals N but is independent of the observation time. After the initial setup, it requires no further energy because it reacts to signals and does not actively inspect the environment for their presence. More subtly, the algorithm uses a number of distinct domains that is just $N+4$ (+1 for toeholds). This is important to avoid crosstalk among domains, which becomes more difficult to avoid when we have more domains. The situation would be much worse if we needed $N^2$ distinct domains in addition to $N^2$ gates.

The coincidence recorder in the previous section was obtained by iterating a Join gate. For the preorder recorder, we iterate a choice gate, which we describe next. Instead of presenting directly the domain structures (for which there are multiple possibilities), we first describe abstractly how the choice gate behaves, and how the algorithm uses it. The DNA implementation is described later.

4.1 Choice Gate Specification

A choice gate is a two-input gate denoted a?b between input events a and b. As an abstract operator it is symmetric: $a?b = b?a$. Its desired behavior is as follows:

If a arrives no later than b, then a?b produces a distinct result that we indicate $a\le b$ or equivalently $b\ge a$.
If b arrives no later than a, then a?b produces a distinct result that we indicate $b\le a$ or equivalently $a\ge b$.
If a and b arrive together, then a?b produces a result that we indicate $a\sim b$ or equivalently $b\sim a$. (This is in practice an equal mixture of $a\le b$ and $b\le a$, or an unequal mixture if they arrive slightly offset.)
As a special case, if a ever arrives, then a?a produces a result $a\sim a$.

The three results between different a and b are assumed to be distinct and distinguishable by sequencing. Our algorithm requires only that there are three detectable final configurations: $a\le b$ and $b\le a$ depending on which of two inputs arrives first, and a mixture of the two, $a\sim b$, if they arrive together. We may further analyze the results quantitatively: a 100%/0% mixture of $a\le b$ and $b\le a$ indicates that enough of a arrived to exhaust the gate population before any of b (if any) arrived. Other mixtures may indicate how much events overlapped in time, their relative strength, or some confusion between those. Weak signals may appear to have arrived together.
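The specification can be restated as a small abstract model. The sketch below is not a DNA design: it assumes idealized first-arrival times, a hypothetical `resolution` parameter standing in for the time resolution of the gate, and it represents a result $x\le y$ as the pair `(x, y)`. It also assumes, in line with the worked example later in the text, that a gate whose other input never arrives still locks down in favor of the input that did arrive.

```python
def choice_gate(x, y, arrival, resolution=0.0):
    """Abstract x?y: return the set of results (u, v), each meaning u <= v.

    `arrival` maps a signal name to its first-arrival time; a missing
    entry (or None) means the signal never arrived.
    """
    tx, ty = arrival.get(x), arrival.get(y)
    if tx is None and ty is None:
        return set()                      # gate left in its initial state
    if x == y:
        return {(x, x)}                   # reflexive gate: x arrived at some time
    if ty is None:
        return {(x, y)}                   # only x arrived
    if tx is None:
        return {(y, x)}                   # only y arrived
    if abs(tx - ty) <= resolution:
        return {(x, y), (y, x)}           # together: the mixture x ~ y
    return {(x, y)} if tx < ty else {(y, x)}
```

Because the result is a set of ordered pairs, `choice_gate(a, b, ...)` and `choice_gate(b, a, ...)` return the same value, mirroring $a?b = b?a$.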

There are many ways to achieve this specification, and we will discuss at least two. But first we describe the algorithm that uses these gates.

4.2 Preorder Recorder Algorithm

Suppose we have a (moderately large) set of events a, b, c, d, e, f, ..., like the occurrence of some mRNAs in a cell-free extract. They will activate in some order like b.cd.ae.f (b first, then cd together, then ae together, then f). We want to store that order as the events arise, and read it back at the end.

For N signals, we need $N^2$ distinct DNA structures: all the possible combinations of two signals, including all the x?x cases. We are not going to distinguish event sequences with repetitions and oscillations: we only look at the first occurrence of a signal. For example, the sequence b.b.b is the same as b for us, and a.b.a is the same as a.b (we can still tell that the first a arrived before the first b: the second a does not confound it).

We do not provide any external timing: there are no clocks needed to sample these signals over time, and there is no predetermined sampling frequency. We just need to assume that the sequence of events is slow enough. If it is not slow enough then a.b will look just like ab (in practice, the closeness of two signals will be reflected in the relative proportions of a $\le $ b and b $\le $ a, so we can still get some more information). The time resolution is thus determined by the speed of the DNA reactions. If they happen to be fast enough for the intended observed system, then sampling over longer time periods does not require any more gates or any more energy: the gates just naturally sit waiting for the signals to arrive.

The input to our algorithm is a preorder of signals, like a.bc.def.g that is occurring in real time in our experiment. We initially add to the solution all the choice gates x?y such that x and y range over all those signals (including $x=y$). At the end, we sequence all the leftover structures (e.g., $x\le y$) and we reconstruct the preorder from them. The process of reconstructing the preorder graph from what is essentially its reachability matrix is called transitive reduction and has the same complexity as transitive closure and matrix multiplication [1].
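The reconstruction step can be sketched as follows. This is a naive illustration rather than the efficient algorithm of [1]: it assumes the sequencing readout has been reduced to a set of pairs `(x, y)` meaning that a structure $x\le y$ was observed, that a signal counts as having arrived exactly when its reflexive result is present, and that the observed pairs are transitively closed. The function name and data representation are hypothetical.

```python
def reconstruct_preorder(observed):
    """Rebuild the first-occurrence preorder from observed pairs.

    Returns (classes, edges): the groups of signals that arrived together,
    and the transitively reduced "before" edges between those groups.
    """
    # Assumption: a signal arrived iff its reflexive result x <= x is present.
    arrived = {x for (x, y) in observed if x == y}
    leq = {(x, y) for (x, y) in observed if x in arrived and y in arrived}

    # Signals seen in both directions are grouped as "arrived together".
    group = {
        x: frozenset(y for y in arrived if (x, y) in leq and (y, x) in leq)
        for x in arrived
    }
    classes = set(group.values())

    # Strict order between groups, then drop every edge implied by two others.
    before = {(group[x], group[y]) for (x, y) in leq if group[x] != group[y]}
    edges = {
        (u, v) for (u, v) in before
        if not any((u, w) in before and (w, v) in before for w in classes)
    }
    return classes, edges
```

For an input such as a.bc.d, the sketch returns the three groups {a}, {b, c}, {d} and the two edges {a} → {b, c} → {d}; the redundant edge {a} → {d} is removed by the reduction.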

4.3 Crosstalking Choice Gate

We now describe a DNA implementation of the choice gate a?b. We discuss below how the gates crosstalk, and what are the consequences of crosstalking. But in summary, for our application, this implementation is sufficient, and it is also considerably more economical than a ‘proper’ non-crosstalking implementation.

The inputs are the usual two-domain signals with toehold on the left. For each abstract choice operator a?b, we use two pairs of double strands abbreviated as group [a?b| and group |b?a], with a?b = [a?b| + |b?a]. They are symmetric but different because [a?b| reacts to a ‘–b’ strand, while |b?a] reacts to an ‘–a’ strand. Conversely, [a?b| reacts also to an ‘a–’ strand and |b?a] reacts also to a ‘b–’ strand, through the same toehold but in opposite directions.

In Fig. 7, each of the primary structures (top) eventually binds to one and only one of the two end caps (bottom): we arbitrarily associate one end cap with [a?b| and the other with |b?a] (the square bracket indicates the side the end cap is with), so in fact a?b = [a?b| + |b?a] = [b?a| + |a?b] = b?a. The central portions with the ‘a’ and ‘b’ domains are surrounded by four fixed domains ‘s’, ‘p’, ‘q’, ‘r’: these are the same sequences for all the choice gates, regardless of variations in ‘a’ and ‘b’.^{Footnote 4} The nameless toehold is the same sequence everywhere.

2 illustrations. 1, is labeled, a, question mark b, in square and dash brackets. It presents 2 double strands sequenced as p a b q and s p. 2, is labeled with a question mark a, in square and dash brackets. It represents 2 double strands sequenced as p b a q and q r. — **Fig. 7**

If a signal ‘–b’ (with toehold on the left) binds to [a?b|, it blocks the toehold and displaces to the right. It also releases ‘b–’ (with toehold to the right), which goes to |b?a], again blocks the toehold there, and displaces to the left, catalytically releasing a copy of the original ‘–b’. The end caps can bind to the remaining open toeholds and lock down the configuration. If ‘–a’ arrives later, it finds all the toeholds blocked and cannot bind to the remaining structures. Thus ‘–b’ arriving first prevents ‘–a’ from binding later. If ‘–a’ arrives first, the situation is symmetric, with the end caps binding to the opposite structures than in the ‘–b’-first case.

In more detail, the initial binding of signals opens up new toeholds for the double-stranded ‘$\underline{\text {sp}{\textbf {-}}}$’, ‘$\underline{{\textbf {-}}\text {qr}}$’ end caps: they cause 4-way strand displacements and stabilize the outcomes in a way that is distinguishable by sequencing. For a ‘–b’ input the final structures are ‘$\underline{\text {p}{\textbf {-}}\text {a}{\textbf {-}}\text {b}{\textbf {-}}\text {qr}}$’ + ‘q’, which is the result we earlier called $a\ge b$, and ‘$\underline{\text {sp}{\textbf {-}}\text {b}{\textbf {-}}\text {a}{\textbf {-}}\text {q}}$’ + ‘p’, which is the result we earlier called $b\le a$ (Fig. 8, top). The opposite happens if ‘–a’ arrives first (Fig. 8, bottom). If ‘–a’ and ‘–b’ arrive together, then both results are produced because the released ‘a–’ and ‘b–’ bind concurrently to as yet untouched copies of the gates.

4 illustrations. 1, is labeled, a is greater than or equal to b. It presents 2 double strands sequenced as, p a b q r, and q. 2, is labeled, b is less than or equal to a. It presents 2 double strands sequenced as, s p b a q, and p. — **Fig. 8**

These activations are irreversible and catalytic: ‘–a’ and ‘–b’ are released back without requiring additional structures. This is going to help kinetically and is also less likely to perturb the system we are observing. Reflexive gates a?a work as expected: we need them to signify that a signal ‘–a’ has arrived at some time. We produce the a?a structures by the general recipe, meaning as [a?a| + |a?a], hence with twice the concentration of the main structure. This is in fact what we need to keep the kinetics balanced with respect to non-reflexive gates.

A single choice gate works as described, but we need to consider the situation where there are multiple choice gates together. In a gate with [a?b|, the input ‘–b’ releases ‘b–’, which goes on to bind to |b?a], but also to any other |b?x]: crosstalk! Normally this would be incorrect, but here we want to activate |b?x] as well, since it tells us that ‘–b’ arrived before ‘–x’. If there is a |b?x], then there is also an [x?b|, which driven by ‘–b’ activates |b?x] anyway. So the crosstalk between gates does not hurt in this particular instance. The most interesting consequence is that, as we noted, although we have $N^2$ gates, we only have to encode N distinct domains (plus the 4 auxiliary ones). This greatly reduces the potential interference between domains that would be an obstacle to scaling up the number of signals. As an added benefit, these crosstalking gates are automatically catalytic (cf. Fig. 9).

As an example, for 3 signals a, b, c, we use the following 6 choice gates (first column) and corresponding initial structures (second column):

$$\begin{array}{llll}
\text {gates} & \text {structures} & \text {after `}{\textbf {-}}\text {c'} & \text {after `}{\textbf {-}}\text {b'} \\
a?a & [a?a|\,\,\,|a?a] & [a?a|\,\,\,|a?a] & [a?a|\,\,\,|a?a] \\
b?b & [b?b|\,\,\,|b?b] & [b?b|\,\,\,|b?b] & b\ge b\,\,\,b\le b \\
c?c & [c?c|\,\,\,|c?c] & c\ge c\,\,\,c\le c & c\ge c\,\,\,c\le c \\
a?b & [a?b|\,\,\,|b?a] & [a?b|\,\,\,|b?a] & a\ge b\,\,\,b\le a \\
a?c & [a?c|\,\,\,|c?a] & a\ge c\,\,\,c\le a & a\ge c\,\,\,c\le a \\
b?c & [b?c|\,\,\,|c?b] & b\ge c\,\,\,c\le b & b\ge c\,\,\,c\le b
\end{array}$$

An illustration presents 6 double strands listed in 3 lines, sequenced as follows. Line 1. b cross a, and a cross b. Line 2. p, a cross b, b, b cross a, q, and p, b cross a, a, a cross b, q. Line 3. s p and q r. — **Fig. 9**

If a signal ‘–c’ arrives, it initially activates 3 structures, the ones of the form [x?c|, producing outcomes $x\ge c$. Soon after, the signal ‘c–’ that is released by those activations crosstalks with the structures of the form |c?y], producing outcomes $c\le y$ (third column). If a signal ‘–b’ arrives next, it further activates some gates, but not the ones that have been used up by ‘–c’ (fourth column). If we sequence the structures at this point, we can conclude (with multiple redundancies) that:

$$\begin{aligned} c\le b \le a \end{aligned}$$

That’s a definite $c<b$, because we observe $c\le b$ but not $b\le c$. Moreover, we do not observe $a\le a$ which means that a never arrived. If we were to observe $c\le b$ and $b\le c$, then we would deduce that c, b arrived together, up to our time resolution.
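The three-signal example can be replayed with an idealized model of the recorder. The sketch below ignores kinetics and mixtures: it assumes that each gate population locks down completely on the first arrival of either input, and it represents a result $x\le y$ as the pair `(x, y)` (so $a\ge c$ and $c\le a$ are the same pair). The function name `run_recorder` and the input format are hypothetical.

```python
from itertools import combinations_with_replacement


def run_recorder(signals, arrivals):
    """Replay arrivals against all choice gates x?y, including x?x.

    `arrivals` is a list of sets: signals in the same set arrive together,
    and sets are listed in time order. Returns the set of recorded pairs.
    """
    gates = {frozenset(p): set()
             for p in combinations_with_replacement(signals, 2)}
    for group in arrivals:
        # Only gates that are still untouched and see one of their inputs react.
        fresh = [g for g, out in gates.items() if not out and g & group]
        for g in fresh:
            x, y = tuple(g) if len(g) == 2 else tuple(g) * 2
            if x in group:
                gates[g].add((x, y))
            if y in group:
                gates[g].add((y, x))
    return set().union(*gates.values())
```

Calling `run_recorder("abc", [{"c"}, {"b"}])` yields the pairs corresponding to the last column of the table: $c\le c$, $c\le a$, $c\le b$, $b\le b$ and $b\le a$, with no $b\le c$ and no $a\le a$.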

Detection of the preorder should be robust because of the redundancies. Background noise and bad gates can be tolerated, because we just need to detect which of $a \le b$ vs. $b\le a$ is stronger. Moreover, our set of observed structures must be transitively closed: if the input is the sequence a.b.c then we should observe $a \le b$ (and not $b \le a$) and $b \le c$ (and not $c \le b$), and transitively also $a \le c$ (and not $c \le a$). The transitive closures can act as consistency checks.
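One way to use transitive closure as a consistency check is sketched below. It assumes that the readout has already been thresholded into a set of pairs `(x, y)`, each meaning that $x\le y$ was observed; how to threshold noisy read counts is left open here, and the function name is hypothetical.

```python
def closure_violations(observed):
    """Return pairs (x, z) implied by x <= y and y <= z but absent from `observed`."""
    return sorted(
        {(x, z)
         for (x, y1) in observed
         for (y2, z) in observed
         if y1 == y2 and (x, z) not in observed}
    )
```

An empty list means the observed relation is transitively closed; any reported pair points at a detector that may have misfired or been misread.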

4.4 A “Proper” Choice Gate

If we want to use a choice gate in some general and modular way within some bigger design, then we need a gate that respects all the conventions, and in particular that does not crosstalk with unrelated gates. In the design in Fig. 9 the domains called ‘axb’ and ‘bxa’ are uniquely determined by ‘a’ and ‘b’ to avoid crosstalk with other gates. Here, a ‘–b’ input does not release a ‘b–’ signal that connects with other gates, but rather a ‘–bxa’ signal that binds uniquely to the other half of that choice gate. In our preorder recorder application, where we use $N^2$ gates, we would now need $N+N^2$ distinct signal domains. Other than that, this choice gate could replace the crosstalking one. A catalytic version can be obtained as in Fig. 4.
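As a rough illustration of the difference in scaling, take $N=10$ signals (an arbitrary value chosen here only for the arithmetic). Using the counts stated in the text:

$$\begin{aligned} \text {gates:}&\quad N^2 = 100 \\ \text {crosstalking design:}&\quad N+4 = 14 \text { distinct long domains (plus one toehold)} \\ \text {proper design:}&\quad N+N^2 = 110 \text { distinct signal domains} \end{aligned}$$

The number of gates is the same in both cases; only the number of mutually non-interfering sequences that must be designed differs.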

5 Conclusions

We have described a class of DNA algorithms designed to take advantage of high-throughput sequencing and also relying on high-throughput synthesis. A combinatorial number of different structures are activated on demand without any timing or synchronization, operating by natural parallelism. The outcome is produced not as an output but as the final state of the system to be read by sequencing.

Notes

1.
The ‘q’ domain can be single-stranded, interacting by a simpler 3-way displacement, but that would rule out producing the structure directly by cloning [4]. If the whole ‘–qr’ were single stranded, a polymerase could not attach to the final structure to complement the ‘r’ domain, as required for sequencing and positive detection, but a PCR step could be used instead.
2.
4-way strand displacement is slower than 3-way [8, Chap. 5] (although potentially more robust [6]). This may degrade the ability of our algorithms to separate events in time, but otherwise it does not affect their logic, which includes the possibility of coincidence of events. The 4-way displacement is of the unusual ‘open’ kind [8], that is, initiated by a single toehold binding instead of two.
3.
‘$\underline{\,\,\,\text {a}{\textbf {-}}\text {q}}$’ needs to be fully complemented, ligated, and sequenced. To that end, we can add an additional double-stranded domain on the left of the initial toehold, as in [5]. This allows a polymerase to proceed in the 3’–5’ direction of the bottom strand and fully complement the top strand. For this presentation, we omit these domains because they have no other function and do not participate in the described reactions. Moreover, even if sequencing misread the initial state, we would still get our answer by the presence or absence of the final state ‘$\underline{{\textbf {-}}\text {a}{\textbf {-}}\text {qr}}$’ + ‘q’.
4.
In fact, all four ‘s’, ‘p’, ‘q’, ‘r’ domains can be the same sequence without ambiguity in the outcomes of Fig. 8. Still, we keep them distinct in light of other possible constraints, such as in Fig. 10.
5.
https://www.aatbio.com/data-sets/restriction-enzymes-cut-sites-reference-table.

References

1. A.V. Aho, M.R. Garey, J.D. Ullman, The transitive reduction of a directed graph. SIAM J. Comput. 1(2), 131–137 (1972). https://doi.org/10.1137/0201008
2. L. Cardelli, Two-domain DNA strand displacement. Math. Struct. Comput. Sci. 23(2), 247–271 (2013). https://doi.org/10.1017/S0960129512000102
3. S.H. Chan, B.L. Stoddard, S.Y. Xu, Natural and engineered nicking endonucleases-from cleavage mechanism to engineering of strand-specificity. Nucleic Acids Res. 39(1), 1–18 (2010). https://doi.org/10.1093/nar/gkq742
4. Y.J. Chen, N. Dalchau, N. Srinivas, A. Phillips, L. Cardelli, D. Soloveichik, G. Seelig, Programmable chemical controllers made from DNA. Nat. Nanotechnol. 8(10), 755–762 (2013). https://doi.org/10.1038/nnano.2013.189
5. Y.J. Chen, G. Seelig, Scaling up DNA computing with array-based synthesis and high-throughput sequencing, in this volume (2021)
6. D.Y. Duose, R.M. Schweller, J. Zimak, A.R. Rogers, W.N. Hittelman, M.R. Diehl, Configuring robust DNA strand displacement reactions for in situ molecular analyses. Nucleic Acids Res. 40(7), 3289–3298 (2012). https://doi.org/10.1093/nar/gkr1209
7. R.A. Hughes, A.D. Ellington, Synthetic DNA synthesis and assembly: putting the synthetic in synthetic biology. Cold Spring Harb. Perspect. Biol. 9(1), a023812 (2017). https://doi.org/10.1101/cshperspect.a023812
8. N. Dabby, Synthetic Molecular Machines for Active Self-assembly: Prototype Algorithms, Designs, and Experimental Study. Dissertation (Ph.D.) (2013). https://doi.org/10.7907/T0ZG-PA07
9. L. Qian, D. Soloveichik, E. Winfree, Efficient Turing-universal computation with DNA polymers, in DNA Comput. Mol. Program., ed. by Y. Sakakibara, Y. Mi (Springer, Berlin, Heidelberg, 2011), pp. 123–140
10. J.A. Reuter, D.V. Spacek, M.P. Snyder, High-throughput sequencing technologies. Mol. Cell 58(4), 586–597 (2015). https://doi.org/10.1016/j.molcel.2015.05.004
11. G. Seelig, D. Soloveichik, D.Y. Zhang, E. Winfree, Enzyme-free nucleic acid logic circuits. Science 314(5805), 1585–1588 (2006). https://doi.org/10.1126/science.1132493
12. S.L. Shipman, J. Nivala, J.D. Macklis, G.M. Church, CRISPR-Cas encoding of a digital movie into the genomes of a population of living bacteria. Nature 547(7663), 345–349 (2017). https://doi.org/10.1038/nature23017
13. A.N. Sinyakov, V.A. Ryabinin, E. Kostina, Application of array-based oligonucleotides for synthesis of genetic designs. Mol. Biol. 55, 487–500 (2021). https://doi.org/10.1134/S0026893321030109
14. D. Soloveichik, G. Seelig, E. Winfree, DNA as a universal substrate for chemical kinetics. Proc. Natl. Acad. Sci. 107(12), 5393–5398 (2010). https://doi.org/10.1073/pnas.0909380107
15. T. Tanna, F. Schmidt, M.Y. Cherepkova, M. Okoniewski, R.J. Platt, Recording transcriptional histories using Record-seq. Nat. Protoc. 15, 513–539 (2020). https://doi.org/10.1038/s41596-019-0253-4
16. D.Y. Zhang, Towards domain-based sequence design for DNA strand displacement reactions, in DNA Comput. Mol. Program., ed. by Y. Sakakibara, Y. Mi (Springer, Berlin, Heidelberg, 2011), pp. 162–175

Acknowledgements

Thanks to Matthew Lakin, Georg Seelig, and David Soloveichik for helpful comments, and to Yuan-Jyue Chen and Georg Seelig for initial discussions that led to this paper.

Author information

Authors and Affiliations

University of Oxford, Oxford, UK
Luca Cardelli

Corresponding author

Correspondence to Luca Cardelli.

Editor information

Editors and Affiliations

Department of Mathematics and Statistics, University of South Florida, Tampa, FL, USA
Nataša Jonoska
Department of Computer Science; Bioengineering; Computation & Neural Systems, California Institute of Technology, Pasadena, CA, USA
Erik Winfree

Appendix: Restriction Enzymes

Naturally occurring restriction enzymes bind to double-stranded DNA and make a double-strand cut. Some natural enzymes and some engineered versions of restriction enzymes cut only one of the strands: these are called nicking enzymes [3]. Although there are many such enzymes used for cutting natural and synthetic DNA, their properties are severely restricted (see Note 5). In nature, there is also a Cas9 protein that is essentially a programmable restriction enzyme: it can cut DNA at almost any desired location determined by a separate RNA strand.

Let us imagine that we are able to design our own restriction enzymes. We show that it is then possible to cut and nick the crosstalking choice gates in Fig. 7 out of a longer DNA double strand. Such a strand is ideally obtained by bacterial cloning, which can produce large quantities of very long high-quality strands, enabling the mass production of DNA gates by cutting them out of long cloned strands [4].

There are many alternatives to the hypothetical choices of restriction enzymes shown below, depending on where the DNA cuts are located with respect to the binding sites of an enzyme. But let’s assume the following possibilities:

‘B..#’ is a blunt-cutting enzyme that binds to a recognition sequence ‘B’ and makes a blunt double cut (i.e., at the same location on both strands) indicated by #, at some non-critical distance toward 5’.
‘^..L’ is a nicking enzyme that binds to a recognition sequence ‘L’, and makes a nick indicated by ${}^\wedge $ toward 3’, on the opposite strand, at a distance corresponding to a toehold length.
‘R..^’ is a nicking enzyme that binds to a recognition sequence ‘R’, and makes a nick indicated by ${}^\wedge $ toward 5’, on the opposite strand, at a distance corresponding to a toehold length.

The embedding of the ‘B’ binding sequence is straightforward because it can be placed outside the gates, in the surrounding DNA. The main gate structures, however, have several internal nicks; hence, the enzyme binding sites must be placed inside the signal domains (we cannot assume that the enzymes can cut precisely at a very long distance from their binding sites). Since these domains occur twice with different surrounding nicks, the placement of the binding sequences is non-trivial. However, the following scheme is adequate: each domain used for encoding signals has ‘L’ and ‘R’ enzyme binding sequences embedded as in Fig. 10 top left; they produce nicks at a toehold length just outside of the domain. For the staggered cutting of the end caps, though, it is not obvious that ‘L’ and ‘R’ would work together to produce a staggered double-strand cut as indicated. Alternatively, two separate staggered-double-strand-cut enzymes need to be used there, with the stagger being the length of a toehold.

An illustration presents 5 double strands arranged in 2 lines, sequenced as follows. Line 1. x with labels L and R on the second strand. s p with labels L and R on the first and B and L on the second strands, respectively. Line 2 and 3, p a b q. — **Fig. 10**

This scheme came from a discussion with Yuan-Jyue Chen, after he pointed out that the placement of restriction binding sequences was problematic.

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Cardelli, L. (2023). Sequenceable Event Recorders. In: Jonoska, N., Winfree, E. (eds) Visions of DNA Nanotechnology at 40 for the Next 40. Natural Computing Series. Springer, Singapore. https://doi.org/10.1007/978-981-19-9891-1_17

DOI: https://doi.org/10.1007/978-981-19-9891-1_17
Published: 05 July 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9890-4
Online ISBN: 978-981-19-9891-1
eBook Packages: Computer Science, Computer Science (R0)
