1 Introduction

Over the last four decades, structural DNA nanotechnology has become intertwined with, and provided a foundation for, a range of other fields, most notably dynamic DNA nanotechnology and DNA computing. Adleman’s original demonstration of DNA computing took advantage of the inherent parallelism of chemical reactions and the predictability of DNA base pairing to solve an instance of the directed Hamiltonian path problem [1]. However, it soon became apparent that DNA computing would not be able to compete with electronic computers because of the exponentially large amounts of DNA required for solving problems out of reach of conventional computers [11]. Researchers subsequently explored alternative computing paradigms, from tile-assembly systems, an area to which Seeman made foundational contributions [3, 29, 34, 43], to Boolean logic gates [25, 31, 37, 38], neural networks [13, 20, 27, 42, 45], and chemical reaction networks [6, 7, 33]. Importantly, the goal of this more recent work is not to compete with electronics but to efficiently process information that is already available in molecular form or to use embedded computation to organize molecules into shapes and patterns.

1.1 Scaling up DNA Computing for Molecular Diagnostics

The potential of DNA computing to analyze information encoded in biological nucleic acids, in particular single-stranded RNA, was recognized early on [5, 31, 37]. Benenson et al. developed a molecular automaton that could perform a computation where the outcome (the release of an antisense drug mimic) was dependent on the absence or presence of specific inputs (ssDNA with sequence analogous to diagnostically relevant mRNA) [5]. In our own recent work, we constructed a DNA-based molecular classifier that assigns samples to a class (bacterial or viral infection) based on the levels of seven distinct mRNA mimics [21]. Zhang et al. extended this work by including an amplification step that made it possible to work with actual patient samples rather than synthetic or in vitro transcribed RNA, thus bringing molecular computation for diagnostics closer to a practical application [44]. However, classification accuracy could be further improved by considering more input features (i.e., distinct mRNAs). In fact, an optimal in silico classifier designed to solve the same problem as our molecular classifier takes into account the levels of over two hundred distinct transcripts [41]. Moreover, in silico classifiers designed to solve more complex computational problems, such as identifying a cancer tissue of origin, routinely take thousands of genes as inputs [22, 28]. It is easy to imagine that future diagnostic applications of molecular computation could benefit from technologies that would allow the construction and analysis of circuits with thousands of gates.

1.2 Scaling up DNA Computing for DNA Data Storage

Digital data will soon outgrow available storage capacity, and novel approaches for data storage are required. DNA is a promising material for digital data storage because of its high information density and durability (reviewed in Ref. [9]). Documented suggestions for the use of DNA for digital data storage go as far back as the 1960s [23], but only recent improvements in DNA synthesis and sequencing have started making DNA storage practical [14, 17, 18, 24]. The amount of data stored in DNA increased from 0.7 MB in 2012 to 200 MB in 2018 and will keep growing at an accelerating pace given the interest in this technology from both industry and academia. However, having large volumes of data stored in DNA creates new challenges, namely the need to access and search these data or perform other forms of information processing. Sequencing all stored data and converting it back to electronic form for processing is impractical and will not scale. A molecular computation approach for performing data analysis directly in the chemical realm, i.e., “near data,” before converting results to electronic form, is a more promising alternative [8]. However, although some problems such as random access [24], image similarity search [4, 36], image preview [40], and some types of Boolean search [2] can be mapped to hybridization reactions and thus be implemented at relatively large scale already, other tasks such as image classification will likely require more complex computational primitives and orders of magnitude more computational elements than are currently feasible [13].

1.3 Limitations of Current Approaches to DNA Computing

Almost all DNA circuitry built to date has been assembled from individually column-synthesized oligonucleotides, because this technology provides a low synthesis error rate and control over the concentration of individual strands. A low rate of deletions, insertions, or substitutions is desirable because such errors can result in side reactions (i.e., “leaks”) once the strand is incorporated into a gate [12, 35, 46]. Control over concentration is necessary because most DNA gates are assembled from multiple strands with a defined stoichiometry. However, at a cost of $1,000 or so for 96 oligos synthesized in a 96-well plate format, and given the need to manipulate oligos individually, any approach based on ordering column-synthesized oligos cannot scale.

A second, related challenge for scaling up DNA circuits is the time-consuming step of gate assembly and purification. Currently, each gate complex needs to be assembled separately; otherwise, a signal strand that is supposed to be bound to an upstream gate may bind to a downstream gate instead. Moreover, gate complexes often need to be gel purified to remove excess strands or partial complexes, which can cause undesired leak reactions and computation errors.

A third limitation to scaling up DNA circuit computation comes from the use of fluorescence-based reporters for reading out the results of a computation. Commercial fluorometers typically read up to four distinct fluorescence channels and a similar number of samples in parallel. Plate readers allow monitoring of up to 96 reactions in parallel but are typically limited to reading no more than two fluorescence channels simultaneously. These limitations mean that only a small number of variables can be monitored in a given computation (i.e., as many as there are independent fluorescence channels), and only a limited number of computations can be performed in parallel (i.e., as many as there are separate reaction chambers).

Other limitations, such as low reaction speed, are briefly addressed in the Discussion, but addressing them is not the focus of the work presented here.

2 A Vision for the Future

Pools of array-synthesized DNA (reviewed in [19]) provide a promising alternative to individually column-synthesized DNA because cost per oligo is orders of magnitude lower and pools with over a million distinct sequences are now available commercially. Such very large-scale and (relatively) low-cost pools provided the foundation for breakthroughs in DNA data storage [14, 17, 24] but so far, array-synthesized oligos have not been used for molecular programming applications because of the lower synthesis quality, variation in concentration between oligos, and the low yield of each individual oligo in the pool.

Similarly, next-generation DNA sequencing can be used to read out hundreds of millions of DNA sequences in parallel but so far has not been used to read out DNA computations because current gate architectures are not compatible with read out by sequencing. Specifically, whether an output strand was released from a gate or is still hybridized to it is unlikely to impact whether it is detected in a sequencing reaction and thus naive sequencing of a DNA circuit reaction mix cannot reveal the result of the computation.

Building on these technologies is a necessity if we are to scale up the complexity and size of DNA nanostructures and circuits. However, doing so will require novel gate and structural architectures that are compatible with low abundance and low quality DNA. We expect that enzymatic processing will be important in order to selectively amplify the array-synthesized DNA and process it into functional gates or other devices.

We note that the idea to use array-synthesized DNA for DNA computing is at least a decade old. Qian and Winfree outlined how a Seesaw gate could be made through enzymatic processing of a single DNA hairpin synthesized on an array [26]. The proposed approach ensures correct stoichiometry of the two component strands in the final gate and also provides some degree of sequence proofreading due to the enzymatic cleavage step. However, gate concentration is bounded by that of intact oligonucleotides in the pool, and performance is likely limited by leak reactions involving gates derived from truncated or otherwise imperfect oligonucleotides.

Here, we introduce an alternative approach for making general purpose DNA logic gates from array-derived DNA. Our approach uses PCR amplification to selectively amplify full length but low abundance DNA. We further show how DNA sequencing can be used to read out the result of a computation thus providing a path toward reading out a very large number of gate operations in parallel (Fig. 1).

Fig. 1
A flow diagram has parallel gate preparation with oligo library synthesis, sub-pool P C R, and enzymatic digest. D N A computation with Boolean logic, molecular programming language, and D N A architecture, and a cyclic high throughput readout.

Overview of circuit construction, computation, and read out. Left: DNA gates are derived from large pools of oligonucleotides through PCR amplification and enzymatic processing. Middle: computation occurs in a pooled reaction upon addition of inputs. Right: end point results of the Boolean logic computation are read out in parallel using next-generation sequencing (NGS)

3 Results

In this section, we first introduce our logic gate design and reaction mechanism and then explain how gates can be processed from single-stranded DNA through PCR amplification and nicking enzyme digestion. We then provide experimental results for gate operation and demonstrate readout of up to 380 gates in a single reaction using DNA sequencing.

3.1 Nicked Double-Stranded DNA Gates Reaction Mechanism

We use a modified version of the nicked double-stranded DNA (ndsDNA) gate architecture proposed by Cardelli [7]. The reaction mechanism for a single-input, single-output gate, a logical repeater or sequence translator, is detailed in Fig. 2. The input and output have the same domain architecture with a short toehold followed by a longer recognition domain. Each complete logic gate consists of a join and a fork gate. Several auxiliary species, referred to as helper strands, are also required for the reaction. Inputs bind to the join gate and trigger a sequence of strand displacement reactions that result in the release of an intermediate output strand. This intermediate output then reacts with the fork gate, triggering a second sequence of reactions that finally result in the release of the output. The two-component join-fork architecture and use of helper strands ensure that the input and output sequence are completely unrelated and have the same domain architecture. The gate architecture is modular, and gates with multiple inputs and outputs can easily be designed.

Fig. 2
Two schematics represent components of D N A. A has gate representation, gates, signal species, and helpers. B has a reaction pathway from join to fork.

DNA components (a) and reaction mechanism (b) for a one-input, one-output gate. The repeater gate consists of a join and fork component and a set of helper strands. Inputs and outputs are referred to as signal species. Here, we propose to derive join and fork gates from DNA pools through enzymatic processing while inputs and helpers are synthetic DNA. The gate architecture is similar to that described in Ref. [12] with two important exceptions: first, all toeholds are internal, meaning that they are flanked by double-stranded domains on both sides. This change was made to ensure that all toeholds have similar binding energies. Second, double-stranded domains (light and dark blue) appended to some of the helper strands are used for selective PCR and sequencing of reacted gate species as detailed in the text

3.2 Gate Design

A key advantage of the ndsDNA gate architecture is that each gate can be derived from fully double-stranded DNA through enzymatic processing. While in our earlier work we used plasmid DNA as a starting material in order to minimize error rate, we here propose to generate the double-stranded DNA by PCR amplification of single-stranded DNA synthesized on an array. We will make two modifications to the architecture introduced in Ref. [12]. First, we will ensure that all toeholds have the same sequence and are “internal,” i.e., they are flanked by double-stranded domains on both sides (see the leftmost toehold in Fig. 3 for an example of an internal toehold). As a result, all toeholds should have very similar binding energies and all individual strand displacement reactions should occur at similar rates. Second, and more importantly, we modify the gate architecture to enable read out of gate activation by sequencing. Specifically, we add a double-stranded domain (shown in blue or light blue in Fig. 2) to the final helper strand for both the join and fork gate. These domains do not participate in the strand displacement reactions but are appended to the gate complex when the reaction has completed. These domains can be detected by sequencing and can be used to distinguish reacted from unreacted gates.

Fig. 3
A workflow of gate generation. A left toehold with double-stranded domains is processed with nicking enzymes, that generates breaks in the top strand of the gate, and an initiating toehold on the right.

Enzymatic processing steps used for gate preparation. Functional domains and nicking enzyme binding sites are indicated. The denaturing gel data show bands corresponding to DNA oligonucleotides of the expected length for a correctly processed gate

A third important point is that we operate in a different regime than that described in Ref. [12]. Specifically, we assume that inputs are in excess of the gates such that the gates are either fully triggered or completely off. Thus, while these gates were initially designed to approximate a desired analog (chemical kinetics) behavior, we here treat them as digital logic gates.
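The digital reading of a gate can be illustrated with a small sketch. It assumes, as a simplification that is not taken from the experiments, that reactions run to completion without leak and that each released output consumes one gate complex and one copy of each input, so the end-point output equals the scarcest component. The function names, the threshold, and the concentrations are hypothetical.

```python
def endpoint_output(gate_nM, input_concs_nM):
    """Idealized end-point output (nM) of a multi-input join/fork gate.

    Assumption: irreversible, leak-free reactions in which one gate and one
    copy of every input are consumed per released output strand.
    """
    return min([gate_nM, *input_concs_nM])


def is_on(gate_nM, input_concs_nM, threshold=0.5):
    """Digital reading: ON if at least `threshold` of the gates released output."""
    return endpoint_output(gate_nM, input_concs_nM) >= threshold * gate_nM


# Both inputs in excess of a 10 nM gate: the gate is fully triggered.
print(endpoint_output(10.0, [50.0, 50.0]), is_on(10.0, [50.0, 50.0]))
# One input missing: the gate stays off.
print(endpoint_output(10.0, [50.0, 0.0]), is_on(10.0, [50.0, 0.0]))
```

With inputs in excess, the output in this model is either the full gate concentration or zero, which is the two-valued behavior assumed for Boolean logic.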

3.3 Making ndsDNA Gates from Array-Synthesized DNA

The workflow of gate generation is detailed in Fig. 3. An initial gate strand is chemically synthesized including flanking sequences that can be used for PCR amplification. Using matching primers, this strand is amplified to generate double-stranded DNA. The resulting duplex DNA is then processed with nicking enzymes that generate breaks in the top strand of the gate and also generate an initiating toehold.

By using PCR and nicking enzymes to generate each join and fork gate from a single strand, we overcome two major issues. First, PCR amplification is only successful if both primer sequences are present. Therefore, the PCR reaction will preferentially amplify full-length molecules over a background of truncated synthesis products. Second, most multi-stranded gate architectures require careful control over stoichiometry, which is currently impossible to achieve with array-synthesized DNA. Using only a single strand to generate the entire gate overcomes this limitation. Moreover, all strands that are synthesized with the same primer sequences will be amplified together. By using a common set of primers for all gates that belong to the same functional module, we can amplify and then process all components of a circuit of interest in a one-pot reaction.
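The selection logic of sub-pool PCR can be sketched in silico. The example below is only an illustration under simplifying assumptions: a template is treated as amplifiable if it begins with the forward primer and ends with the reverse complement of the reverse primer, with exact matching and no model of mispriming or amplification bias. All sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def amplifiable(template, fwd_primer, rev_primer):
    """True if the template carries both primer sites (exact match assumed)."""
    return (template.startswith(fwd_primer)
            and template.endswith(reverse_complement(rev_primer)))


def select_subpool(pool, fwd_primer, rev_primer):
    """Keep only the templates that a given primer pair would amplify."""
    return [t for t in pool if amplifiable(t, fwd_primer, rev_primer)]


# Hypothetical primers and gate payloads, for illustration only.
FWD, REV = "ACGTACGTAC", "GGATCCTTAA"
full_length = FWD + "TTTTGGGGCCCC" + reverse_complement(REV)
truncated = FWD + "TTTTGGGG"  # lacks the second primer site
other_module = "CCCCAAAACC" + "TTTTGGGGCCCC" + reverse_complement("TTGACCATGA")

pool = [full_length, truncated, other_module]
print(select_subpool(pool, FWD, REV) == [full_length])
```

Because selection depends only on the flanking primer sites, assigning one primer pair per functional module lets a single reaction pull out every gate of that module while leaving truncated strands and other modules behind.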

Figure 3 shows an example of join gate processing. In this case, the initial duplex was generated by PCR of a single column-synthesized oligo. Enzymatic digestion was performed as outlined on the left of the figure. Denaturing gel electrophoresis confirmed that the digestion yields products of the expected length.

3.4 Characterizing Gate Kinetics

Before turning to a readout by sequencing, we characterized the behavior of individual gates and small circuits using traditional fluorescence-based assays. For an initial test, we generated an AND logic gate (both join and fork gates) starting from column-synthesized oligos. After PCR amplification and enzymatic digestion, we first characterized gate behavior using non-denaturing gel electrophoresis (Fig. 4b). We then triggered the gate using either both inputs or only one input and measured the rate of output release using a fluorescent reporter, following the protocol detailed in Ref. [12]. The kinetics data show correct AND logic behavior (Fig. 4c).

Fig. 4
Three schematics for D N A and gate configuration. A has a flow diagram with different binding sites and reactions. B has lane 1 for fork gate and lane 2 for fork gate + input. It has different pathways. C for reaction kinetics has a logical gate and a graph of C versus time.

DNA AND logic gate. a Fork gate configuration before and after reaction. b Native gel assay shows the expected DNA species for the fork gate before and after reaction. c Fluorescence kinetics data for the complete AND gate (fork + join + helpers) confirm that signal is high only if both inputs are present as expected for AND logic

Fig. 5
Two flow diagrams. A has logic gates and readout screen. The text reads, next-generation sequencing, high throughput, and limited time points. B has 4 levels via ligation, mix with primers, and P C R, leading to massive parallel sequencing. Reacted gates are F B and F B asterisk.

Reading out DNA logic gates with next-generation sequencing. a Using a sequencing-based readout rather than a fluorescence-based readout in principle makes it possible to read out millions of reactions in parallel. However, unlike in a fluorescence-based assay, only reaction end points can be assessed because of a lack of time resolution, making a sequencing-based readout ideal for Boolean logic. b After completion of the computation, ligation is used to seal the nicks in the gate complexes. Because only reacted gates contain the domain shown in light blue, they can be selectively amplified with PCR. After that, additional DNA domains necessary for sequencing (purple) are appended to the reacted gates using a standard Illumina ligation protocol

3.5 Reading Out DNA Computation with Next-Generation DNA Sequencing

Our approach relies on high-throughput (Illumina) sequencing to detect whether a gate was activated or not. High-throughput sequencing is routinely used to read out hundreds of millions of sequences in parallel, and a key advantage of our approach is that the state of all gates in a reaction mix can be read out in a single experiment. In our work, each sequencing read will correspond to a single gate molecule and will provide both information about the identity of the gate and its activation status.

As explained above, reacted gates can be distinguished from unreacted gates because an additional double-stranded domain (shown in light blue in Fig. 5) is appended to the reacted gate. In the sequencing reaction, the entire gate sequence including the appended auxiliary domain will be detected. Presence (or absence) of the auxiliary domain indicates whether a specific gate participated in a reaction. Moreover, the sequence of the gate uniquely identifies the gate (i.e., the sequence shows which inputs are accepted and which outputs are released). The sequencing reaction thus effectively amounts to counting the number of reacted (and unreacted) gates of each type.
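The counting step can be sketched as follows. This is a minimal illustration, not the analysis pipeline used for the data shown here: it assumes each gate is recognized by an exact match to a gate-specific subsequence and that a read counts as reacted if it also contains the auxiliary domain. Real reads would require error-tolerant matching. All sequences and names are hypothetical.

```python
from collections import Counter


def count_gate_states(reads, gate_tags, aux_domain):
    """Count reacted and unreacted reads for each gate.

    gate_tags maps a gate name to a subsequence that identifies it.
    A read is 'reacted' if it also contains the auxiliary domain.
    """
    counts = {name: Counter() for name in gate_tags}
    for read in reads:
        for name, tag in gate_tags.items():
            if tag in read:
                state = "reacted" if aux_domain in read else "unreacted"
                counts[name][state] += 1
                break  # each read is assigned to at most one gate
    return counts


# Hypothetical sequences, for illustration only.
AUX = "GATTACAGATTACA"
TAGS = {"join_AB": "AACCGGTT", "join_BA": "TTGGCCAA"}
reads = [
    "AACCGGTT" + "CCCC" + AUX,  # join_AB, reacted
    "AACCGGTT" + "CCCC",        # join_AB, unreacted
    "TTGGCCAA" + "CCCC" + AUX,  # join_BA, reacted
    "TTGGCCAA" + "CCCC" + AUX,  # join_BA, reacted
]

counts = count_gate_states(reads, TAGS, AUX)
for name, c in counts.items():
    print(name, c["reacted"], c["unreacted"])
```

Comparing the per-gate counts between reactions with and without inputs then gives the digital state of each gate.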

To prepare gates for sequencing, we first use ligation to seal all nicks, followed by PCR to enrich full-length products. Next, we use ligation to add the sequencing adaptors required for Illumina sequencing. These adaptors are used to bind the product to the flow cell and also serve as binding sites for the sequencing primers.

Fig. 6
Four schematics for sequencing. A. AND gate and pathway for D 0, D 1, and D 2. B. AND gate and a bar graph for sequencing count. C. Translator gate and pathway for D 0 and D 1. D. Gate sequences and a bar graph for sequencing count.

Preliminary data confirm that gate operation can be read out using next-generation sequencing. a A logic AND gate and a simplified representation of the corresponding DNA gates. b Sequencing read counts for the join and fork gate that together make up the AND gate. c A translator gate and a simplified representation of the corresponding DNA gates. d Sequencing read counts for two-stage reaction cascades. Each one-input, one-output logic gate consists of a join and a fork gate. Blue bars indicate counts in the absence of inputs, and red bars show counts if inputs are present. The results show that two gates can be cascaded and can be read out by sequencing reacted gates

To test whether gate operation could be read out by sequencing, we again turned to gates derived from individual ultramers (Fig. 6). We tested both a single AND gate and two distinct cascades of repeater gates. In all instances, we found the gates behaved broadly as expected, confirming that sequencing could be used to read out the performance of ndsDNA gates.

3.6 Reading Pools of Array-Derived Gates

Next, we turned to the challenge of scaling up circuit construction and analysis. We asked whether DNA gates could be synthesized on an array and whether gate operation could be read out with sequencing. To that end, we designed a set of 380 join gates that could be triggered with all possible combinations of 20 distinct inputs (see Fig. 7). Gates that respond to two copies of the same input, i.e., inputs (A, A), were not synthesized, but we did synthesize gates for both input pairs (A, B) and (B, A). Although these are logically identical, the DNA domains are ordered differently, potentially resulting in performance variation. In the current iteration of our approach, all single-stranded inputs and auxiliary strands are individually column synthesized. In principle, it would be possible to also obtain at least some of these strands through enzymatic processing. However, we here instead chose to limit the number of strands that need to be ordered by reusing sequences between different modules that are tested independently. Although we observed some variation in levels of triggering, all gates behaved as designed, providing support for the work proposed here. Next, we will extend this approach to testing full AND gates, characterizing gate kinetics, and increasing the size and complexity of the circuit.
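The size of the gate library follows from counting ordered pairs of distinct inputs: 20 × 19 = 380. The short sketch below enumerates such a library and predicts which gates should be triggered for a given subset of inputs under ideal AND behavior. The input names and the ten-input subset are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def design_join_gates(inputs):
    """All ordered pairs of distinct inputs; (A, B) and (B, A) are separate gates."""
    return list(permutations(inputs, 2))


def triggered_gates(gates, present_inputs):
    """Gates whose two inputs are both present (ideal AND behavior)."""
    present = set(present_inputs)
    return [g for g in gates if g[0] in present and g[1] in present]


inputs = [f"in{i:02d}" for i in range(20)]  # hypothetical input names
gates = design_join_gates(inputs)
print(len(gates))                                # 380
print(len(triggered_gates(gates, [])))           # 0
print(len(triggered_gates(gates, inputs)))       # 380
print(len(triggered_gates(gates, inputs[:10])))  # 90
```

A prediction of this kind can be laid out as a 20 × 20 grid with an empty diagonal and compared entry by entry with the measured read counts.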

Fig. 7
Three graphical representations for sequencing. A to C are input B versus input A graphs. A represents no input, B for all input, and C for half input. B has multiplexed data throughout the graph. C has data towards the bottom left corner.

Preliminary data confirm that gate operation can be read out using next-generation sequencing in a highly multiplexed format. In this experiment, 380 join gates were tested simultaneously. Each join gate accepts two inputs, and gates were designed to accommodate all possible combinations of 20 distinct inputs (no gates were designed that accept the same input twice). Although input order should not matter for the required logic operation, inputs react with join gates sequentially, and two different gate designs are triggered by each input combination. Each entry in the heat map corresponds to read counts for one join gate. a In the absence of any inputs, read counts are low. b When all inputs are present, read counts are high for all gates. c When only a subset of inputs is added to the mix, read counts are high only for the triggered gates

4 Discussion

In this manuscript, we provided a detailed example of how high-throughput DNA synthesis and sequencing technologies can be used to scale up DNA computing. We experimentally demonstrated AND logic gates and tested a pool of 380 array-derived join gates.

A potential challenge to scaling up circuit size is that the concentration of each gate in a pool is inversely proportional to the size of the pool [26]. That is, if PCR is used to amplify a pool with 10 gates to a total concentration of 1 µM, each individual gate will be at an operating concentration of 100 nM. For a circuit with 10,000 gates, the operating concentration of each gate would only be 0.1 nM, likely resulting in a very slow computation (taking hours or even days) for multi-layered circuits. However, we believe that there are multiple avenues for overcoming this limitation. For example, recent work by our own group and others [10, 15, 16, 30, 39] showed that spatial colocalization of circuit elements using DNA origami scaffolds can dramatically accelerate DNA circuit computation. Similar colocalization approaches based on DNA nanostructures, beads, or other scaffolds could also be developed for array-derived logic gates.
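The dilution argument can be checked with a short calculation. The per-gate concentration follows directly from the numbers in the text. The time estimate is a rough sketch only: it assumes a single bimolecular step between two species at equal concentration, for which the half-time is 1/(k·c0), and an assumed rate constant of 10^6 /M/s, a typical order of magnitude for toehold-mediated strand displacement rather than a value measured in this work.

```python
def per_gate_concentration_nM(total_uM, n_gates):
    """Concentration of each gate (nM) when a pool is amplified to total_uM."""
    return total_uM * 1000.0 / n_gates


def half_time_hours(conc_nM, k_per_M_per_s=1e6):
    """Half-time (h) of a bimolecular reaction at equal initial concentrations.

    Uses t_half = 1 / (k * c0). The default rate constant is an assumption.
    """
    conc_M = conc_nM * 1e-9
    return 1.0 / (k_per_M_per_s * conc_M) / 3600.0


for n_gates in (10, 10_000):
    c = per_gate_concentration_nM(1.0, n_gates)
    print(f"{n_gates} gates: {c:g} nM per gate, "
          f"half-time per step about {half_time_hours(c):.3g} h")
```

Under these assumptions a single reaction step at 0.1 nM has a half-time of a few hours, so a multi-layered cascade would plausibly take much longer, in line with the estimate given above.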

Another potential drawback of our approach is that it only provides end point data rather than full reaction kinetics. While this is not a concern in the context of Boolean logic circuits for which we only care whether an output is true or false once the computation has completed, it could prove limiting for other applications. However, we believe that approaches based on nanopore sequencing which can provide real-time readout of reaction progress can overcome this limitation.

Although this work is still preliminary, we were able to demonstrate parallel operation and readout of 380 join gates in a single reaction. We believe that this work will provide a foundation for large-scale DNA computing with applications in disease diagnostics and DNA data storage.