1 A Brief History of DNA Computing

Nature-evolved DNA molecules are the primary information-carrying medium of life [1]. The computing power of DNA relies primarily on its structural potential. In 1953, Watson and Crick first proposed the double-helix structure of DNA, which marked a key step toward uncovering the secret of life [2]. In 1982, Seeman proposed for the first time a rational design of a Holliday junction-like branched DNA structure [3], pioneering the endeavor to construct human-defined structures using DNA beyond the secret of life. This work and the subsequent progress in DNA nanotechnology provide insights and a toolbox for the design of dynamic structures to implement computing algorithms.

In 1994, Adleman proposed a DNA-based algorithm to solve a Hamiltonian path problem [4]. This work demonstrated for the first time the feasibility of carrying out computations using synthetic DNA molecules, thereby signaling the start of DNA computing. The parallelism, far beyond that of conventional silicon-based computers, attracted wide interest. Following this work, efforts were made to explore the parallel computation ability of DNA molecules to solve various mathematically complex problems, including the Boolean satisfiability problem (SAT) [5,6,7,8,9], the maximal clique problem [10], and the traveling salesman problem [11]. During this period, it was realized that the available molecular parallelism was not enough to compensate for the slow clock speed of biochemical operations and the redundancy required to combat the high intrinsic error rate of the operations. Evolutionary computation models were proposed to overcome these limitations, although they were not experimentally demonstrated [10, 12,13,14,15].

Subsequently, a variety of new molecular mechanisms for implementing computational algorithms were proposed, which greatly enriched the toolbox of DNA computing [16,17,18,19,20,21,22,23,24]. In 2000, as a milestone in DNA computing, an enzyme-free DNA “tweezer” was proposed by Yurke et al., which could switch between ON and OFF states through strand displacement reactions [25]. Based on the principle of accurate base pairing, with tunable reaction kinetics [26,27,28,29,30,31] and spontaneous execution, strand displacement reactions immediately enabled various modular designs of DNA computing architectures [16, 18, 32,33,34]. In 2004, Dirks and Pierce realized triggered amplification by hybridization chain reaction (HCR) [17], which has been broadly applied in computing [35], biosensing [36], and self-assembly [37, 38]. In 2006, Seelig et al. proposed a toehold-mediated strand displacement scheme to construct enzyme-free logic circuits [18]. In 2007, Zhang et al. developed a signal amplification reaction network that uses a DNA strand as the catalyst [39].

With advances in the DNA computing toolbox, various computational functions (e.g., automata [40, 41], logic computing [42, 43], neural network computing [44, 45], cargo-sorting [46], and maze solving [35]) have been realized. Figure 1 presents a timeline of representative advances in DNA computing, classified mainly according to the realized functions and design principles. In 2003, Benenson et al. reported a molecular automaton that uses DNA both as data and fuel [40]. In 2010, Pei et al. first developed a programmable computing device to play a game [41]. In 2011, Qian and Winfree proposed a simple yet modular computing unit, the “seesaw” motif, with which large-scale digital computation [42] and neural network computation [44] were implemented experimentally with improved performance. In 2012, Padirac et al. developed switchable memories using DNA and DNA processing enzymes [19]. In addition to state jumps, they also implemented time-domain programming. Recently, temporal dynamics programming was further developed to implement a predator-prey reaction network [47], an adaptive immune response simulator [48], and enzyme-free oscillators [49]. Logic computing has also been developed with the introduction of DNA origami-based logic gates [50], spatially localized logic gates [33], an integrated gene logic chip [51], single-stranded gates [52], and DNA switching circuits [53]. Meanwhile, task-oriented DNA molecular algorithms have been demonstrated in recent years, such as edge detection [54], cargo-sorting [46], maze solving [35], and pattern recognition [45].

Fig. 1
An illustration of a timeline of representative advances of DNA computing. The timeline starts from 1994 with Adleman and Hamiltonian path. The last event on the timeline is in 2020 with Haley et al, hidden thermodynamic driving motif.

Timeline of representative advances in the field of DNA computing [4,5,6,7,8,9,10, 16,17,18,19, 21, 25, 30, 32, 33, 35, 39,40,41, 44, 47,48,49,50,51,52,53,54,55,56]

2 Opportunities and Challenges

The highly ordered arrangement of physically addressable computing units at the nanoscale facilitates high-performance computing in conventional silicon-based computers. To complete a calculation task, the corresponding electronic transmission paths are activated by addressing to map the algorithm to the hardware. In contrast, DNA computing functions are mainly implemented with interactions between DNA and DNA [18], other biological molecules (e.g., RNAs [57], proteins [19, 51], and small molecules [58]), or environmental conditions (e.g., light [59], pH [60], and ions [61]). These molecular computing units are addressed chemically rather than physically, since they are mixed in solution with indistinguishable locations. To carry out a required computing task, a DNA-based chemical reaction network is programmed to run according to specified rules (algorithms) by designing and controlling the interactions between DNA and the above elements. The differences in underlying implementations between DNA and solid-state computing devices (e.g., electronic and optical computers) result in unique advantages and challenges for further development of DNA computing.

2.1 Bridge Between Matter and Information

Computing relies on information processing, transfer, and storage. For conventional storage media, information is stored in the specified (magnetic, optical, electrical) states of matter by spatial manipulation of these storage media. In contrast, the aperiodic nucleobases make a DNA strand itself a piece of information. This not only leads to ultrahigh information density in DNA, but also bridges matter and information. With aperiodic nucleobases and regularly sized hydrogen bonds, the DNA double helix is an aperiodic crystal, which supports reliable information storage and transfer [62]. The information stored in DNA is transferred into RNA and proteins through genetic transcription and translation. The stereoregular duplex structure allows DNA to be accessed and processed by sequence-independent enzymes for self-replication and degradation. Thus, DNA links a wide matter world at the molecular level with the sequence information it carries.

2.2 Massive Parallelism

DNA computing was initially proposed to solve mathematically complex problems, such as the Hamiltonian path problem [4], the SAT problem [5], and the maximum clique problem [10], taking advantage of specific and highly parallel binding between DNA molecules. Due to the stochasticity of molecular interactions, every individual molecule of the same population randomly follows certain permitted reaction paths. According to the law of large numbers, all possibilities are covered as long as the number of molecules is sufficient. Compared with algorithms that search for every possible combination one by one until the answer is found, DNA-based molecular algorithms can greatly reduce the time complexity.
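The generate-and-filter idea described above can be illustrated in software. The following is a minimal sketch, not Adleman's protocol: the toy graph, the function names, and the choice to start every random path at the entry vertex are assumptions made for illustration. Each simulated "molecule" grows one random path, and filtering then keeps only paths that end at the target and visit every vertex exactly once.

```python
import random

def random_path(edges, start, max_len, rng):
    """Grow one path by random edge choices, like one molecule assembling at random."""
    path = [start]
    while len(path) < max_len:
        successors = [v for (u, v) in edges if u == path[-1]]
        if not successors:
            break  # no outgoing edge: this path stops growing
        path.append(rng.choice(successors))
    return tuple(path)

def generate_and_filter(edges, n_vertices, start, end, n_molecules, seed=0):
    """Generate many random paths in 'parallel', then filter for Hamiltonian ones."""
    rng = random.Random(seed)
    pool = [random_path(edges, start, n_vertices, rng) for _ in range(n_molecules)]
    return {
        p for p in pool
        if len(p) == n_vertices          # correct length
        and p[-1] == end                 # ends at the target vertex
        and len(set(p)) == n_vertices    # every vertex visited exactly once
    }

# Hypothetical 5-vertex directed graph (illustrative only).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 1), (1, 3), (2, 3), (3, 4), (2, 4)]

if __name__ == "__main__":
    print(sorted(generate_and_filter(EDGES, 5, start=0, end=4, n_molecules=2000)))
```

In this toy graph each Hamiltonian path is produced by roughly one molecule in twelve, so a pool of a few thousand covers both solutions; the same coverage argument is what fails when the number of candidate paths approaches the number of available molecules.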

It should be noted that the parallelism underlying molecular interactions relies on the availability of participating molecules. For example, a 500-node traveling salesman problem has more than 10^1000 potential solutions, which is beyond the number of available molecules, as 1 L of 1 M solution could only provide 6 × 10^23 manipulable molecules. Besides, as proposed by Back et al. [12], a huge number of DNA molecules can participate in a calculation in parallel to generate a random population of candidate solutions, followed by a filtering step to remove all DNA molecules not representing a solution to the problem. Such a “filtering approach” becomes infeasible as the problem size grows, since it becomes difficult to select a small number of answer products from a large number of non-answer products. Theoretically, this limitation of parallelism in problem size could be overcome by evolutionary algorithms [12, 13, 63].
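The size mismatch in this example can be checked with a short calculation. The sketch below assumes a symmetric traveling salesman problem, for which the number of distinct tours is (n − 1)!/2; that counting convention and the function names are assumptions, not taken from the cited works. Logarithms are used because the numbers are far too large for floating-point values.

```python
import math

AVOGADRO = 6.022e23  # molecules per mole

def log10_tsp_tours(n_nodes):
    """log10 of (n-1)!/2, the tour count of a symmetric n-node TSP (assumed convention)."""
    # math.lgamma(n) returns ln((n-1)!)
    return math.lgamma(n_nodes) / math.log(10) - math.log10(2)

def log10_molecules(volume_litres, molarity):
    """log10 of the number of molecules in a solution of given volume and molarity."""
    return math.log10(volume_litres * molarity * AVOGADRO)

if __name__ == "__main__":
    tours = log10_tsp_tours(500)
    molecules = log10_molecules(1.0, 1.0)
    print(f"tours     ~ 10^{tours:.0f}")
    print(f"molecules ~ 10^{molecules:.1f}")
    print(f"shortfall ~ 10^{tours - molecules:.0f}-fold")
```

Under this counting, the number of candidate tours exceeds the molecules in 1 L of 1 M solution by more than a thousand orders of magnitude, so no realistic increase in volume or concentration closes the gap.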

2.3 Scalability

Conventional computing devices are based on the integration and spatial arrangement of identical building blocks. For these systems, scaling is realized by integrating more building blocks. In contrast, DNA computing relies on specific interactions of orthogonal sequences. The 4-base coding nature and base pairing rule of DNA sequences support a rich orthogonal molecule library for large-scale computing. However, with the increasing number of required molecule types, the Hamming distances between DNA strands become smaller, leading to transient or stable unwanted binding of DNA strands [15]. In addition, molecules participating in a basic function need to search for each other through diffusion. Increasing the number of molecule types will increase the chance for a molecule to bind to an unwanted molecule and thus decrease the probability of being found by its target molecule, which would decrease computing speed and increase leakage. Therefore, scalability of DNA computing is limited to a certain finite size that cannot be extended by simply adding more computing units.
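The trade-off between library size and sequence separation can be made concrete with a small sketch. It uses plain Hamming distance between equal-length strands as a crude proxy for orthogonality; real sequence design would also consider reverse complements, secondary structure, and binding thermodynamics, none of which are modeled here. The greedy construction and all function names are illustrative assumptions.

```python
from itertools import combinations, product

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(library):
    """Smallest Hamming distance between any two sequences in the library."""
    return min(hamming(a, b) for a, b in combinations(library, 2))

def greedy_library(length, min_dist):
    """Scan all 4^length sequences in order, keeping each one that stays
    at least min_dist away from every sequence already kept."""
    chosen = []
    for bases in product("ACGT", repeat=length):
        seq = "".join(bases)
        if all(hamming(seq, other) >= min_dist for other in chosen):
            chosen.append(seq)
    return chosen

if __name__ == "__main__":
    for d in (1, 2, 3, 4):
        print(f"min distance {d}: {len(greedy_library(4, d))} sequences of length 4")
```

For strands of fixed length, demanding a larger minimum distance shrinks the usable library, and conversely, packing in more strand types forces some pairs closer together, which is the origin of the unwanted binding noted above.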

3 Directions for Future Development and Potential Approaches

3.1 Scaling-Up

The number of computing elements in a computing system determines its executable program complexity. As with other computational machines, increasing the circuit scale is an important direction of evolution. The first modern electronic computer ENIAC contained 18,000 electron tubes [64]. Nowadays, tens of billions of transistors are integrated into an everyday mobile phone chip. For DNA computing, the effective scope is still solving problems that contain a small number of nodes or variables, using fewer than a few hundred participating DNA strands. Therefore, circuit scaling-up is an important requisite for functional evolution of DNA computing, which raises challenges including: (1) the specificity and efficiency of molecular reactions deteriorate with the increase of participating molecules; (2) the lack of a scalable computing architecture to realize the automatic mapping of complex algorithms to hardware implementations. For scaling up DNA computing, several strategies have been proposed and further explorations could be worthwhile.

3.1.1 Spatial Separation

A whole reaction can be split into different compartments by spatial separation. As a result, the molecules from different compartments are restricted from meeting each other; therefore, the reactions can be carried out efficiently in each compartment. Both semiconductor circuits and cells use the spatial separation strategy to control the material and information transmission pathways to complete complex computing tasks. In DNA computing, spatial separation has also been explored [65,66,67,68]. In 2011, Chandran et al. proposed a theoretical framework to implement parallel and scalable computation, using localized strand displacement reactions on the surface of a DNA nanostructure [65]. In 2016, Genot et al. realized simultaneous observation of 10^4 reactions, by encapsulating a computational reaction system with various input conditions into droplets [67]. The molecules in each droplet reacted independently, since the oil film blocked molecular communication between droplets (Fig. 2a), which enabled parallel and high-resolution mapping of circuits.

Fig. 2
A schematic illustrates 3 types of separation: isolation, communicative separation, and templated separation. Templated separation consists of DNA origami.

Three types of spatial separation that may inspire design of DNA computing systems

Communicative separation allows different spatial positions to store different chemical information with input/output communication. In 2019, Joesaar et al. encapsulated basic logic blocks into proteinosomes as protocells to mimic the function of natural cells [55]. As the output of one reaction, the released DNA strands in one protocell could pass through protocell membranes, diffuse in solution, and then enter another protocell, triggering the reactions in the downstream as input strands. This approach was supported by protocell–protocell communications (Fig. 2b), which holds potential for high-performance DNA computing with distributed systems. However, the computation demonstrated so far is limited to several protocells with single logic gates inside. Circuit size in each protocell and communication efficiency between protocells remain to be explored.

The addressability of DNA origami at nanometer precision facilitates the confinement of molecular reactions on the DNA origami surface, which enables templated separation of reactions on different origamis (Fig. 2c). In 2017, Chatterjee et al. proposed spatialized information propagation by using DNA origami as the canvas to design circuits [33]. The circuit on DNA origami receives input signals and fuel strands from the solution and releases output strands for readout, allowing signal communication between origamis. However, circuits across multiple origamis via inter-origami communication have not been realized. Due to the lack of threshold and amplifier functions in such spatially separated systems, this strategy still faces challenges when the scale of the circuit increases. DNA self-assembly may provide alternative solutions to construct localized response elements with noise suppression and signal amplification functions.

3.1.2 Combination of Order and Disorder

Collision events of DNA molecules in solution are disordered, which brings a high degree of parallelism to DNA computation, together with inaccuracy to some extent. The parallelism means subdividing a deterministic space into a probability space for one calculation. Taking a search algorithm as an example, in sequential computing, one possible path is explored at a time, and a specific output is generated. For parallel DNA molecular reactions, all possible paths are explored simultaneously. As the number of possible reaction paths increases, detectors with higher sensitivity and accuracy are required to obtain the solution for a problem. In addition, as the number of molecules increases in the system, the probability that a single molecule is in a non-specifically bound state becomes higher, which consequently reduces the speed and probability of following the correct reaction path and increases the probability of signal leakage.

Living systems provide unique examples for coordinating reactions involving a large amount and variety of molecules. In cells, spatial compartments are utilized to confine reactions into small reaction containers. With a certain degree of fluidity, the skeletal structures of cells provide a heterogeneous environment for disordered reactions. Cells aggregate to form ordered tissues, organs, and finally the organisms. With the combination of disorder and order, organisms have evolved advanced computing capabilities (e.g., learning, thinking, and decision-making). High-precision manufacturing technologies, including DNA nanotechnology, hold potential to build highly ordered containers for DNA molecular reactions. The ordered organization of computing modules, together with the disordered molecular reactions, will provide high parallelism and overall coordination to the computing system (Fig. 3). It is possible to develop more complex artificial DNA-based computing systems in vitro with improved synthetic intelligence.

Fig. 3
A schematic of ordered and disordered architecture. It consists of straight lines forming a mesh structure and bubbles with dots present inside them.

Schematic illustration of high-level ordered and low-level disordered architecture

3.1.3 Reversible and Directional Reaction

Currently, the generate-and-test approach is the one most widely used to experimentally demonstrate DNA computing processes. When the number of potential solutions exceeds the number of available molecules, a problem becomes theoretically infeasible. In fact, the faithful readout of the result is also limited by the proportion of correct calculation results. A solution to a problem could be viewed as a correct assembly of DNA molecules, and a high yield of the correct DNA assembly benefits the filtration of the correct answer. If the yield is too low, a solvable problem may be misinterpreted as having no solution. Condon and coworkers proposed a strategy using reversible strand displacement reactions for DNA computations that are space and energy efficient [69]. Thubagere et al. proposed a random walk-based algorithm for cargo-sorting [46], in which a single-stranded DNA robot picks up cargo irreversibly through toehold-mediated strand displacement reactions. Carrying the cargo, the robot performs a reversible random walk among adjacent tracks via toehold exchange until reaching the goal track for cargo drop-off.

The reversible motion strategy mentioned above could be extended to solve complex optimization problems. Taking a maze-solving problem as an example, and assuming the probability of stepping in each allowed direction is equal, the probability for a navigator to reach the exit of the maze shown in Fig. 4a is 1/3456. Each navigator randomly follows either a correct or a wrong path, and those that follow the correct one need only 16 steps to reach the exit (Fig. 5a). However, the probability suggests that only ~0.03 nM of correct assembly would be obtained with 100 nM navigators, making it hard to detect the correct solution. Using a reversible search algorithm, the navigator could return to a node that has been visited, while its last step to the exit is irreversible (Fig. 4b). Through this approach, there is no dead end except the exit; thus, every navigator is capable of reaching the exit. In a simulation, more than 50% of navigators reached the exit within 1000 steps (Fig. 5b). Because intermediate nodes may be visited repeatedly, it takes remarkably more time for the reversible navigator to reach the exit than for the irreversible one. However, the arrived percentage of reversible navigators exceeds that of irreversible navigators in 100 steps (Fig. 5c). For a reversible system, sacrificing time will probably lead to a higher success probability with higher yields of correct DNA assemblies, which may provide solutions to complex tasks beyond the practical computing power.
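The contrast between the two search modes can be reproduced qualitatively with a short simulation. The sketch below is an illustration under stated assumptions: the maze is a hypothetical nine-node tree, not the maze of Fig. 4, and the function names are invented here. The irreversible navigator never revisits a node and stops at dead ends; the reversible navigator may step back to any neighbor, and only arrival at the exit is final. Each allowed direction is chosen with equal probability, as in the text.

```python
import random

# Hypothetical toy maze (NOT the maze of Fig. 4): entry "S", exit "X".
MAZE = {
    "S": ["A"],
    "A": ["S", "B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "G"],
    "D": ["B"],
    "E": ["B", "F", "X"],
    "F": ["E"],
    "G": ["C"],
    "X": ["E"],
}

def walk(maze, start, exit_node, max_steps, reversible, rng):
    """Return the number of steps taken to reach the exit, or None if it is not reached."""
    node, visited = start, {start}
    for step in range(1, max_steps + 1):
        if reversible:
            options = maze[node]
        else:
            options = [n for n in maze[node] if n not in visited]
        if not options:
            return None  # irreversible navigator is stuck in a dead end
        node = rng.choice(options)
        visited.add(node)
        if node == exit_node:
            return step  # reaching the exit is final in both modes
    return None

def arrived_fraction(maze, n_navigators, max_steps, reversible, seed=0):
    """Fraction of navigators that reach the exit within max_steps."""
    rng = random.Random(seed)
    hits = sum(
        walk(maze, "S", "X", max_steps, reversible, rng) is not None
        for _ in range(n_navigators)
    )
    return hits / n_navigators

if __name__ == "__main__":
    print("irreversible:", arrived_fraction(MAZE, 20000, 1000, reversible=False))
    print("reversible:  ", arrived_fraction(MAZE, 2000, 1000, reversible=True))
    # Yield arithmetic from the text: 100 nM navigators, success probability 1/3456.
    print("correct assembly (nM):", 100 / 3456)
```

In this toy maze the irreversible navigator faces three two-way choices, so about one in eight arrives, whereas nearly all reversible navigators arrive when given enough steps. The last line reproduces the yield estimate quoted in the text, 100 nM / 3456 ≈ 0.03 nM.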

Fig. 4
2 schematic illustrations a and b. A is an irreversible search algorithm. B is a reversible search algorithm. The illustrations have entry and exit bubbles marked. All bubbles are interconnected to form a maze.

Schematic illustrations of irreversible search algorithm (a) and reversible search algorithm (b) for a maze

Fig. 5
3 graphs plot arrived percentage versus steps. Graph A plots straight lines. Graph B plots an increasing line, and graph C plots an increasing reversible line with a straight irreversible line.

Simulated overall success rates of maze solving. a All navigators that could reach the exit undergo 16-step propagation in the irreversible algorithm. b Arrived percentage increases with step numbers in the reversible algorithm. c Comparison of arrived percentage in the two propagation modes within 200 steps

3.1.4 More Efficient Molecular Searching Modes

In homogeneous solutions, molecules recognize each other mainly through diffusion. This is why DNA computing systems constructed from diffusive building blocks usually face a limitation in reaction rates toward the correct pathway. Commonly, tens of hours are needed to finish a calculation when hundreds of DNA strands are involved [45]. Fast computing under low DNA concentrations could be realized by introducing new molecular searching modes. As a successful demonstration, calculations were completed in minutes at nanomolar concentrations [33], by using DNA origami to confine the diffusion of each computing element into nanoscale regions.

Inspiration may also be obtained from natural systems. In the cellular environment, the searching process of a protein toward its target DNA segments generally involves complex motions, including sliding, hopping, and intersegmental transfer as well as diffusion [70] (Fig. 6). When the DNA strand is stretched, the protein prefers a 1-dimensional lateral search. When the DNA strand is coiled, which is the native state, proteins can transfer between spatially adjacent segments. Combining these searching modes, even at very low concentrations, proteins can realize fast target search throughout the whole cell. Learning from these natural molecular searching modes, high-performance DNA computing may be developed based on novel molecular interaction mechanisms. The accurate spatial addressing property of DNA nanotechnology and tunable mechanical rigidity of DNA nanostructures may offer novel scaffolds to construct molecular machines with new molecular motion modes. With these molecular machines, faster and more efficient molecular recognition may be possible, which would increase the executable program complexity of DNA computing systems.

Fig. 6
A schematic illustrates 4 DNA searching modes of proteins in cells. The 4 modes are 3D diffusion, sliding, hopping, and intersegmental transfer.

Possible DNA searching modes of proteins in cells that may inspire more efficient molecular interactions for DNA computing in solution

3.2 Updating and Reusing

As a natural computer, life accumulates memories from past experiences, renews, and upgrades itself. Electronic computing chips also update their states, triggered either by a periodic clock or by aperiodic pulse signals. Most proposed DNA computing devices are disposable and cannot be reused, because computing elements are permanently destroyed after calculation. Therefore, DNA computing is less developed in this respect, and developing renewable and reusable circuits will greatly expand the application scope of DNA computing.

3.2.1 Hardware Resetting

In DNA computing, resetting the hardware state relies on chemical reactions, and it remains an important challenge that limits the sustainable use of computing devices. Spontaneous chemical reactions follow the tendency of energy change. By adding new strands to trigger reverse strand displacement [71], or by using a nicking enzyme to change the energy state of the system [72], the reaction can be reversed, and the input signal can be degraded, thereby realizing the reset of the computing device. Although resettable circuit implementations have been validated, the recovered concentration of input strands for the next computing cycle decreases rapidly, making the circuit incapable of being recycled. To perform sequential operations like electronic computers, further exploration on the design of a resettable DNA computing system is needed. Meanwhile, in combination with the parallelism of molecular reactions, highly parallel computing within one clock cycle may be developed. In this direction, to improve the reset efficiency, it is necessary to simplify the molecular structure design with further understanding of the underlying mechanisms.

3.2.2 Iteration and Update of Molecular Reaction Networks

Neuromorphic computing empowers artificial devices to learn from new inputs and realize self-update. Recently, DNA circuit-based neural network computing has been demonstrated [44, 45, 57]. However, the weight values of these neural networks were trained in silico. This one-time-use feature makes DNA circuits unable to renew themselves and thus incapable of learning.

Evolutionary DNA algorithms have been proposed to overcome parallelism limitations by dividing the selection of the final answer from a single huge pool into recursive selections from various small pools [12, 13]. For example, in Systematic Evolution of Ligands by EXponential Enrichment (SELEX), a destructive process is performed to remove intermediate results that do not fit the constraints of a problem. These iterative selections allow evolution of computing results to approach the solution, which is different from the Adleman-style computing with a single selection at the final step [73,74,75,76]. The evolutionary strategy also provides possibilities to achieve molecular model training in machine learning. Rondelez’s team has developed a series of evolutionary DNA reaction networks using a toolbox of DNA processing enzymes, i.e., polymerase, nickase, and exonuclease [19, 47, 77,78,79]. With a rich library of DNA processing enzymes to generate, transfer, and degrade DNA signals, it is promising to experimentally implement more complicated evolutions with DNA-based reaction networks to mimic biological systems in vitro.

4 Summary

DNA computing has evolved slowly but steadily during the last 30 years. Despite the remarkable progress, challenges remain in many facets, such as function diversity, feasible circuit size, and computing efficiency. In particular, DNA computing relies on molecular diffusion and recognition of DNA molecules, which is fundamentally different from conventional and other types of computing systems that use a universal signal (e.g., electron or photon). We envision that next-generation DNA computing with molecular intelligence may evolve with inspiration from both natural living systems and electronic computers.