
Parallel pairwise operations on data stored in DNA: sorting, XOR, shifting, and searching

Natural Computing

Abstract

Prior research has introduced the Single-Instruction-Multiple-Data paradigm for DNA computing (SIMD DNA). It offers the potential for storing information and performing in-memory computations on DNA, with massive parallelism. This paper introduces three new SIMD DNA operations: sorting, shifting, and searching. Each is a fundamental operation in computer science. Our implementations demonstrate the effectiveness of parallel pairwise operations with this new paradigm.


Notes

  1. Perhaps counter-intuitively, sorting binary values in hardware is as difficult algorithmically as sorting arbitrary values such as integers or real numbers (Cormen et al. 2009).

References

  • Adleman LM (1994) Molecular computation of solutions to combinatorial problems. Science 266:1021–1024

  • Athreya N, Milenkovic O, Leburton J-P (2019) Detection and mapping of dsDNA breaks using graphene nanopore transistor. Biophys J 116(3):292

  • Broadwater DB, Kim HD (2016) The effect of basepair mismatch on DNA strand displacement. Biophys J 110(7):1476–1484

  • Ceze L, Nivala J, Strauss K (2019) Molecular digital data storage using DNA. Nat Rev Genet 20(8):456–466. https://doi.org/10.1038/s41576-019-0125-3

  • Chen T, Solanki A, Riedel M (2021) Parallel pairwise operations on data stored in DNA: sorting, shifting, and searching. In: 27th international conference on DNA computing and molecular programming (DNA 27). Schloss Dagstuhl-Leibniz-Zentrum für Informatik

  • Church G, Gao Y, Kosuri S (2012) Next-generation digital information storage in DNA. Science (New York, N.Y.) 337:1628. https://doi.org/10.1126/science.1226355

  • Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn. The MIT Press, London

  • Doty D, Ong A (2021) Simulating 3-symbol Turing machines with SIMD||DNA. arXiv preprint arXiv:2105.08559

  • Flynn MJ (1972) Some computer organizations and their effectiveness. IEEE Trans Comput 21(9):948–960

  • Krug J, Spohn H (1988) Universality classes for deterministic surface growth. Phys Rev A 38(8):4271

  • Li W (1987) Power spectra of regular languages and cellular automata. Complex Syst 1(1):107–130

  • Li L, Jiang W, Lu Y (2018) A modified Gibson assembly method for cloning large DNA fragments with high GC contents. Synth Metab Pathw Methods Protoc 203–209

  • Liu K, Pan C, Kuhn A, Nievergelt AP, Fantner GE, Milenkovic O, Radenovic A (2019) Detecting topological variations of DNA at single-molecule level. Nat Commun 10(1):1–9

  • Radding CM, Beattie KL, Holloman WK, Wiegand RC (1977) Uptake of homologous single-stranded fragments by superhelical DNA: IV. Branch migration. J Mol Biol 116(4):825–839

  • Salehi SA, Jiang H, Riedel MD, Parhi KK (2015) Molecular sensing and computing systems. IEEE Trans Mol Biol Multi-Scale Commun 1(3):249–264

  • Soloveichik D, Seelig G, Winfree E (2010) DNA as a universal substrate for chemical kinetics. Proc Natl Acad Sci 107(12):5393–5398. https://doi.org/10.1073/pnas.0909380107

  • Tabatabaei S, Wang B, Athreya N, Enghiad B, Hernandez A, Fields C, Leburton J-P, Soloveichik D, Zhao H, Milenkovic O (2020) DNA punch cards for storing data on native DNA sequences via enzymatic nicking. Nat Commun. https://doi.org/10.1038/s41467-020-15588-z

  • Tabatabaei SK, Wang B, Athreya NBM, Enghiad B, Hernandez AG, Fields CJ, Leburton J-P, Soloveichik D, Zhao H, Milenkovic O (2020) DNA punch cards for storing data on native DNA sequences via enzymatic nicking. Nat Commun 11(1):1–10

  • Wang B, Chalk C, Soloveichik D (2019) SIMD||DNA: single instruction, multiple data computation with DNA strand displacement cascades. In: Thachuk C, Liu Y (eds) DNA computing and molecular programming. Springer, Cham, pp 219–235

  • Yurke B, Turberfield AJ, Mills AP, Simmel FC, Neumann JL (2000) A DNA-fuelled molecular machine made of DNA. Nature 406(6796):605–608

Author information

Corresponding author

Correspondence to Marc Riedel.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Instructions for converting to another scheme

Instruction 1 identifies and distinguishes the two different bit values by issuing strand (\({\text {S}}_1\) 1 2 3). In a bit 0, the strand displaces the short strand over domains 2 and 3. It does not edit a bit 1, since domain 1 is the only open domain for binding there. In instruction 2, all domains in bit 1 are replaced by a single strand covering all domains with identifier \({\text {S}}_a\). Then in instruction 3, the strand \({\text {S}}_1\) is detached, so domains 1, 2, and 3 on bit 0 are exposed. In instruction 4, all domains in bit 0 are replaced by a single strand covering all the domains with the identifier \({\text {S}}_b\). Any encoding scheme with 7 domains per cell can then be written to the bits by first detaching strand \({\text {S}}_a\) and writing the encoding for bit 1, then detaching strand \({\text {S}}_b\) and writing the encoding for bit 0 (Fig. 9).

Fig. 9 Current coding scheme can be converted to another coding scheme

Appendix B Detailed implementation of each step for parallel sorting

Here we give an instruction set for parallel binary bubble sort with the previously defined encoding scheme. We can implement each step of the sorting algorithm in 12 individual operations. Details of the design are shown in Fig. 10.

Fig. 10 Instructions for parallel sorting

The 12 instructions fall into two stages. The first stage is “identifying.” During instructions 1-4, all the pairs (1, 0) are identified, and in both the bit 1 and the bit 0 of each such pair, a toehold is exposed for writing new data. More specifically, instructions 1 and 2 identify the combination (1, 0). In instruction 1, (\({\text {S}}_1\) 6 7 1 2 3) is issued to each pair of bits. In pair (0, 0), \({\text {S}}_1\) and domains 6, 7 are exposed. In pair (0, 1), since the only open domain is 1, it will not form a strong enough bond. In pair (1, 0), only \({\text {S}}_1\) is exposed. In pair (1, 1), \({\text {S}}_1\) and domains 2, 3 are exposed. In instruction 2, strand (6* 7* 1* 2* 3*) is issued to each pair of bits. Since pair (1, 0) is the only pair in which neither domains 6 and 7 nor domains 2 and 3 are exposed, this strand will detach strand \({\text {S}}_1\) in each pair except pair (1, 0). After instruction 2, the toehold between a bit value of 1 and a bit value of 0 in the pair (1, 0) is replaced by a strand with an identifier of \({\text {S}}_1\). Instruction 3 seals off the domains exposed in the other pairs during instructions 1 and 2 so that they will not be edited later. In instruction 4, the strand with identifier \({\text {S}}_1\) is detached, exposing domains 6 and 7 in the left cell containing bit 1, and domains 2 and 3 in the right cell containing bit 0. After this instruction, toeholds are exposed only in the 1s and 0s in pair (1, 0). Other bits are not affected.

The second stage is flipping the bits in the pair (1, 0). In instruction 5, in the case of a bit value of 0, domains 2 and 3 are temporarily covered by a strand with identifier \({\text {S}}_2\) so that the writing process will not interfere with the identified 0s at this moment. In instruction 6, a bit value of 1 is replaced by a strand with identifier \({\text {S}}_3\) via the open toehold at domains 6 and 7. The strand is then detached in instruction 7, exposing all the domains of that bit. Then, the bit value of 0 is written to the location of a bit value of 1 in instruction 8. In instruction 9, the temporary cover for a bit 0 is lifted. Then, in instructions 10 through 12, a bit 1 is written to the location of a bit value of 0 using the same scheme as instructions 6 through 8. Throughout the process, only bits identified in the first stage with toeholds exposed are affected.
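The two stages can be summarised at the level of bits rather than strands. The sketch below is only an abstraction of the text above, not the strand-level design: the table `DANGLING_AFTER_INSTRUCTION_1` and the function names are hypothetical, and they simply record which domains the text says remain exposed on the \({\text {S}}_1\) strand for each pair type.

```python
# Hypothetical bit-level abstraction of the 12-instruction sorting step.
# For each (left, right) pair, the domains of the S1 strand that the text
# says remain exposed after instruction 1. None means the strand never
# binds stably (only domain 1 is open in a (0, 1) pair).
DANGLING_AFTER_INSTRUCTION_1 = {
    (0, 0): {6, 7},
    (0, 1): None,
    (1, 0): set(),   # fully bound: only the identifier S1 sticks out
    (1, 1): {2, 3},
}


def s1_survives(left, right):
    """Instruction 2 detaches S1 wherever exposed domains offer a toehold."""
    dangling = DANGLING_AFTER_INSTRUCTION_1[(left, right)]
    return dangling is not None and len(dangling) == 0


def identify_pairs(bits):
    """Stage 1 (instructions 1-4): indices i where (bits[i], bits[i+1]) is (1, 0)."""
    return [i for i in range(len(bits) - 1) if s1_survives(bits[i], bits[i + 1])]


def flip_identified(bits, marked):
    """Stage 2 (instructions 5-12): rewrite each marked (1, 0) pair to (0, 1)."""
    out = list(bits)
    for i in marked:
        out[i], out[i + 1] = 0, 1
    return out


register = [1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0]
marked = identify_pairs(register)
print(marked, flip_identified(register, marked))
```

Because two (1, 0) pairs can never share a cell, the marked pairs can all be rewritten at once, which is what makes the step parallel.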

Appendix C Detailed implementation of each step for parallel exclusive OR

The instructions are shown below, alongside an example of the Exclusive OR algorithm for sequence 11101 to 00000 in two iterations.

In each XOR iteration, the f(1,1) = (0,0) rewriting must be performed on non-overlapping pairs of bits. In the first iteration, the pairing is as follows: cell 0 with cell 1, cell 2 with cell 3, and so on. This means that all instruction strands only operate on these pairs. For this algorithm specifically, this can be achieved by using different sequences for the even versus the odd cells on the strand. In instruction 1, the strand (\({\text {S}}_1\) 6 7 1 2 3) is issued to identify (1,0) pairs. In instruction 2, strand (6* 7* 1* 2* 3*) is issued to detach any \({\text {S}}_1\) strands with exposed domains of 6 and 7, or 2 and 3. In instruction 3, the strands (\({\text {S}}_2\) 6 7 1) and (\({\text {S}}_3\) 1 2 3) are issued to identify (1,1) and (0,0) pairs respectively. Finally, (0,1) pairs are identified with strand (\({\text {S}}_4\) 4 5 6 7 1) for instruction 4. Now that all 1 domain toeholds are covered, strand (\({\text {S}}_2\)* 6* 7* 1*) is issued in instruction 5 to detach all \({\text {S}}_2\) and expose (1,1) pairs. In instruction 6, strand (\({\text {S}}_5\) 2 3 4 5 6 7 1 2 3 4 5 6 7) is issued to cover both cells in (1,1) pairs. Both \({\text {S}}_5\) and \({\text {S}}_4\) are now detached using strands (\({\text {S}}_5\)* 2* 3* 4* 5* 6* 7* 1* 2* 3* 4* 5* 6* 7*) and (\({\text {S}}_4\)* 4* 5* 6* 7*) in instruction 7. Then in instruction 8, all exposed cells are written to 0 using strands (2 3) and (4 5 6 7). In instruction 9, all \({\text {S}}_1\) and \({\text {S}}_3\) are detached using (\({\text {S}}_1\)* 6* 7* 1* 2* 3*) and (\({\text {S}}_3\)* 1* 2* 3*). By covering all exposed domains using strands (2 3) and (6 7) in instruction 10, all (1,1) pairs identified in the register are rewritten to (0,0) pairs. At this point, instructions 1-11 of the parallel sorting in Sect. 1 are implemented to write all (1,0) pairs to (0,1). For these sorting steps, the cell pairing can be overlapping. 
The result of this whole iteration of the XOR algorithm is a DNA sequence that has the same bit parity as the input, but is more ordered (i.e., closer to being sorted), and contains the same number of 1s or fewer. In Fig. 11, the first iteration is carried out with non-overlapping pairs: cell 0 with cell 1, cell 2 with cell 3, and so on. In Fig. 12, which depicts a second iteration of the XOR algorithm, the pairing is instead cell 1 with cell 2, cell 3 with cell 4, and so on. In the third iteration, the pairing can return to the original pairing of the first iteration. For an n-bit register, after n iterations of the XOR algorithm, the last cell contains the output of the n-bit XOR.
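As a bit-level illustration of one XOR iteration, the following sketch first rewrites non-overlapping (1, 1) pairs to (0, 0) and then applies one parallel sorting step. It is an abstraction under stated assumptions, not the strand-level instruction set: the function names are hypothetical, and the alternation of the pairing is modelled by an `offset` of 0 or 1 chosen from the iteration number.

```python
def rewrite_ones(bits, offset):
    """Rewrite non-overlapping (1, 1) pairs to (0, 0); pairs start at `offset`."""
    out = list(bits)
    for i in range(offset, len(out) - 1, 2):
        if out[i] == 1 and out[i + 1] == 1:
            out[i] = out[i + 1] = 0
    return out


def sort_step(bits):
    """Rewrite every adjacent (1, 0) pair to (0, 1); pairs may overlap here."""
    out = list(bits)
    for i in range(len(bits) - 1):
        if bits[i] == 1 and bits[i + 1] == 0:
            out[i], out[i + 1] = 0, 1
    return out


def xor_iteration(bits, iteration):
    """One iteration: pairing alternates between offsets 0 and 1 (assumed)."""
    return sort_step(rewrite_ones(bits, iteration % 2))


def xor_register(bits):
    """Run n iterations on an n-bit register and return the final state."""
    state = list(bits)
    for iteration in range(len(state)):
        state = xor_iteration(state, iteration)
    return state


first = xor_iteration([1, 1, 1, 0, 1], 0)   # pairing: cells (0,1), (2,3)
second = xor_iteration(first, 1)            # pairing: cells (1,2), (3,4)
print(first, second)
```

For the register 11101 the two calls reproduce the transitions named in the captions of Figs. 11 and 12 (11101 to 00011, then 00011 to 00000).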

Fig. 11 Instructions for the Exclusive OR. The first iteration converts 11101 to 00011

Fig. 12 Instructions for the exclusive OR. The second iteration converts 00011 to 00000

Appendix D Detailed implementation of each step for parallel left shift cell

The instructions are as follows, with an example of shifting 11001 to 10011.

The first three instructions are exactly the same as those for identifying bit pairs in Sect. 3.1. In instruction 1, the strand (\({\text {S}}_1\) 6 7 1 2 3), which identifies the different patterns of two bits, is issued to each pair of bits. In instruction 2, strand (6* 7* 1* 2* 3*) is issued, detaching strands with open domains 6 and 7, or 2 and 3. After this instruction, strands with identifier \({\text {S}}_1\) only remain at pair (1, 0). In instruction 3, we issue two species of strands at the same time: (\({\text {S}}_2\) 6 7 1) and (\({\text {S}}_3\) 1 2 3). (\({\text {S}}_2\) 6 7 1) will bind with pair (1, 1) and (\({\text {S}}_3\) 1 2 3) will bind with pair (0, 0). \({\text {S}}_2\) will not form a stable binding with pair (0, 0) or (0, 1) because the binding area is only one domain. The same holds for \({\text {S}}_3\) and pair (1, 1) or (0, 1). After this instruction, only domain 1 between pair (0, 1) is still exposed. In instruction 4, strand (\({\text {S}}_4\) 4 5 6 7 1) is issued. Through the open domain 1 between pair (0, 1), the strand in bit 0 is replaced by \({\text {S}}_4\). After this step, the first bit in pair (1, 0) is identified with the strand \({\text {S}}_1\), and the first bit in pair (0, 1) is replaced with the strand \({\text {S}}_4\).

Instructions 5 to 9 rewrite the first bit in pair (1, 0) to 0. In instruction 5, the strand \({\text {S}}_1\) is detached, exposing domains 6, 7, 1, 2 and 3. The exposed domains 2 and 3 are sealed off in instruction 6 to not interfere with subsequent instructions. In instruction 7, strand (\({\text {S}}_5\) 2 3 4 5 6 7) is issued through the open toehold on domains 6 and 7 in the bit 1 in pair (1, 0), and displaces the strand in that bit. Since domains 2 and 3 are sealed off, bit 0 will not be modified in this instruction. In instruction 8, strand \({\text {S}}_5\) is detached, leaving the domains in the bit open. In instruction 9, strands (2 3) and (4 5 6 7), which represent 0, are written to the bit containing open domains.

In the final two instructions, we write 1 to the first bit in pair (0, 1). In instruction 10, three strands are issued to each pair of bits: (\({\text {S}}_2\)* 6* 7* 1*), (\({\text {S}}_3\)* 1* 2* 3*) and (\({\text {S}}_4\)* 4* 5* 6* 7* 1*). \({\text {S}}_2\), \({\text {S}}_3\) and \({\text {S}}_4\) are detached through these strands. Since \({\text {S}}_4\) covers the bit 0 in pair (0, 1), after this step, domains 3 and 4 are exposed in these bits, ready to be written to 1. In the final step, strands (2 3), (2 3 4 5), and (6 7) are issued to each cell. Strands (2 3) and (6 7) will fix the exposed domains from strand \({\text {S}}_2\) or \({\text {S}}_3\), and strand (2 3 4 5) will write bit 1 to the bit with domains 3 and 4 exposed. Details of the design are shown in Fig. 13.

Fig. 13 Instructions for the left shift cell

For all the pairs of (0, 0) and (1, 1), the first bit in those pairs will not be modified since the toehold 1 will be covered with \({\text {S}}_2\) or \({\text {S}}_3\) in the process.
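At the bit level, the net effect of these instructions can be sketched as a pairwise rewrite in which every pair is classified from the original register before any cell is rewritten. This is an illustrative abstraction with a hypothetical function name; it does not model domains or strands.

```python
def left_shift_cell(bits):
    """Bit-level sketch of the left shift: rewrite the first bit of each pair.

    Pairs are classified on the original register (instructions 1-4), then
    the first bit of each (1, 0) pair is written to 0 (instructions 5-9) and
    the first bit of each (0, 1) pair is written to 1 (instructions 10-11).
    Pairs (0, 0) and (1, 1) are left untouched, as is the last cell.
    """
    out = list(bits)
    for i in range(len(bits) - 1):
        pair = (bits[i], bits[i + 1])
        if pair == (1, 0):
            out[i] = 0
        elif pair == (0, 1):
            out[i] = 1
    return out


print(left_shift_cell([1, 1, 0, 0, 1]))
```

In this model each cell ends up holding the value of its right neighbour, so the example register 11001 becomes 10011.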

Appendix E Detailed implementation of the second level in parallel search

Here we discuss the second level of the parallel search operation. The first level of the search operation uses the instructions that were described in Sect. 3.1, except we now only issue strands to non-overlapping bit pairs. We use identifiers \(A_0=00, A_1=01, A_2=10, A_3=11\) to represent symbols in this level. For instance, to search for the target string 1011, we search for the symbol \(A_2\) in odd symbols and \(A_3\) in even symbols. The cases of \(A_2\) in even symbols and \(A_3\) in odd symbols are covered by searching with an offset.

In the first instruction of the second level, we uncover the \(A_2\) in the odd symbols, creating an open region. In instruction 2, we use a long strand to cover the entire right half of the symbol, from the start of identifier \(A_2\) to the rightmost cell. This strand is pulled out in instruction 3. In instruction 4, we use an identifier \(A_2'\) to cover domains 5, 6, 7 in the rightmost cell while covering all other domains.

Instructions 5 to 8 are essentially the same as instructions 1 to 4, but with two significant differences. Firstly, since \(A_3\) is the second symbol in the current level of query, we only search for even-numbered symbols (2, 4, 6, etc.). Secondly, instead of rewriting the right half of the symbol, we write the left half. We make the new identifier \(A_3'\) to cover domains 2, 3, 4 in the left-most cell. In instruction 9, we use identifier (\(B_{11}\) 5 6 7 1 2 3 4) to recognize the two consecutive symbols \(A_2\) and \(A_3\). Since, in the regular encoding, no strand starts from domain 5 or ends at domain 4, it will only form a perfect binding with a matched result.

After the identifier \({\text {B}}_{11}\) binds, we also need to clean up the imperfect bindings in case of a mismatch. Figure 15 shows the instructions for the cleanup process. In instruction 10, we first use the complementary strand (5* 6* 7* 1* 2* 3* 4*) to pull out the imperfectly bound identifier \(B_{11}\). Then we issue strands covering the exposed domains. We first issue strands covering fewer domains; then, in the following instructions, we issue strands covering more domains. As a result, we always obtain a perfect fit; the strands will not be pulled out in potentially unrelated rewriting processes.
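The two-level idea can be illustrated on plain bitstrings. The sketch below encodes non-overlapping bit pairs as the symbols \(A_0\) to \(A_3\) and reports where \(A_2\) is immediately followed by \(A_3\) with \(A_2\) in an odd-numbered symbol, which corresponds to the target 1011. The function names are hypothetical, and trying all four bit offsets is an assumption about how "searching with an offset" is applied; the text only states that the swapped-parity cases are covered by an offset.

```python
SYMBOLS = {"00": "A0", "01": "A1", "10": "A2", "11": "A3"}


def to_symbols(bits, offset):
    """First level: encode non-overlapping bit pairs, starting at `offset`."""
    return [SYMBOLS[bits[i:i + 2]] for i in range(offset, len(bits) - 1, 2)]


def search_1011(bits):
    """Second level: find A2 (odd symbol) followed by A3 (even symbol).

    Returns the bit positions at which the target 1011 starts. Offsets 0-3
    are tried so that every alignment of the 4-bit target is covered
    (an assumption of this sketch).
    """
    hits = []
    for offset in range(4):
        syms = to_symbols(bits, offset)
        # k = 0, 2, 4, ... are the odd-numbered symbols when counting from 1.
        for k in range(0, len(syms) - 1, 2):
            if syms[k] == "A2" and syms[k + 1] == "A3":
                hits.append(offset + 2 * k)
    return sorted(hits)


print(search_1011("10111011"))
```

In the DNA implementation each offset would be a separate pass over the register, while all symbols at a given offset are examined in parallel.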

Fig. 14 Instructions for a search operation of target sequence 1011

Fig. 15 Instructions for the cleanup process for a failed search. These instructions do not affect the result of a successful search

Appendix F Example of parallel bubble sort on an arbitrary bitstring

Consider the 12-bit string \(S = 1100 1001 0110\). In each iteration of bubble sort, we first identify all (1, 0) pairs (shown in red) and then rewrite them to (0, 1) (shown in blue). The successive iterations of the sorting algorithm for this string are shown in the following figure.

[Figure b: the string after each iteration of the sorting algorithm]

After 7 iterations, the final sorted string is 000000111111.
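The iteration count can be checked with a short bit-level simulation. This is a sketch with hypothetical function names; it models only the rewriting of (1, 0) pairs to (0, 1), not the underlying strand displacement.

```python
def sort_step(bits):
    """One parallel step: rewrite every adjacent "10" in the input to "01"."""
    out = list(bits)
    for i in range(len(bits) - 1):
        if bits[i:i + 2] == "10":
            out[i], out[i + 1] = "0", "1"
    return "".join(out)


def parallel_bubble_sort(bits):
    """Return the register after each step, stopping once nothing changes."""
    history = [bits]
    while True:
        nxt = sort_step(history[-1])
        if nxt == history[-1]:
            return history
        history.append(nxt)


history = parallel_bubble_sort("110010010110")
for step, state in enumerate(history):
    print(step, state)
```

The number of rewriting steps is `len(history) - 1`, since the first entry is the input string.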

Appendix G Simulating the XOR algorithm

We used the SIMD DNA simulator written by Dave Doty and Aaron Ong (Doty and Ong 2021) to validate our XOR algorithm. We simulated two iterations of the XOR algorithm as shown in Figs. 11 and 12. First, we operated on odd-to-even bit pairs on a strand storing 11101 to obtain 00101. Then we operated on even-to-odd bit pairs on the strand storing 00011 (the sorted result of the first iteration) to obtain the XOR 00000. To ensure non-overlapping pairing, we used different domain sequences for odd bits compared to even bits—odd bits were sequenced as domains 1 to 7, while even bits were sequenced as domains 8–14.

The simulator validated our algorithm: all instructions shown in Figs. 11 and 12 simulated correctly. We have attached the simulation files and the predicted results for the two iterations in the supplementary data. These simulations show that the operation that rewrites non-overlapping (1, 1) pairs to (0, 0) preserves the parity of the register.

Appendix H Gibson assembly of a 2 bit register

Gibson Assembly of DNA molecules is achieved through the use of “sticky ends”: single-stranded sequences at the ends of these molecules that allow them to concatenate. To create registers storing unique bit sequences, we start with two different molecules: pre-cell molecules (domains 2 to 7, with sticky ends on domains 2 and 7), and linker molecules (domains 7 1 2, with sticky ends on domains 7 and 2). To store a bit value of 0 in a pre-cell, a toehold on domain 4 is created. To store a bit value of 1, a toehold on domain 5 is created. This is shown in Fig. 16b.

Fig. 16 Using Gibson Assembly to construct a register storing 01 from cells with the same sequence

Before concatenating two different pre-cells, their particular sticky ends must be “sealed”, meaning that those ends are no longer single stranded and cannot link together anymore. Sealing a particular sticky end can easily be done by adding a single strand of DNA that binds to that sticky end. For example, by sealing the sticky end on domain 2 of a pre-cell, that pre-cell can no longer concatenate with itself when the linker molecule is mixed. In Fig. 16c, pre-cell A only has a sticky end on domain 7, and pre-cell B only has a sticky end on domain 2. When these pre-cells are mixed together with the linker molecule, they will bind to each other in the order A to B. This creates a pre-register of those two pre-cells. The starting end of the pre-register has a domain 1 concatenated through a “cap” molecule (domains 1 and 2, with a sticky end at domain 2) as shown in Fig. 16e and f. After this stage, the pre-register can be treated with DNA ligase to seal all nicks. The resulting DNA strand contains the cells A and B, which contain toeholds at domains 4 and 5 respectively. All 1 domains across all cells in this strand can be exposed as toehold domains through nicking and gentle denaturing. Finally, this DNA molecule (which encodes 01 based on the pre-cell encoding scheme) can be converted to the bit encoding scheme used in this paper 2 through the procedure described in Sects. 3.2 and 8.3. This entire procedure yields a 2-bit register storing the bits 0 and 1 in that order.

This approach can be used to construct registers of any arbitrary number of bits despite all cells having the same sequence. This is because pre-registers can also be concatenated in the same manner as pre-cells as shown in Fig. 16c. For this, the sealed ends of a pre-register must be unsealed (through the use of an exonuclease) to create sticky ends again.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Solanki, A., Chen, T. & Riedel, M. Parallel pairwise operations on data stored in DNA: sorting, XOR, shifting, and searching. Nat Comput (2023). https://doi.org/10.1007/s11047-023-09964-z


  • DOI: https://doi.org/10.1007/s11047-023-09964-z
