Parallel pairwise operations on data stored in DNA: sorting, XOR, shifting, and searching

Solanki, Arnav; Chen, Tonglin; Riedel, Marc

doi:10.1007/s11047-023-09964-z

Published: 21 September 2023

Natural Computing (2023)

Abstract

Prior research has introduced the Single-Instruction-Multiple-Data paradigm for DNA computing (SIMD DNA). It offers the potential for storing information and performing in-memory computations on DNA, with massive parallelism. This paper introduces three new SIMD DNA operations: sorting, shifting, and searching. Each is a fundamental operation in computer science. Our implementations demonstrate the effectiveness of parallel pairwise operations with this new paradigm.


Notes

Perhaps counter-intuitively, sorting binary values in hardware is as difficult algorithmically as sorting arbitrary values such as integers or real numbers (Cormen et al. 2009).

References

Adleman LM (1994) Molecular computation of solutions to combinatorial problems. Science 266:1021–1024
Athreya N, Milenkovic O, Leburton J-P (2019) Detection and mapping of dsDNA breaks using graphene nanopore transistor. Biophys J 116(3):292
Broadwater DB, Kim HD (2016) The effect of basepair mismatch on DNA strand displacement. Biophys J 110(7):1476–1484
Ceze L, Nivala J, Strauss K (2019) Molecular digital data storage using DNA. Nat Rev Genet 20(8):456–466. https://doi.org/10.1038/s41576-019-0125-3
Chen T, Solanki A, Riedel M (2021) Parallel pairwise operations on data stored in DNA: sorting, shifting, and searching. In: 27th international conference on DNA computing and molecular programming (DNA 27). Schloss Dagstuhl-Leibniz-Zentrum für Informatik
Church G, Gao Y, Kosuri S (2012) Next-generation digital information storage in DNA. Science (New York, N.Y.) 337:1628. https://doi.org/10.1126/science.1226355
Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn. The MIT Press, London
Doty D, Ong A (2021) Simulating 3-symbol turing machines with SIMD||DNA. arXiv preprint arXiv:2105.08559
Flynn MJ (1972) Some computer organizations and their effectiveness. IEEE Trans Comput 21(9):948–960
Krug J, Spohn H (1988) Universality classes for deterministic surface growth. Phys Rev A 38(8):4271
Li W (1987) Power spectra of regular languages and cellular automata. Complex Syst 1(1):107–130
Li L, Jiang W, Lu Y (2018) A modified Gibson assembly method for cloning large DNA fragments with high GC contents. Synth Metab Pathw Methods Protoc 203–209
Liu K, Pan C, Kuhn A, Nievergelt AP, Fantner GE, Milenkovic O, Radenovic A (2019) Detecting topological variations of DNA at single-molecule level. Nat Commun 10(1):1–9
Radding CM, Beattie KL, Holloman WK, Wiegand RC (1977) Uptake of homologous single-stranded fragments by superhelical DNA: IV. Branch migration. J Mol Biol 116(4):825–839
Salehi SA, Jiang H, Riedel MD, Parhi KK (2015) Molecular sensing and computing systems. IEEE Trans Mol Biol Multi-Scale Commun 1(3):249–264
Soloveichik D, Seelig G, Winfree E (2010) DNA as a universal substrate for chemical kinetics. Proc Natl Acad Sci 107(12):5393–5398. https://doi.org/10.1073/pnas.0909380107
Tabatabaei SK, Wang B, Athreya NBM, Enghiad B, Hernandez AG, Fields CJ, Leburton J-P, Soloveichik D, Zhao H, Milenkovic O (2020) DNA punch cards for storing data on native DNA sequences via enzymatic nicking. Nat Commun 11(1):1–10. https://doi.org/10.1038/s41467-020-15588-z
Wang B, Chalk C, Soloveichik D (2019) SIMD||DNA: single instruction, multiple data computation with DNA strand displacement cascades. In: Thachuk C, Liu Y (eds) DNA computing and molecular programming. Springer, Cham, pp 219–235
Yurke B, Turberfield AJ, Mills AP, Simmel FC, Neumann JL (2000) A DNA-fuelled molecular machine made of DNA. Nature 406(6796):605–608

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, 55454, USA
Arnav Solanki, Tonglin Chen & Marc Riedel

Corresponding author

Correspondence to Marc Riedel.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Instructions for converting to another scheme

Instruction 1 distinguishes the two bit values. In instruction 1, strand (\({\text {S}}_1\) 1 2 3) is issued. In a cell storing bit 0, this strand displaces the short strand over domains 2 and 3; it does not edit a cell storing bit 1, since domain 1 is the only open domain available for binding there. In instruction 2, all domains in bit 1 are replaced by a single strand, with identifier \({\text {S}}_a\), that covers all the domains. In instruction 3, the strand \({\text {S}}_1\) is detached, so domains 1, 2, and 3 on bit 0 are exposed. In instruction 4, all domains in bit 0 are replaced by a single strand, with identifier \({\text {S}}_b\), that covers all the domains. Any encoding scheme with 7 domains per cell can then be written to the bits by first detaching strand \({\text {S}}_a\) and writing the encoding for bit 1, then detaching strand \({\text {S}}_b\) and writing the encoding for bit 0 (Fig. 9).

Appendix B Detailed implementation of each step for parallel sorting

Here we give an instruction set for parallel binary bubble sort with the previously defined encoding scheme. We can implement each step of the sorting algorithm in 12 individual operations. Details of the design are shown in Fig. 10.

The 12 instructions fall into 2 stages. The first stage is “identifying.” During instructions 1–4, all the pairs (1, 0) are identified, and in both the bit 1 and the bit 0 of each such pair, a toehold is exposed for writing new data. More specifically, instructions 1 and 2 identify the combination (1, 0). In instruction 1, (\({\text {S}}_1\) 6 7 1 2 3) is issued to each pair of bits. In pair (0, 0), \({\text {S}}_1\) and domains 6, 7 are exposed. In pair (0, 1), since the only open domain is 1, the strand will not form a strong enough bond. In pair (1, 0), only \({\text {S}}_1\) is exposed. In pair (1, 1), \({\text {S}}_1\) and domains 2, 3 are exposed. In instruction 2, strand (6* 7* 1* 2* 3*) is issued to each pair of bits. Since pair (1, 0) is the only pair in which neither domains 6 and 7 nor domains 2 and 3 of the instruction strand are left exposed, this strand will detach strand \({\text {S}}_1\) in each pair except pair (1, 0). After instruction 2, the toehold between a bit value of 1 and a bit value of 0 in the pair (1, 0) is replaced by a strand with an identifier of \({\text {S}}_1\). Instruction 3 seals off the domain exposed in the other pairs during instructions 1 and 2 so that it will not be edited later. In instruction 4, the strand with identifier \({\text {S}}_1\) is detached, exposing domains 6 and 7 in the left cell containing bit 1, and domains 2 and 3 in the right cell containing bit 0. After this instruction, toeholds are exposed only in the 1s and 0s in pair (1, 0). Other bits are not affected.

The second stage flips the bits in the pair (1, 0). In instruction 5, in the case of a bit value of 0, domains 2 and 3 are temporarily covered by a strand with identifier \({\text {S}}_2\) so that the writing process will not interfere with the identified 0s at this moment. In instruction 6, a bit value of 1 is replaced by a strand with identifier \({\text {S}}_3\) via the open toehold at domains 6 and 7. The strand is then detached in instruction 7, exposing all the domains of that bit. Then, the bit value of 0 is written to the location of a bit value of 1 in instruction 8. In instruction 9, the temporary cover for a bit 0 is lifted. Then, in instructions 10 through 12, a bit 1 is written to the location of a bit value of 0 using the same scheme as instructions 6 through 8. Throughout the process, only bits identified in the first stage with toeholds exposed are affected.

Appendix C Detailed implementation of each step for parallel exclusive OR

The instructions are described below, alongside an example in which the Exclusive OR algorithm reduces the sequence 11101 to 00000 in two iterations.

In each XOR iteration, the f(1,1) = (0,0) rewriting must be performed on non-overlapping pairs of bits. In the first iteration, the pairing is as follows: cell 0 with cell 1, cell 2 with cell 3, and so on. This means that all instruction strands only operate on these pairs. For this algorithm specifically, this can be achieved by using different sequences for the even versus the odd cells on the strand. In instruction 1, the strand (\({\text {S}}_1\) 6 7 1 2 3) is issued to identify (1,0) pairs. In instruction 2, strand (6* 7* 1* 2* 3*) is issued to detach any \({\text {S}}_1\) strands with exposed domains of 6 and 7, or 2 and 3. In instruction 3, the strands (\({\text {S}}_2\) 6 7 1) and (\({\text {S}}_3\) 1 2 3) are issued to identify (1,1) and (0,0) pairs respectively. Finally, (0,1) pairs are identified with strand (\({\text {S}}_4\) 4 5 6 7 1) for instruction 4. Now that all 1 domain toeholds are covered, strand (\({\text {S}}_2\)* 6* 7* 1*) is issued in instruction 5 to detach all \({\text {S}}_2\) and expose (1,1) pairs. In instruction 6, strand (\({\text {S}}_5\) 2 3 4 5 6 7 1 2 3 4 5 6 7) is issued to cover both cells in (1,1) pairs. Both \({\text {S}}_5\) and \({\text {S}}_4\) are now detached using strands (\({\text {S}}_5\)* 2* 3* 4* 5* 6* 7* 1* 2* 3* 4* 5* 6* 7*) and (\({\text {S}}_4\)* 4* 5* 6* 7*) in instruction 7. Then in instruction 8, all exposed cells are written to 0 using strands (2 3) and (4 5 6 7). In instruction 9, all \({\text {S}}_1\) and \({\text {S}}_3\) are detached using (\({\text {S}}_1\)* 6* 7* 1* 2* 3*) and (\({\text {S}}_3\)* 1* 2* 3*). By covering all exposed domains using strands (2 3) and (6 7) in instruction 10, all (1,1) pairs identified in the register are rewritten to (0,0) pairs. At this point, instructions 1-11 of the parallel sorting in Sect. 1 are implemented to write all (1,0) pairs to (0,1). For these sorting steps, the cell pairing can be overlapping. 
The result of this whole iteration of the XOR algorithm is a DNA sequence that has the same bit parity as the input, but is more ordered (i.e., closer to being sorted), and contains the same number of 1s or fewer. In Fig. 11, the first iteration is carried out with non-overlapping pairs: cell 0 with cell 1, cell 2 with cell 3, and so on. However, in Fig. 12, which depicts a second iteration of the XOR algorithm, the pairing is: cell 1 with cell 2, cell 3 with cell 4, and so on. In the third iteration, the pairing returns to the original pairing of the first iteration. For an n-bit register, after n iterations of the XOR algorithm, the last cell contains the output of the n-bit XOR.
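The bit-level effect of one XOR iteration can be sketched in software. The following is a minimal model, not the DNA instruction set: it assumes each iteration consists of the (1,1) to (0,0) rewrite on non-overlapping pairs followed by a single parallel sorting step, and the function names (`rewrite_ones`, `xor_iteration`, `register_xor`) are hypothetical names introduced here for illustration.

```python
def rewrite_ones(bits: str, offset: int) -> str:
    """Rewrite (1, 1) to (0, 0) on non-overlapping pairs starting at `offset`."""
    cells = list(bits)
    for i in range(offset, len(cells) - 1, 2):
        if cells[i] == "1" and cells[i + 1] == "1":
            cells[i] = cells[i + 1] = "0"
    return "".join(cells)


def xor_iteration(bits: str, offset: int) -> str:
    """One iteration: the pair rewrite, then one parallel (1, 0) -> (0, 1) step."""
    rewritten = rewrite_ones(bits, offset)
    # (1, 0) pairs can never overlap, so a plain left-to-right replace
    # models the simultaneous rewrite of all of them.
    return rewritten.replace("10", "01")


def register_xor(bits: str) -> str:
    """Run n iterations with alternating pairings and read the last cell."""
    for k in range(len(bits)):
        bits = xor_iteration(bits, k % 2)
    return bits[-1]


first = xor_iteration("11101", offset=0)   # pairs: cells (0,1), (2,3)
second = xor_iteration(first, offset=1)    # pairs: cells (1,2), (3,4)
print(first, second, register_xor("11101"))
```

Alternating `offset` between 0 and 1 mirrors the alternating cell pairings of Figs. 11 and 12; whether a single sorting step per iteration matches the wet-lab protocol exactly is an assumption of this sketch.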

Appendix D Detailed implementation of each step for parallel left shift cell

The instructions are as follows, with an example of shifting 11001 to 10011.

The first three instructions are exactly the same as those for identifying bit pairs in Sect. 3.1. In instruction 1, the strand (\({\text {S}}_1\) 6 7 1 2 3), which identifies the different patterns of two bits, is issued to each pair of bits. In instruction 2, strand (6* 7* 1* 2* 3*) is issued, detaching strands with open domains 6 and 7, or 2 and 3. After this instruction, strands with identifier \({\text {S}}_1\) remain only at pair (1, 0). In instruction 3, we issue two species of strands at the same time: (\({\text {S}}_2\) 6 7 1) and (\({\text {S}}_3\) 1 2 3). (\({\text {S}}_2\) 6 7 1) will bind with pair (1, 1) and (\({\text {S}}_3\) 1 2 3) will bind with pair (0, 0). \({\text {S}}_2\) will not form a stable binding with pair (0, 0) or (0, 1) because the binding area is only one domain. The same holds for \({\text {S}}_3\) and pair (1, 1) or (0, 1). After this instruction, only domain 1 between pair (0, 1) is still exposed. In instruction 4, strand (\({\text {S}}_4\) 4 5 6 7 1) is issued. Through the open domain 1 between pair (0, 1), the strand in bit 0 is replaced by \({\text {S}}_4\). After this step, the first bit in pair (1, 0) is identified with the strand \({\text {S}}_1\), and the first bit in pair (0, 1) is replaced with the strand \({\text {S}}_4\).

Instructions 5 to 9 rewrite the first bit in pair (1, 0) to 0. In instruction 5, the strand \({\text {S}}_1\) is detached, exposing domains 6, 7, 1, 2 and 3. The exposed domains 2 and 3 are sealed off in instruction 6 to not interfere with subsequent instructions. In instruction 7, strand (\({\text {S}}_5\) 2 3 4 5 6 7) is issued through the open toehold on domains 6 and 7 in the bit 1 in pair (1, 0), and displaces the strand in that bit. Since domains 2 and 3 are sealed off, bit 0 will not be modified in this instruction. In instruction 8, strand \({\text {S}}_5\) is detached, leaving the domains in the bit open. In instruction 9, strands (2 3) and (4 5 6 7), which represent 0, are written to the bit containing open domains.

In the final two instructions, we write 1 to the first bit in pair (0, 1). In instruction 10, 3 strands are issued to each pair of bits: (\({\text {S}}_2\)* 6* 7* 1*), (\({\text {S}}_3\)* 1* 2* 3*) and (\({\text {S}}_4\)* 4* 5* 6* 7* 1*). \({\text {S}}_2\), \({\text {S}}_3\) and \({\text {S}}_4\) are detached through these strands. Since \({\text {S}}_4\) covers the bit 0 in pair (0, 1), after this step, domains 3 and 4 are exposed in these bits, ready to be written to 1. In the final step, strands (2 3), (2 3 4 5), and (6 7) are issued to each cell. Strands (2 3) and (6 7) will fix the exposed domains from strand \({\text {S}}_2\) or \({\text {S}}_3\), and strand (2 3 4 5) will write bit 1 to the bit with domain 3 and 4 exposed. Details of the design are shown in Fig. 13.

For all the pairs of (0, 0) and (1, 1), the first bit in those pairs will not be modified since the toehold 1 will be covered with \({\text {S}}_2\) or \({\text {S}}_3\) in the process.
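At the level of bit values, the pairwise rules above amount to each cell taking the value of its right neighbour, with the last cell left unchanged. A minimal sketch of that logic follows; `left_shift` is a hypothetical helper name, and the model ignores the strand-level details entirely.

```python
def left_shift(bits: str) -> str:
    """Apply the pairwise shift rules to every adjacent pair in parallel."""
    cells = list(bits)
    for i in range(len(bits) - 1):
        pair = (bits[i], bits[i + 1])  # always read the original register
        if pair == ("1", "0"):
            cells[i] = "0"  # first bit of a (1, 0) pair is rewritten to 0
        elif pair == ("0", "1"):
            cells[i] = "1"  # first bit of a (0, 1) pair is rewritten to 1
        # (0, 0) and (1, 1) pairs leave the first bit untouched
    return "".join(cells)


print(left_shift("11001"))
```

Reading every pair from the original string, rather than from the partially updated list, is what models the parallel nature of the operation.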

Appendix E Detailed implementation of the second level in parallel search

Here we discuss the second level of the parallel search operation. The first level of the search operation uses the instructions that were described in Sect. 3.1, except we now only issue strands to non-overlapping bit pairs. We use identifiers \(A_0=00, A_1=01, A_2=10, A_3=11\) to represent symbols in this level. For instance, to search for the target string 1011, we search for the symbol \(A_2\) in odd symbols and \(A_3\) in even symbols. The cases of \(A_2\) in even symbols and \(A_3\) in odd symbols are covered by searching with an offset.

In the first instruction of the second level, we uncover the \(A_2\) in the odd symbols, creating an open region. In instruction 2, we use a long strand to cover the entire right half of the symbol, from the start of identifier \(A_2\) to the rightmost cell. This strand is pulled out in instruction 3. In instruction 4, we use an identifier \(A_2'\) to cover domains 5, 6, 7 in the rightmost cell while covering all other domains.

Instructions 5 to 8 are essentially the same as instructions 1 to 4, but with two significant differences. Firstly, since \(A_3\) is the second symbol in the current level of query, we only search for even-numbered symbols (2, 4, 6, etc.). Secondly, instead of rewriting the right half of the symbol, we write the left half. We make the new identifier \(A_3'\) to cover domains 2, 3, 4 in the left-most cell. In instruction 9, we use identifier (\(B_{11}\) 5 6 7 1 2 3 4) to recognize the two consecutive symbols \(A_2\) and \(A_3\). Since, in the regular encoding, no strand starts from domain 5 or ends at domain 4, it will only form a perfect binding with a matched result.

After the identifier \(B_{11}\) binds, we also need to clean up the imperfect bindings in case of a mismatch. Figure 14 shows the instructions for the cleanup process. In instruction 10, we first use the complementary strand (5* 6* 7* 1* 2* 3* 4*) to pull out the imperfectly bound identifier \(B_{11}\). Then we issue strands covering the exposed domains. We first issue strands covering fewer domains; then, in the following instructions, we issue strands covering more domains. As a result, we always obtain a perfect fit; the strands will not be pulled out in potentially unrelated rewriting processes (Fig. 15).
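The two-level idea can be illustrated with a small software analogue. This is only a sketch under simplifying assumptions: it handles 4-bit targets, represents the symbols \(A_0\) to \(A_3\) as the integers 0 to 3, and models "searching with an offset" by pairing bits starting at position 0 and at position 1. The names `pair_symbols` and `search_4bit` are hypothetical.

```python
def pair_symbols(bits: str, offset: int) -> list[tuple[int, int]]:
    """First level: label non-overlapping bit pairs as (position, symbol index)."""
    return [(i, int(bits[i:i + 2], 2)) for i in range(offset, len(bits) - 1, 2)]


def search_4bit(bits: str, target: str) -> list[int]:
    """Second level: find consecutive symbols matching the two halves of target."""
    if len(target) != 4:
        raise ValueError("this sketch only handles 4-bit targets")
    first, second = int(target[:2], 2), int(target[2:], 2)
    hits = set()
    for offset in (0, 1):  # the offset pass covers the other alignment
        symbols = pair_symbols(bits, offset)
        for (pos, a), (_, b) in zip(symbols, symbols[1:]):
            if a == first and b == second:
                hits.add(pos)
    return sorted(hits)


# Target 1011 corresponds to symbol A_2 (10) followed by symbol A_3 (11).
print(search_4bit("0010110", "1011"))
```

In the DNA implementation all symbol positions are examined simultaneously; the loops here only stand in for that parallelism.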

Appendix F Example of parallel bubble sort on an arbitrary bitstring

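Consider the 12-bit string \(S = 1100 1001 0110\). In each iteration of bubble sort, we first identify all (1, 0) pairs and then rewrite them to (0, 1). For this string, the successive iterations of the sorting algorithm are:

Iteration 0: 110010010110
Iteration 1: 101001001101
Iteration 2: 010100101011
Iteration 3: 001010010111
Iteration 4: 000101001111
Iteration 5: 000010101111
Iteration 6: 000001011111
Iteration 7: 000000111111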

After 7 iterations, the final sorted string is 000000111111.
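The iteration count can be checked with a short simulation of the rewriting rule. The sketch below models only the bit values, not the strand displacement chemistry; `sort_step` and `bubble_sort_trace` are hypothetical names introduced for illustration.

```python
def sort_step(bits: str) -> str:
    """One parallel step: rewrite every (1, 0) pair to (0, 1) at once."""
    # Two (1, 0) pairs can never share a cell, so a non-overlapping
    # left-to-right replace is equivalent to the simultaneous rewrite.
    return bits.replace("10", "01")


def bubble_sort_trace(bits: str) -> list[str]:
    """Return the register after each iteration until nothing changes."""
    trace = [bits]
    while True:
        nxt = sort_step(trace[-1])
        if nxt == trace[-1]:
            return trace
        trace.append(nxt)


trace = bubble_sort_trace("110010010110")
for k, state in enumerate(trace):
    print(k, state)
```

The loop stops at the first iteration that leaves the register unchanged, so `len(trace) - 1` is the number of iterations that actually rewrote a pair.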

Appendix G Simulating the XOR algorithm

We used the SIMD DNA simulator written by Dave Doty and Aaron Ong (Doty and Ong 2021) to validate our XOR algorithm. We simulated two iterations of the XOR algorithm as shown in Figs. 11 and 12. First, we operated on odd-to-even bit pairs on a strand storing 11101 to obtain 00101. Then we operated on even-to-odd bit pairs on the strand storing 00011 (the sorted result of the first iteration) to obtain the XOR 00000. To ensure non-overlapping pairing, we used different domain sequences for odd bits and even bits: odd bits were sequenced as domains 1 to 7, while even bits were sequenced as domains 8 to 14.

The simulator validated our algorithm: all instructions shown in Figs. 11 and 12 simulated correctly. We have attached the simulation files and the predicted results for the two iterations in the supplementary data. These simulations show that the operation that rewrites non-overlapping (1, 1) pairs to (0, 0) preserves the parity of the register.
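The parity claim also follows from a short counting argument. In this sketch, \(w(x)\) denotes the number of 1s in a register \(x\), \(x'\) is the register after the rewrite, and \(k\) is the number of (1, 1) pairs that are rewritten; these symbols are notation introduced here, not taken from the paper.

```latex
\[
w(x') = w(x) - 2k
\quad\Longrightarrow\quad
w(x') \equiv w(x) \pmod{2}
\]
```

The sorting step that follows only exchanges a 1 and a 0 within a pair, so it leaves \(w\) unchanged, and the parity of the register is the same after a full iteration.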

Appendix H Gibson assembly of a 2 bit register

Gibson Assembly of DNA molecules is achieved through the use of “sticky ends”: single-stranded sequences at the ends of these molecules that allow them to concatenate. To create registers storing unique bit sequences, we start with two different molecules: pre-cell molecules (domains 2 to 7, with sticky ends on domains 2 and 7), and linker molecules (domains 7 1 2, with sticky ends on domains 7 and 2). To store a bit value of 0 in a pre-cell, a toehold on domain 4 is created. To store a bit value of 1, a toehold on domain 5 is created. This is shown in Fig. 16b.

Before concatenating two different pre-cells, their particular sticky ends must be “sealed”, meaning those ends are no longer single stranded and can no longer link together. Sealing a particular sticky end can easily be done by adding a single strand of DNA that binds to that sticky end. For example, by sealing the sticky end on domain 2 of a pre-cell, that pre-cell can no longer concatenate with itself when the linker molecule is mixed in. In Fig. 16c, pre-cell A only has a sticky end on domain 7, and pre-cell B only has a sticky end on domain 2. When these pre-cells are mixed together with the linker molecule, they will bind to each other in the order A to B. This creates a pre-register of those two pre-cells. The starting end of the pre-register has a domain 1 concatenated through a “cap” molecule (domains 1 and 2, with a sticky end at domain 2) as shown in Fig. 16e and f. After this stage, the pre-register can be treated with DNA ligase to seal all nicks. The resulting DNA strand contains the cells A and B, which contain toeholds at domains 4 and 5 respectively. All 1 domains across all cells in this strand can be exposed as toehold domains through nicking and gentle denaturing. Finally, this DNA molecule (which encodes 01 based on the pre-cell encoding scheme) can be converted to the bit encoding scheme used in this paper through the procedure described in Sects. 3.2 and 8.3. This entire procedure yields a 2-bit register storing the bits 0 and 1 in that order.

This approach can be used to construct registers with an arbitrary number of bits, even though all cells have the same sequence. This is because pre-registers can be concatenated in the same manner as pre-cells, as shown in Fig. 16c. For this, the sealed ends of a pre-register must be unsealed (through the use of an exonuclease) to create sticky ends again.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Solanki, A., Chen, T. & Riedel, M. Parallel pairwise operations on data stored in DNA: sorting, XOR, shifting, and searching. Nat Comput (2023). https://doi.org/10.1007/s11047-023-09964-z

Accepted: 15 August 2023
Published: 21 September 2023
DOI: https://doi.org/10.1007/s11047-023-09964-z
