Data storage in cellular DNA: contextualizing diverse encoding schemes

Abstract

Nature has used DNA to store biological information for millions of years, and humans are now learning to use the same medium for their own data. In this paper, we survey the field of cellular DNA encoding, in which encoding schemes insert data into protein-coding DNA (pcDNA) and non-coding DNA (ncDNA) regions while working around the biological restrictions associated with those regions. We first characterize the bio-restrictions specific to existing cellular DNA encoding schemes, and then compare the schemes with respect to the restrictions they satisfy, the features they support, and their implementation details. We discuss the pros and cons of each scheme's implementation and make recommendations accordingly. Finally, we highlight open gaps and offer our insight into future research directions.
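To make "inserting data into pcDNA while working around biological restrictions" concrete, the sketch below shows one well-known family of approaches: synonymous (silent) codon substitution. Payload bits select among codons that encode the same amino acid, so the translated protein does not change. This is an illustration under stated assumptions, not any particular scheme surveyed in the paper: the codon table is a hypothetical subset of the standard genetic code, only the protein-sequence restriction is enforced (codon usage bias, GC content, repeats and similar constraints are ignored), and the names `SYNONYMS`, `embed`, `extract` and `translate` are hypothetical.

```python
# Minimal sketch: hide bits in protein-coding DNA via synonymous codons.
# ASSUMPTION: SYNONYMS is a hypothetical *subset* of the standard genetic
# code; a real scheme needs the full table plus checks for codon usage
# bias, GC content, repeats, and other biological restrictions.
SYNONYMS = {
    "L": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "M": ["ATG"],
    "W": ["TGG"],
}
CODON_TO_AA = {c: aa for aa, cs in SYNONYMS.items() for c in cs}


def codons(dna):
    """Split a reading frame into consecutive codons."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def bits_per_codon(aa):
    """floor(log2(number of synonyms)): 6 or 4 codons -> 2 bits, 2 -> 1, 1 -> 0."""
    return len(SYNONYMS[aa]).bit_length() - 1


def translate(dna):
    """Translate using the partial table above."""
    return "".join(CODON_TO_AA[c] for c in codons(dna))


def embed(dna, bits):
    """Replace codons with synonyms so they carry `bits`; protein is unchanged."""
    out, pos = [], 0
    for codon in codons(dna):
        aa = CODON_TO_AA[codon]
        k = bits_per_codon(aa)
        if k == 0 or pos >= len(bits):
            out.append(codon)  # no capacity here, or payload already placed
            continue
        chunk = bits[pos:pos + k].ljust(k, "0")  # zero-pad the final chunk
        out.append(SYNONYMS[aa][int(chunk, 2)])
        pos += k
    if pos < len(bits):
        raise ValueError("payload exceeds carrier capacity")
    return "".join(out)


def extract(dna, n_bits):
    """Read back the first n_bits from the synonym index of each codon."""
    bits = ""
    for codon in codons(dna):
        if len(bits) >= n_bits:
            break
        aa = CODON_TO_AA[codon]
        k = bits_per_codon(aa)
        if k:
            bits += format(SYNONYMS[aa].index(codon), f"0{k}b")
    return bits[:n_bits]


carrier = "ATGCTGAAAGCTGGTTGGGAA"  # M L K A G W E, capacity 8 bits
stego = embed(carrier, "1011001")
print(stego)              # ATGCTAAAGGCAGGCTGGGAA
print(translate(stego))   # MLKAGWE
print(extract(stego, 7))  # 1011001
```

Note the trade-offs this toy exposes: capacity depends on the amino-acid composition of the carrier (Met and Trp carry nothing), six-codon amino acids use only four of their synonyms here, and the decoder must be told the payload length separately. Real schemes layer error correction and additional biological checks on top of this basic idea.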

Figs. 1–7 (images not reproduced)

Author information

Corresponding author

Correspondence to Gaby G. Dagher.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Dagher, G.G., Machado, A.P., Davis, E.C. et al. Data storage in cellular DNA: contextualizing diverse encoding schemes. Evol. Intel. (2019). https://doi.org/10.1007/s12065-019-00202-z

Keywords

  • Cellular DNA
  • Encoding
  • Data storage