Data storage in cellular DNA: contextualizing diverse encoding schemes

  • Gaby G. Dagher
  • Anthony P. Machado
  • Eddie C. Davis
  • Thomas Green
  • John Martin
  • Matthew Ferguson
Special Issue


Nature has used DNA to store biological data for millions of years, and humans are now learning to use the same medium for their own data. In this paper, we survey the field of cellular DNA encoding, in which encoding schemes insert data into protein-coding DNA (pcDNA) and non-coding DNA (ncDNA) regions while bypassing the biological restrictions associated with those regions. We first characterize the biological restrictions that apply to existing cellular DNA encoding schemes, then contrast the schemes by the restrictions they meet, the features they support, and their implementation details. We discuss the pros and cons of each scheme's implementation and make recommendations accordingly. Finally, we highlight existing gaps and offer our insight into future research directions.


Keywords: Cellular DNA, Encoding, Data storage




Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Gaby G. Dagher (1, corresponding author)
  • Anthony P. Machado (1)
  • Eddie C. Davis (1)
  • Thomas Green (1)
  • John Martin (2)
  • Matthew Ferguson (2)

  1. Department of Computer Science, Boise State University, Boise, USA
  2. Department of Physics, Boise State University, Boise, USA
