
On conflict free DNA codes

Cryptography and Communications

Abstract

DNA storage has emerged as an important area of research. The reliability of a DNA storage system depends on designing DNA strings (called DNA codes) that are sufficiently dissimilar. In this work, we introduce DNA codes that satisfy a new constraint, the conflict free constraint, which generalizes the non-homopolymer constraint. In particular, no two consecutive substrings of any codeword are identical. This is in addition to the usual constraints, such as the Hamming, reverse, reverse-complement and GC-content constraints. We believe that the new constraint proposed in this paper can significantly reduce errors when data are written into and read from synthetic DNA strings. We also present a construction, based on a variant of the stochastic local search algorithm, that determines the sizes of DNA codes whose codewords are free from secondary structures in addition to satisfying the usual constraints. In some specific cases, this improves the lower bounds known from the existing literature. We also propose a recursive isometric map between binary vectors and DNA strings. By applying this map to well-known binary codes, we obtain classes of DNA codes that satisfy all of the above constraints and whose codewords are, in addition, free from hairpin-like secondary structures.
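The constraints named in the abstract can be checked mechanically. The sketch below is not the authors' construction; it is a minimal brute-force checker under stated assumptions. It reads "no two consecutive substrings are identical" as "the word contains no factor of the form ww" for any block length (so a homopolymer such as `AA` is the length-1 case), whereas the paper may bound the block length. It also assumes the common convention that the reverse-complement distance is required between a word and itself as well as between distinct words. The names `is_conflict_free`, `satisfies_constraints`, `count_conflict_free` and the parameters `d` (minimum distance) and `w` (GC-content) are hypothetical.

```python
from itertools import combinations, product

# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def hamming(x: str, y: str) -> int:
    """Number of positions where the equal-length strings x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def reverse_complement(x: str) -> str:
    """Reverse x and replace each base by its complement (A<->T, C<->G)."""
    return "".join(COMPLEMENT[b] for b in reversed(x))


def gc_content(x: str) -> int:
    """Number of G and C bases in x."""
    return sum(b in "GC" for b in x)


def is_conflict_free(x: str) -> bool:
    """True if no two adjacent blocks of x are equal, i.e. x has no factor ww.

    Assumption: blocks of every length are checked; length 1 is the
    classical non-homopolymer condition.
    """
    n = len(x)
    for length in range(1, n // 2 + 1):
        for i in range(n - 2 * length + 1):
            if x[i:i + length] == x[i + length:i + 2 * length]:
                return False
    return True


def count_conflict_free(n: int) -> int:
    """Brute-force count of conflict free DNA strings of length n."""
    return sum(is_conflict_free("".join(t)) for t in product("ACGT", repeat=n))


def satisfies_constraints(code: list[str], d: int, w: int) -> bool:
    """Check the Hamming, reverse, reverse-complement, GC-content and
    conflict free constraints for a list of equal-length DNA words."""
    for x in code:
        if gc_content(x) != w or not is_conflict_free(x):
            return False
        # Assumed convention: a word must also be far from its own
        # reverse complement.
        if hamming(x, reverse_complement(x)) < d:
            return False
    for x, y in combinations(code, 2):
        if hamming(x, y) < d:                       # Hamming constraint
            return False
        if hamming(x, y[::-1]) < d:                 # reverse constraint
            return False
        if hamming(x, reverse_complement(y)) < d:   # reverse-complement constraint
            return False
    return True
```

For example, `satisfies_constraints(["ACTG", "TCAG"], d=2, w=2)` returns `True`, while the same call with `d=3` returns `False` because the two words differ in only two positions. A word such as `ACGT` fails for any `d >= 1`, since it equals its own reverse complement. Both distance checks over pairs are symmetric in `x` and `y`, so visiting each unordered pair once is enough; the exhaustive count is only practical for short lengths.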


Fig. 1
Fig. 2

Author information

Correspondence to Manish K. Gupta.

Additional information

The preliminary version of the paper is available at https://arxiv.org/abs/1902.04419

About this article

Cite this article

Benerjee, K.G., Deb, S. & Gupta, M.K. On conflict free DNA codes. Cryptogr. Commun. 13, 143–171 (2021). https://doi.org/10.1007/s12095-020-00459-7

  • DOI: https://doi.org/10.1007/s12095-020-00459-7
