Abstract
The high storage density, long life cycle, and low energy consumption of DNA molecules make DNA a leading candidate for next-generation storage technology. However, DNA storage suffers from high synthesis cost and low random-access efficiency. A high-density DNA coding scheme can effectively reduce the cost of DNA synthesis. This paper first proposes a codebook-based DNA mapping method and a content-based random access method for information stored in DNA. The mapping method satisfies two biological constraints: homopolymer length and GC content. The random access method can efficiently and selectively read specific files from the DNA pool. To increase storage density, convolutional neural networks are combined with the mapping method to generate base sequences. In experiments, our method was compared with existing DNA information storage methods, and the results show that the proposed scheme achieves a higher information storage density.
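To make the two biological constraints concrete, the following minimal Python sketch checks a base sequence for homopolymer length and GC content and then filters candidate words into a small codebook. It is an illustration only, not the mapping method proposed in the paper: the function names, the maximum run length of 3, the GC window of 40% to 60%, the word length of 6, and the codebook size of 256 are all assumptions chosen for the example.

```python
from itertools import product


def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of identical bases in seq."""
    longest = current = 0
    previous = None
    for base in seq:
        current = current + 1 if base == previous else 1
        longest = max(longest, current)
        previous = base
    return longest


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def satisfies_constraints(seq, max_run=3, gc_min=0.4, gc_max=0.6):
    """Check both constraints; the default thresholds are assumed, not from the paper."""
    return max_homopolymer_run(seq) <= max_run and gc_min <= gc_content(seq) <= gc_max


def build_codebook(word_len=6, size=256, **limits):
    """Hypothetical codebook: the first `size` constraint-satisfying words of length word_len."""
    words = ("".join(p) for p in product("ACGT", repeat=word_len))
    valid = [w for w in words if satisfies_constraints(w, **limits)]
    if len(valid) < size:
        raise ValueError("not enough valid words for the requested codebook size")
    return valid[:size]


# Map each byte value 0..255 to one DNA word (index in the list = byte value).
codebook = build_codebook()
encoded = "".join(codebook[b] for b in b"DNA")
```

Each word in this sketch satisfies the constraints on its own, but concatenating words can create a longer homopolymer run across a word boundary, so a practical scheme would also need to check or constrain the junctions.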
Availability of data and material
The data and material are not publicly available.
Acknowledgements
We thank all co-authors for their contributions. The authors would like to thank Shufang Zhang for the insightful discussions and feedback.
Funding
The authors did not receive support from any organization for the submitted work.
Contributions
All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Shufang Zhang, Jianjun Wu, BeiBei Huang, and Yuhong Liu. The first draft of the manuscript was written by Jianjun Wu, BeiBei Huang, and Yuhong Liu. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose. The authors have no conflicts of interest to declare that are relevant to the content of this article. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.
Ethics approval
This research does not involve human participants or animals. The authors have disclosed all information regarding this research.
Consent to participate
The authors agree to participate.
Consent for publication
The submission is published with the approval of the authors.
Cite this article
Zhang, S., Wu, J., Huang, B. et al. High-density information storage and random access scheme using synthetic DNA. 3 Biotech 11, 328 (2021). https://doi.org/10.1007/s13205-021-02882-w