One-Bit DNA Compression Algorithm

  • Deloula Mansouri
  • Xiaohui Yuan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11307)

Abstract

In recent years, the ever-increasing volume of genomic sequences (DNA or RNA) stored in databases has posed a serious challenge to the storage, processing, and transmission of these data. Effective management of genetic data is therefore essential, which makes data compression unavoidable. Current standard compression tools are insufficient for compressing DNA sequences. In this paper we propose an efficient lossless DNA compression algorithm based on a One-Bit Compression method (OBComp) that compresses both repeated and non-repeated sequences. The direct coding technique assigns two bits to each nucleotide, which yields a compression ratio of 2 bits per base (bpb). OBComp instead uses a single bit, 0 or 1, to code the two most frequent nucleotides, and the positions of the two remaining nucleotides are saved separately. To further improve compression, a modified version of Run Length Encoding and the Huffman coding algorithm are then applied, in that order. The proposed algorithm effectively reduces the original size of DNA sequences. Its ease of implementation and notable compression ratio make it an attractive option.
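The one-bit stage described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not spell out. The input contains only uppercase A, C, G, T. The two rarer nucleotides are removed from the bit stream entirely and stored as per-base position lists. Frequency ties are broken alphabetically. The names `obcomp_encode`, `obcomp_decode`, and `run_lengths` are hypothetical. The run-length step is textbook RLE, used here as a stand-in for the paper's modified RLE, and the final Huffman stage is not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def obcomp_encode(seq: str) -> Tuple[Pair, List[int], Dict[str, List[int]]]:
    """Hypothetical one-bit stage: 0/1 for the two most frequent bases,
    explicit position lists for the other two (assumed layout, not the paper's format)."""
    if set(seq) - set("ACGT"):
        raise ValueError("sketch assumes an uppercase A/C/G/T sequence")
    counts = Counter(seq)
    # Rank bases by descending frequency; alphabetical order breaks ties (assumption).
    ranked = sorted("ACGT", key=lambda b: (-counts[b], b))
    zero_base, one_base = ranked[0], ranked[1]
    bits: List[int] = []
    positions: Dict[str, List[int]] = {ranked[2]: [], ranked[3]: []}
    for i, base in enumerate(seq):
        if base == zero_base:
            bits.append(0)
        elif base == one_base:
            bits.append(1)
        else:
            positions[base].append(i)  # rare base: record where it occurs
    return (zero_base, one_base), bits, positions


def obcomp_decode(pair: Pair, bits: List[int], positions: Dict[str, List[int]]) -> str:
    """Invert obcomp_encode: place rare bases first, then fill the gaps from the bit stream."""
    zero_base, one_base = pair
    length = len(bits) + sum(len(p) for p in positions.values())
    out: List[str] = [""] * length
    for base, idxs in positions.items():
        for i in idxs:
            out[i] = base
    bit_iter = iter(bits)
    for i in range(length):
        if not out[i]:
            out[i] = one_base if next(bit_iter) else zero_base
    return "".join(out)


def run_lengths(bits: List[int]) -> List[Tuple[int, int]]:
    """Plain RLE of the bit stream as (bit, run length) pairs; a stand-in for the modified RLE."""
    runs: List[Tuple[int, int]] = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


if __name__ == "__main__":
    seq = "AAAATTTTACGTAAAA"
    pair, bits, positions = obcomp_encode(seq)
    assert obcomp_decode(pair, bits, positions) == seq  # the stage is lossless
    print(pair, run_lengths(bits), positions)
```

In this toy sequence, 14 of the 16 bases cost one bit each instead of two. The long runs of A and T also collapse into a few (bit, length) pairs, which is the redundancy the later RLE and Huffman stages are meant to exploit. How much the position lists cost in practice depends on how they are encoded, which this sketch leaves open.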

Keywords

DNA sequences, Redundancies, Lossless compression, One-Bit algorithm

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China

Personalised recommendations