Comparison of Numerical Representations of Genomic Sequences: Choosing the Best Mapping for Wavelet Analysis

  • Original Paper
International Journal of Applied and Computational Mathematics

Abstract

The rapid growth of genome databases since the completion of the Human Genome Project has created a pressing need for efficient computational techniques to investigate the information underlying genomic data. Analysis of genomic data with digital signal processing (DSP) techniques can characterize this information more rapidly and efficiently than standard laboratory methods. Before signal processing methods can be applied, however, genomic data must be mapped to suitable mathematical representations, and the choice of representation can significantly affect the results of DSP-based analysis. This paper compares various mathematical representations of genomic sequences, analyzed with different orthogonal and biorthogonal wavelet transforms, on the basis of their reconstruction errors.
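The comparison described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, since the preview does not list the mappings, wavelets, or error measure the paper actually uses: the two mappings (an integer mapping and a binary indicator sequence), the single-level Haar wavelet, and the definition of reconstruction error as the mean squared error after discarding the detail coefficients. The function names are hypothetical.

```python
import math

# Assumed integer mapping for illustration; not taken from the paper.
INTEGER_MAP = {"A": 0, "C": 1, "G": 2, "T": 3}


def integer_mapping(seq):
    """Map each nucleotide to a single integer value."""
    return [float(INTEGER_MAP[s]) for s in seq]


def voss_indicator(seq, base):
    """Binary indicator sequence: 1 where `base` occurs, 0 elsewhere."""
    return [1.0 if s == base else 0.0 for s in seq]


def haar_dwt(signal):
    """Single-level orthonormal Haar transform of an even-length signal."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    r = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / r for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / r for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    r = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / r)
        out.append((a - d) / r)
    return out


def reconstruction_mse(signal):
    """MSE between the signal and its approximation-only reconstruction.

    Assumed error measure: detail coefficients are set to zero before
    inverting the transform.
    """
    approx, detail = haar_dwt(signal)
    rebuilt = haar_idwt(approx, [0.0] * len(detail))
    return sum((x - y) ** 2 for x, y in zip(signal, rebuilt)) / len(signal)


if __name__ == "__main__":
    seq = "ATGCGTACGTTAGC"  # toy sequence, not real genomic data
    print("integer mapping :", reconstruction_mse(integer_mapping(seq)))
    for base in "ACGT":
        print("indicator for", base, ":", reconstruction_mse(voss_indicator(seq, base)))
```

With all coefficients retained, an orthogonal transform such as Haar reconstructs the signal exactly, so a nonzero error arises only once coefficients are discarded or thresholded. The sketch also shows that raw errors depend on the amplitude range of each mapping, so a fair comparison across representations would likely need some normalization.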


Author information

Corresponding author

Correspondence to Shiwani Saini.

Cite this article

Saini, S., Dewan, L. Comparison of Numerical Representations of Genomic Sequences: Choosing the Best Mapping for Wavelet Analysis. Int. J. Appl. Comput. Math 3, 2943–2958 (2017). https://doi.org/10.1007/s40819-016-0277-1

  • DOI: https://doi.org/10.1007/s40819-016-0277-1
