Abstract
The rapid growth of genome databases since the completion of the Human Genome Project has created a pressing need for efficient computational techniques to investigate the information contained in genomic data. Digital signal processing (DSP) techniques can characterize this information far more rapidly and efficiently than standard laboratory methods. Before signal processing methods can be applied, however, genomic data must be mapped into a suitable numerical representation, and the choice of representation can significantly affect the outcome of DSP-based analysis. This paper compares various numerical representations of genomic sequences, analyzed with different orthogonal and biorthogonal wavelet transforms, on the basis of their reconstruction errors.
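The workflow summarized in the abstract (map a sequence to numbers, apply a wavelet transform, measure the reconstruction error) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two lookup tables `INTEGER_MAP` and `REAL_MAP`, the helper names, the choice of a single-level orthonormal Haar transform, and the definition of the error as the mean squared error after reconstruction (optionally with the detail coefficients discarded). The abstract does not state which mappings, wavelets, or error measure the authors actually used.

```python
import math

# Two illustrative numerical mappings (assumed for this sketch, not the paper's).
INTEGER_MAP = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
REAL_MAP = {"A": -1.5, "C": 0.5, "G": -0.5, "T": 1.5}


def map_sequence(seq, table):
    """Map a DNA string to a list of numbers using the given lookup table."""
    return [table[base] for base in seq.upper()]


def haar_dwt(signal):
    """Single-level orthonormal Haar DWT of an even-length signal."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def reconstruction_mse(signal, keep_detail=True):
    """Mean squared error between a signal and its Haar reconstruction.

    With keep_detail=False the detail coefficients are zeroed first, so the
    error reflects how much of the signal the approximation band retains
    (an assumed error definition, chosen only for this sketch).
    """
    approx, detail = haar_dwt(signal)
    if not keep_detail:
        detail = [0.0] * len(detail)
    rebuilt = haar_idwt(approx, detail)
    return sum((x - y) ** 2 for x, y in zip(signal, rebuilt)) / len(signal)


if __name__ == "__main__":
    sequence = "ATGCGTAC"  # hypothetical toy sequence
    for name, table in (("integer", INTEGER_MAP), ("real", REAL_MAP)):
        signal = map_sequence(sequence, table)
        print(name, reconstruction_mse(signal, keep_detail=False))
```

Because the Haar transform is orthogonal, reconstruction from all coefficients is exact up to floating-point rounding, so the two mappings only differ once some coefficients are discarded. A comparison across wavelet families, including biorthogonal ones, would need a wavelet library rather than this hand-written transform.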
Cite this article
Saini, S., Dewan, L. Comparison of Numerical Representations of Genomic Sequences: Choosing the Best Mapping for Wavelet Analysis. Int. J. Appl. Comput. Math 3, 2943–2958 (2017). https://doi.org/10.1007/s40819-016-0277-1
DOI: https://doi.org/10.1007/s40819-016-0277-1