Similarity Searching in DNA Sequences by Spectral Distortion Measures

  • Tuan D. Pham
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4065)


Searching for similarity among biological sequences is an important research area in bioinformatics because it can provide insight into the evolutionary and genetic relationships between species, which in turn opens doors to new scientific discoveries and applications such as drug design and treatment. In this paper, we introduce a novel measure of similarity between two biological sequences that does not require alignment. The method is based on the concept of spectral distortion measures developed for signal processing. The proposed method was tested on a set of six DNA sequences taken from Escherichia coli K-12 and Shigella flexneri, plus one random sequence. It was further tested on a complex dataset of 40 DNA sequences taken from the GenBank sequence database. The results obtained with the proposed method were found to be superior to those of some existing methods for measuring the similarity of DNA sequences.
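To make the idea of an alignment-free spectral comparison concrete, here is a minimal sketch in Python. It is not the paper's method: the abstract does not state how the sequences are encoded numerically or which distortion measure is used. The sketch assumes a binary indicator encoding for each of the four bases and uses the log spectral distance, one standard spectral distortion measure from speech processing. The function names and the `n_bins` parameter are hypothetical, chosen only for illustration.

```python
import cmath
import math


def power_spectrum(seq, n_bins=32):
    """Power spectrum of a DNA string at n_bins frequencies in (0, 0.5].

    Assumption: each base (A, C, G, T) is encoded as a 0/1 indicator
    sequence, and the squared DFT magnitudes of the four indicator
    sequences are summed. Using a fixed frequency grid lets sequences
    of different lengths be compared without alignment.
    """
    seq = seq.upper()
    n = len(seq)
    spectrum = []
    for k in range(1, n_bins + 1):
        freq = 0.5 * k / n_bins  # frequency in cycles per base
        total = 0.0
        for base in "ACGT":
            # DFT of the indicator sequence for this base at `freq`
            acc = sum(cmath.exp(-2j * math.pi * freq * t)
                      for t, ch in enumerate(seq) if ch == base)
            total += abs(acc) ** 2
        # Normalize by length; the small floor avoids log(0) later
        spectrum.append(total / n + 1e-12)
    return spectrum


def log_spectral_distance(seq_a, seq_b, n_bins=32):
    """Root-mean-square difference between the two log power spectra."""
    spec_a = power_spectrum(seq_a, n_bins)
    spec_b = power_spectrum(seq_b, n_bins)
    diffs = [(math.log(a) - math.log(b)) ** 2
             for a, b in zip(spec_a, spec_b)]
    return math.sqrt(sum(diffs) / len(diffs))


if __name__ == "__main__":
    # Toy sequences, made up for illustration only
    print(log_spectral_distance("ACGTACGTACGT", "ACGTACGTACGT"))
    print(log_spectral_distance("ACGTACGTACGT", "AAAAAACCCCCC"))
```

In this sketch a smaller distance means more similar spectra, and a sequence compared with itself gives a distance of zero. The direct DFT sum is quadratic in sequence length per frequency grid, so it only suits short toy inputs.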


Keywords: Markov Chain Model, Biological Sequence, Distortion Measure, Chaos Game Representation, Signal Analysis Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tuan D. Pham (1, 2)
  1. Bioinformatics Applications Research Centre
  2. School of Information Technology, James Cook University, Townsville, Australia
