Data Mining for Bioinformatics

  • A. W. -C. Liew
  • Hong Yan
  • Mengsu Yang


Codon Usage; Protein Data Bank; Secondary Structure Prediction; Average Mutual Information; Protein Structure Prediction





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • A. W. -C. Liew (1)
  • Hong Yan (1, 2)
  • Mengsu Yang (3)

  1. Department of Computer Engineering and Information Technology, City University of Hong Kong, Kowloon, Hong Kong
  2. School of Electrical and Information Engineering, University of Sydney, Australia
  3. Department of Biology and Chemistry, City University of Hong Kong, Kowloon, Hong Kong
