A Discriminative Method for Protein Remote Homology Detection Based on N-nary Profiles

  • Conference paper
Bioinformatics Research and Development (BIRD 2008)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 13)

Abstract

Protein homology detection is a key problem in computational biology. This paper presents a novel building block for proteins, the N-nary profile, which captures the evolutionary information contained in protein sequence frequency profiles. The frequency profiles, calculated from the multiple sequence alignments output by PSI-BLAST, are converted into N-nary profiles, which are then filtered by a chi-square feature selection algorithm. Each protein sequence is transformed into a fixed-dimension feature vector holding the occurrence count of each N-nary profile, and the resulting vectors are input to a support vector machine (SVM). Latent semantic analysis (LSA), an efficient feature extraction algorithm, is adopted to further improve performance. When tested on the SCOP 1.53 data set, the N-nary profile method outperforms all compared methods of protein remote homology detection. Its ROC50 score is 0.736, nearly 4 percent higher than that of the current best method.
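
The feature-selection, counting, and scoring steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how a frequency profile is converted into an N-nary profile, so the sketch assumes each sequence has already been turned into a list of hypothetical profile tokens ("p1", "p2", ...). The chi-square statistic uses the standard 2x2 contingency form common in text categorization, and `roc_n` computes the area under the ROC curve up to the first n false positives, normalized by the number of false positives actually seen (an assumption for toy data with fewer than n negatives). All function names are illustrative, and the SVM and LSA stages are omitted.

```python
from collections import Counter


def chi_square(docs, labels, word):
    """Chi-square statistic of `word` against a binary label (2x2 table)."""
    a = b = c = d = 0
    for tokens, is_pos in zip(docs, labels):
        present = word in tokens
        if present and is_pos:
            a += 1  # word present, positive class
        elif present:
            b += 1  # word present, negative class
        elif is_pos:
            c += 1  # word absent, positive class
        else:
            d += 1  # word absent, negative class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_words(docs, labels, k):
    """Keep the k tokens with the highest chi-square score (ties by name)."""
    vocab = sorted({w for tokens in docs for w in tokens})
    ranked = sorted(vocab, key=lambda w: (-chi_square(docs, labels, w), w))
    return ranked[:k]


def count_vector(tokens, selected):
    """Fixed-dimension vector of occurrence counts over the selected tokens."""
    counts = Counter(tokens)
    return [counts[w] for w in selected]


def roc_n(scores, labels, n=50):
    """Normalized area under the ROC curve up to the first n false positives."""
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    total_pos = sum(1 for _, y in ranked if y)
    tp = fp = area = 0
    for _, is_pos in ranked:
        if is_pos:
            tp += 1
        else:
            fp += 1
            area += tp  # true positives ranked above this false positive
            if fp == n:
                break
    if total_pos == 0:
        return 0.0
    if fp == 0:
        return 1.0  # no negatives at all: every positive is ranked first
    return area / (fp * total_pos)


# Toy data: four "sequences" as hypothetical profile tokens, two per class.
docs = [["p1", "p2", "p1"], ["p1", "p3"], ["p4", "p3"], ["p4", "p5", "p4"]]
labels = [True, True, False, False]
selected = select_words(docs, labels, k=2)
vectors = [count_vector(tokens, selected) for tokens in docs]
```

In a full pipeline the resulting vectors would be passed to an SVM, optionally after a dimensionality-reduction step such as LSA. Tied scores are not treated specially in `roc_n`.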

Editor information

Mourad Elloumi, Josef Küng, Michal Linial, Robert F. Murphy, Kristan Schneider, Cristian Toma

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, B., Lin, L., Wang, X., Dong, Q., Wang, X. (2008). A Discriminative Method for Protein Remote Homology Detection Based on N-nary Profiles. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_6

  • DOI: https://doi.org/10.1007/978-3-540-70600-7_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70598-7

  • Online ISBN: 978-3-540-70600-7

  • eBook Packages: Computer Science (R0)