Protein Remote Homology Detection Based on Binary Profiles

  • Conference paper
Bioinformatics Research and Development (BIRD 2007)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4414)

Abstract

Remote homology detection is a key element of protein structure and function analysis in computational and experimental biology. This paper presents a simple representation of protein sequences that uses the evolutionary information in profiles for efficient remote homology detection. Frequency profiles are calculated directly from the multiple sequence alignments produced by PSI-BLAST and converted into binary profiles using a probability threshold. These binary profiles serve as a new building block for protein sequences. Each protein sequence is mapped to a high-dimensional vector whose components are the numbers of occurrences of each binary profile. The resulting vectors are used to train support vector machine classifiers, which then classify the test protein sequences. The method is further improved by applying an efficient feature extraction algorithm from natural language processing, namely the latent semantic analysis model. Testing on the SCOP 1.53 database shows that the method based on binary profiles outperforms methods based on many other basic building blocks, including N-grams, patterns and motifs. The ROC50 score is 0.698, which is nearly 10 percent higher than that of the other methods.
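The first two steps of the pipeline (binarizing a frequency profile and counting binary-profile occurrences) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the unweighted column frequencies, the strict `>` comparison and the toy alignment are all assumptions made for illustration, and the PSI-BLAST search, SVM training and latent semantic analysis steps are left out.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed order


def frequency_profile(alignment):
    """Column-wise residue frequencies of an alignment (gaps are ignored).

    Assumption: plain unweighted counts; the paper may weight sequences.
    """
    profile = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]
        counts = Counter(residues)
        total = len(residues)
        profile.append(
            {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
        )
    return profile


def binarize(profile, threshold):
    """Convert each column into a 20-character 0/1 string.

    A position is '1' when the residue frequency exceeds the threshold
    (the strict '>' is an assumption).
    """
    return [
        "".join("1" if column[aa] > threshold else "0" for aa in AMINO_ACIDS)
        for column in profile
    ]


def occurrence_vector(binary_profiles, vocabulary):
    """Count how often each binary profile in the vocabulary occurs."""
    counts = Counter(binary_profiles)
    return [counts[word] for word in vocabulary]


# Toy alignment standing in for PSI-BLAST output (hypothetical data).
alignment = ["ACA", "ACG", "AGA"]
binary = binarize(frequency_profile(alignment), threshold=0.5)
vocabulary = sorted(set(binary))  # in practice, built over the whole training set
vector = occurrence_vector(binary, vocabulary)
print(vector)
```

In a full pipeline, vectors of this kind would be the input to the support vector machine, optionally after dimensionality reduction by latent semantic analysis.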

Editor information

Sepp Hochreiter, Roland Wagner

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Dong, Q., Lin, L., Wang, X. (2007). Protein Remote Homology Detection Based on Binary Profiles. In: Hochreiter, S., Wagner, R. (eds) Bioinformatics Research and Development. BIRD 2007. Lecture Notes in Computer Science, vol 4414. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71233-6_17

  • DOI: https://doi.org/10.1007/978-3-540-71233-6_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71232-9

  • Online ISBN: 978-3-540-71233-6

  • eBook Packages: Computer Science, Computer Science (R0)
