PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences

Abstract

Annotation of a newly determined protein sequence depends on its pairwise sequence identity with known sequences. However, for twilight zone sequences, which share only 15–25% identity with known sequences, pairwise comparison methods are inadequate and annotation becomes a challenging task. Such sequences can be annotated by methods that recognize their fold. Bowie et al. described a 3D1D profile method in which amino acid sequences that fold into a known 3D structure are identified by their compatibility with that structure. We have improved this method by incorporating predicted secondary structure information and applied it to fold recognition for twilight zone sequences. In our Protein Secondary Structure 3D1D (PSS-3D1D) method, a score (w) for the predicted secondary structure of the query sequence is included when evaluating the compatibility of the query sequence with the 3D structures of known folds. In benchmarks, the PSS-3D1D method shows up to a 21% improvement over the 3D1D profile method in correctly predicting the α + β class of folds from sequences with twilight zone levels of identity. Hence, the PSS-3D1D method could offer more clues than the 3D1D method for the annotation of twilight zone sequences. The web-based PSS-3D1D method is freely available on the PredictFold server at http://bioinfo.bdu.ac.in/servers/.
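
The idea of adding a secondary structure score to a 3D1D compatibility score can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how w enters the score, so the sketch assumes w is added to the 3D1D score whenever the predicted secondary structure state of a query residue equals the state observed at the template position. The function name `pss_3d1d_score`, the toy profile values, the linear gap penalty and the use of a Smith-Waterman style local alignment are all assumptions made for illustration.

```python
from typing import Dict, List


def pss_3d1d_score(query: str, query_ss: str,
                   profile: List[Dict[str, float]], template_ss: str,
                   w: float = 1.0, gap: float = 2.0) -> float:
    """Best local alignment score of a query against a template profile.

    profile[j] maps an amino acid to its (hypothetical) 3D1D score at
    template position j; query_ss and template_ss use one letter per
    residue (for example H, E, C). The additive bonus w is an assumption.
    """
    if len(query) != len(query_ss) or len(profile) != len(template_ss):
        raise ValueError("sequence and secondary structure lengths must match")

    best = 0.0
    prev = [0.0] * (len(profile) + 1)
    for i in range(1, len(query) + 1):
        curr = [0.0] * (len(profile) + 1)
        for j in range(1, len(profile) + 1):
            # 3D1D term: compatibility of the residue with the template position.
            s = profile[j - 1].get(query[i - 1], 0.0)
            # Secondary structure term: add w when predicted and observed states agree.
            if query_ss[i - 1] == template_ss[j - 1]:
                s += w
            # Local alignment recurrence with a linear gap penalty.
            curr[j] = max(0.0, prev[j - 1] + s, prev[j] - gap, curr[j - 1] - gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy template with three positions; all numbers are made up.
toy_profile = [{"A": 1.0}, {"L": 2.0}, {"K": 1.5}]
plain = pss_3d1d_score("ALK", "HHH", toy_profile, "HHH", w=0.0)
with_ss = pss_3d1d_score("ALK", "HHH", toy_profile, "HHH", w=1.0)
print(plain, with_ss)
```

Setting w to zero reduces the sketch to a plain profile alignment, which makes it easy to compare scores with and without the secondary structure term on the same query.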

References

  1. Doolittle RF (1986) Of urfs and orfs: a primer on how to analyze derived amino acid sequences. University Science Books, Mill Valley
  2. Goonesekere NC, Shipely K, O’Connor K (2010) The challenge of annotating protein sequences: the tale of eight domains of unknown function in Pfam. Comput Biol Chem 34(3):210–214
  3. Pieper U, Eswar N, Stuart AC, Ilyin VA, Sali A (2002) MODBASE, a database of annotated comparative protein structure models. Nucleic Acids Res 30(1):255–259
  4. Rost B (1999) Twilight zone of protein sequence alignments. Protein Eng 12(2):85–94
  5. Zhou H, Zhou Y (2005) Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments. Proteins 58(2):321–328
  6. Przybylski D, Rost B (2004) Improving fold recognition without folds. J Mol Biol 341(1):255–269
  7. Jaroszewski L, Rychlewski L, Zhang B, Godzik A (1998) Fold prediction by a hierarchy of sequence, threading, and modeling methods. Protein Sci 7(6):1431–1440
  8. Kryshtafovych A, Venclovas C, Fidelis K, Moult J (2005) Progress over the first decade of CASP experiments. Proteins 61(Suppl 7):225–236
  9. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
  10. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
  11. Koretke KK, Russell RB, Lupas AN (2001) Fold recognition from sequence comparisons. Proteins Suppl 5:68–75
  12. David R, Korenberg MJ, Hunter IW (2000) 3D-1D threading methods for protein fold recognition. Pharmacogenomics 1(4):445–455
  13. Fischer D, Eisenberg D (1996) Protein fold recognition using sequence-derived predictions. Protein Sci 5(5):947–955
  14. Jones DT, Taylor WR, Thornton JM (1992) A new approach to protein fold recognition. Nature 358(6381):86–89
  15. Kelley LA, MacCallum RM, Sternberg MJ (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 299(2):499–520
  16. Meller J, Elber R (2002) Protein recognition by sequence-to-structure fitness: bridging efficiency and capacity of threading models. Adv Chem Phys 120:77–130
  17. Ogata K, Ohya M, Umeyama H (1998) Amino acid similarity matrix for homology modeling derived from structural alignment and optimized by the Monte Carlo method. J Mol Graph Model 16(4-6):178–189
  18. Rost B, Schneider R, Sander C (1997) Protein fold recognition by prediction-based threading. J Mol Biol 270(3):471–480
  19. Teodorescu O, Galor T, Pillardy J, Elber R (2004) Enriching the sequence substitution matrix by structural information. Proteins 54(1):41–48
  20. Abual-Rub M, Abdullah R (2008) A survey of protein fold recognition algorithms. J Comput Sci 4(9):768–776
  21. Cheng J, Baldi P (2006) A machine learning information retrieval approach to protein fold recognition. Bioinformatics 22(12):1456–1463
  22. Eyrich VA, Przybylski D, Koh IY, Grana O, Pazos F, Valencia A, Rost B (2003) CAFASP3 in the spotlight of EVA. Proteins 53(Suppl 6):548–560
  23. Fariselli P, Rossi I, Capriotti E, Casadio R (2006) The WWWH of remote homolog detection: the state of the art. Brief Bioinform 8(2):78–87
  24. Ginalski K, Pas J, Wyrwicz LS, von Grotthuss M, Bujnicki JM, Rychlewski L (2003) ORFeus: detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res 31(13):3804–3807
  25. Kelley LA, Sternberg MJE (2009) Protein structure prediction on the web: a case study using the phyre server. Nat Protoc 4(3):363–371
  26. Zhang Y, Arakaki AK, Skolnick J (2005) TASSER: an automated method for the prediction of protein tertiary structures in CASP6. Proteins 61(Suppl 7):91–98
  27. Bowie JU, Luthy R, Eisenberg D (1991) A method to identify protein sequences that fold into a known three-dimensional structure. Science 253(5016):164–170
  28. Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247(4):536–540
  29. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The protein data bank. Nucleic Acids Res 28(1):235–242
  30. Kelley LA, Sutcliffe MJ (1997) OLDERADO: on-line database of ensemble representatives and domains. Protein Sci 6(12):2628–2630
  31. Fano R (1961) Transmission of information: a statistical theory of communications. University Science Books, Cambridge
  32. Garnier J, Gibrat JF, Robson B (1996) GOR method for predicting protein secondary structure from amino acid sequence. Methods Enzymol 266:540–553
  33. Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147(1):195–197
  34. Pearson WR (1998) Empirical statistical estimates for sequence similarity searches. J Mol Biol 276(1):71–84
  35. Mitrophanov AY, Borodovsky M (2006) Statistical significance in biological sequence analysis. Briefings Bioinf 7(1):2–24
  36. Combet C, Blanchet C, Geourjon C, Delage G (2000) NPS@: network protein sequence analysis. Trends Biochem Sci 25(3):147–150
  37. Chandonia JM, Hon G, Walker NS, LoConte L, Koehl P, Levitt M, Brenner SE (2004) The ASTRAL compendium in 2004. Nucleic Acids Res 32:D189–D192
  38. Rice P, Longden I, Bleasby A (2000) EMBOSS: the european molecular biology open software suite. Trends Genet 16(6):276–277
  39. Gribskov M, Robinson NL (1996) Use of Receiver Operating Characteristic (ROC) analysis to evaluate sequence matching. Comput Chem 20(6):25–33
  40. Jia Y, Huan J, Buhr V, Zhang J, Carayannopoulos LN (2009) Towards comprehensive structural motif mining for better fold annotation in the twilight zone of sequence dissimilarity. BMC Bioinform 10(Suppl 1):S46
  41. Gerstein M, Levitt M (1998) Comprehensive assessment of automatic structural alignment against a manual standard, the SCOP classification of proteins. Protein Sci 7(2):445–456
  42. McGuffin LJ, Bryson K, Jones DT (2001) What are the baselines for protein fold recognition. Bioinformatics 17(1):63–72

Acknowledgments

This work forms part of a research project funded by the Department of Information Technology (DIT), Govt. of India, New Delhi. One of the authors (KG) gratefully acknowledges the support provided by the DIT in the form of a senior research fellowship.

Author information

Corresponding author

Correspondence to S. Parthasarathy.

About this article

Cite this article

Ganesan, K., Parthasarathy, S. PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences. J Struct Funct Genomics 12, 181–189 (2011). https://doi.org/10.1007/s10969-011-9119-x

Keywords

  • Protein fold
  • Twilight zone
  • Sequence annotation