Journal of Structural and Functional Genomics, Volume 12, Issue 4, pp 181–189

PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences

  • K. Ganesan
  • S. Parthasarathy


Annotation of a newly determined protein sequence depends on its pairwise sequence identity with known sequences. However, for twilight zone sequences, which share only 15–25% identity with known sequences, pairwise comparison methods are inadequate and annotation becomes a challenging task. Such sequences can instead be annotated by methods that recognize their fold. Bowie et al. described a 3D1D profile method in which amino acid sequences that fold into a known 3D structure are identified by their compatibility with that structure. We have improved this method by incorporating predicted secondary structure information and applied it to fold recognition for twilight zone sequences. In our Protein Secondary Structure 3D1D (PSS-3D1D) method, a score (w) for the predicted secondary structure of the query sequence is included when computing the compatibility of the query sequence with the 3D structures of known folds. In benchmarks on sequences with twilight zone levels of identity, the PSS-3D1D method shows an improvement of up to 21% over the 3D1D profile method in correctly predicting folds of the α + β class. Hence, the PSS-3D1D method could offer more clues than the 3D1D method for the annotation of twilight zone sequences. The web-based PSS-3D1D method is freely available in the PredictFold server at
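The abstract does not give the formula for w or the details of the profile alignment. Purely as an illustration of the idea of adding a secondary structure term to 3D1D compatibility, the Python sketch below makes several assumptions that are not from the paper. It assumes a 3D profile is a list with one dictionary per template position, mapping each amino acid to its compatibility score with that position's structural environment. It also assumes a fixed bonus w is added whenever the query's predicted secondary structure state (H/E/C) matches the template's, and that the alignment uses a Smith-Waterman-style local alignment with a linear gap penalty. The function name `pss_3d1d_score`, the toy profile values and the gap penalty are hypothetical. The sketch is not the authors' scoring scheme.

```python
from typing import Dict, List

# Hypothetical 3D profile: one dict per template position, mapping an amino acid
# (one-letter code) to its compatibility score with that position's environment.
Profile = List[Dict[str, float]]


def pss_3d1d_score(
    query: str,
    query_ss: str,
    profile: Profile,
    template_ss: str,
    w: float = 1.0,
    gap: float = 2.0,
) -> float:
    """Best local alignment score of a query sequence against a 3D profile.

    Assumed scoring (illustrative only): an aligned pair (i, j) scores
    profile[j][query[i]], plus a bonus w if the query's predicted secondary
    structure state equals the template's state at j. Each gap position
    costs a fixed linear penalty. Setting w = 0 gives a plain profile score.
    """
    n, m = len(query), len(profile)
    # H[i][j] = best score of a local alignment ending at query[i-1] / template[j-1]
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        aa, ss = query[i - 1], query_ss[i - 1]
        for j in range(1, m + 1):
            pair = profile[j - 1].get(aa, 0.0)  # residue-environment compatibility
            if ss == template_ss[j - 1]:
                pair += w  # hypothetical secondary-structure agreement bonus
            H[i][j] = max(
                0.0,                      # start a new local alignment
                H[i - 1][j - 1] + pair,   # align query residue to profile position
                H[i - 1][j] - gap,        # gap in the template
                H[i][j - 1] - gap,        # gap in the query
            )
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    toy_profile = [{"L": 2.0, "A": 1.0}, {"K": 1.5}, {"V": 2.0}]
    print(pss_3d1d_score("LKV", "HHE", toy_profile, "HHE", w=1.0))  # 8.5
    print(pss_3d1d_score("LKV", "HHE", toy_profile, "HHE", w=0.0))  # 5.5
```

In this toy case the predicted states agree with the template at every aligned position, so the bonus raises the score from 5.5 to 8.5. With real data, a query whose predicted helices and strands line up with the template's would gain in the same way, and a query whose states disagree would not.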


Protein fold · Twilight zone · Sequence annotation



This work forms part of a research project funded by the Department of Information Technology (DIT), Govt. of India, New Delhi. One of the authors (KG) gratefully acknowledges the support provided by the DIT in the form of a senior research fellowship.


  1. Doolittle RF (1986) Of urfs and orfs: a primer on how to analyze derived amino acid sequences. University Science Books, Mill Valley
  2. Goonesekere NC, Shipely K, O’Connor K (2010) The challenge of annotating protein sequences: the tale of eight domains of unknown function in Pfam. Comput Biol Chem 34(3):210–214
  3. Pieper U, Eswar N, Stuart AC, Ilyin VA, Sali A (2002) MODBASE, a database of annotated comparative protein structure models. Nucleic Acids Res 30(1):255–259
  4. Rost B (1999) Twilight zone of protein sequence alignments. Protein Eng 12(2):85–94
  5. Zhou H, Zhou Y (2005) Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments. Proteins 58(2):321–328
  6. Przybylski D, Rost B (2004) Improving fold recognition without folds. J Mol Biol 341(1):255–269
  7. Jaroszewski L, Rychlewski L, Zhang B, Godzik A (1998) Fold prediction by a hierarchy of sequence, threading, and modeling methods. Protein Sci 7(6):1431–1440
  8. Kryshtafovych A, Venclovas C, Fidelis K, Moult J (2005) Progress over the first decade of CASP experiments. Proteins 61(Suppl 7):225–236
  9. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
  10. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
  11. Koretke KK, Russell RB, Lupas AN (2001) Fold recognition from sequence comparisons. Proteins Suppl 5:68–75
  12. David R, Korenberg MJ, Hunter IW (2000) 3D-1D threading methods for protein fold recognition. Pharmacogenomics 1(4):445–455
  13. Fischer D, Eisenberg D (1996) Protein fold recognition using sequence-derived predictions. Protein Sci 5(5):947–955
  14. Jones DT, Taylor WR, Thornton JM (1992) A new approach to protein fold recognition. Nature 358(6381):86–89
  15. Kelley LA, MacCallum RM, Sternberg MJ (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol 299(2):499–520
  16. Meller J, Elber R (2002) Protein recognition by sequence-to-structure fitness: bridging efficiency and capacity of threading models. Adv Chem Phys 120:77–130
  17. Ogata K, Ohya M, Umeyama H (1998) Amino acid similarity matrix for homology modeling derived from structural alignment and optimized by the Monte Carlo method. J Mol Graph Model 16(4–6):178–189
  18. Rost B, Schneider R, Sander C (1997) Protein fold recognition by prediction-based threading. J Mol Biol 270(3):471–480
  19. Teodorescu O, Galor T, Pillardy J, Elber R (2004) Enriching the sequence substitution matrix by structural information. Proteins 54(1):41–48
  20. Abual-Rub M, Abdullah R (2008) A survey of protein fold recognition algorithms. J Comput Sci 4(9):768–776
  21. Cheng J, Baldi P (2006) A machine learning information retrieval approach to protein fold recognition. Bioinformatics 22(12):1456–1463
  22. Eyrich VA, Przybylski D, Koh IY, Grana O, Pazos F, Valencia A, Rost B (2003) CAFASP3 in the spotlight of EVA. Proteins 53(Suppl 6):548–560
  23. Fariselli P, Rossi I, Capriotti E, Casadio R (2006) The WWWH of remote homolog detection: the state of the art. Brief Bioinform 8(2):78–87
  24. Ginalski K, Pas J, Wyrwicz LS, von Grotthuss M, Bujnicki JM, Rychlewski L (2003) ORFeus: detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res 31(13):3804–3807
  25. Kelley LA, Sternberg MJE (2009) Protein structure prediction on the web: a case study using the Phyre server. Nat Protoc 4(3):363–371
  26. Zhang Y, Arakaki AK, Skolnick J (2005) TASSER: an automated method for the prediction of protein tertiary structures in CASP6. Proteins 61(Suppl 7):91–98
  27. Bowie JU, Luthy R, Eisenberg D (1991) A method to identify protein sequences that fold into a known three-dimensional structure. Science 253(5016):164–170
  28. Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247(4):536–540
  29. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res 28(1):235–242
  30. Kelley LA, Sutcliffe MJ (1997) OLDERADO: on-line database of ensemble representatives and domains. Protein Sci 6(12):2628–2630
  31. Fano R (1961) Transmission of information: a statistical theory of communications. University Science Books, Cambridge
  32. Garnier J, Gibrat JF, Robson B (1996) GOR method for predicting protein secondary structure from amino acid sequence. Methods Enzymol 266:540–553
  33. Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147(1):195–197
  34. Pearson WR (1998) Empirical statistical estimates for sequence similarity searches. J Mol Biol 276(1):71–84
  35. Mitrophanov AY, Borodovsky M (2006) Statistical significance in biological sequence analysis. Brief Bioinform 7(1):2–24
  36. Combet C, Blanchet C, Geourjon C, Delage G (2000) NPS@: network protein sequence analysis. Trends Biochem Sci 25(3):147–150
  37. Chandonia JM, Hon G, Walker NS, LoConte L, Koehl P, Levitt M, Brenner SE (2004) The ASTRAL compendium in 2004. Nucleic Acids Res 32:D189–D192
  38. Rice P, Longden I, Bleasby A (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet 16(6):276–277
  39. Gribskov M, Robinson NL (1996) Use of Receiver Operating Characteristic (ROC) analysis to evaluate sequence matching. Comput Chem 20(6):25–33
  40. Jia Y, Huan J, Buhr V, Zhang J, Carayannopoulos LN (2009) Towards comprehensive structural motif mining for better fold annotation in the twilight zone of sequence dissimilarity. BMC Bioinformatics 10(Suppl 1):S46
  41. Gerstein M, Levitt M (1998) Comprehensive assessment of automatic structural alignment against a manual standard, the SCOP classification of proteins. Protein Sci 7(2):445–456
  42. McGuffin LJ, Bryson K, Jones DT (2001) What are the baselines for protein fold recognition? Bioinformatics 17(1):63–72

Copyright information

© Springer Science+Business Media B.V. 2011

Authors and Affiliations

  1. Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, India
