In Silico Protein Motif Discovery and Structural Analysis

  • Catherine Mooney
  • Norman Davey
  • Alberto J.M. Martin
  • Ian Walsh
  • Denis C. Shields
  • Gianluca Pollastri
Part of the Methods in Molecular Biology book series (MIMB, volume 760)


A wealth of in silico tools is available for protein motif discovery and structural analysis. The aim of this chapter is to collect some of the most common and useful tools and to guide the biologist in their use. A detailed explanation is provided for the use of Distill, a suite of web servers for the prediction of protein structural features and the prediction of full-atom 3D models from a protein sequence. We also provide pointers to many other tools available for motif discovery and for secondary and tertiary structure prediction from a primary amino acid sequence. The prediction of protein intrinsic disorder and the prediction of functional sites and short linear motifs (SLiMs) are also briefly discussed. Because user queries vary greatly in size, scope and character, the trade-offs in speed, accuracy and scale need to be considered when choosing which methods to adopt.

Key words

Protein structure prediction; secondary structure; disorder; functional sites; SLiMs



CM is supported by Science Foundation Ireland (SFI) grant 08/IN.1/B1864. ND is supported by an EMBL Interdisciplinary Postdoc (EIPOD) fellowship. CM, GP, IW and AJMM were partly supported by SFI grant 05/RFP/CMS0029, grant RP/2005/219 from the Health Research Board of Ireland, a UCD President’s Award 2004 and UCD Seed Funding 2009 award SF371.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Catherine Mooney (1, 2, 3)
  • Norman Davey (4)
  • Alberto J.M. Martin (1, 5, 6)
  • Ian Walsh (1, 5, 6)
  • Denis C. Shields (1, 2, 3)
  • Gianluca Pollastri (1, 6)

  1. Complex and Adaptive Systems Laboratory, University College Dublin, Belfield, Dublin 4, Ireland
  2. Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
  3. School of Medicine and Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
  4. EMBL Structural and Computational Biology Unit, Heidelberg, Germany
  5. Biocomputing UP, Department of Biology, University of Padua, Padova, Italy
  6. School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
