In Silico Protein Motif Discovery and Structural Analysis

Mooney, Catherine; Davey, Norman; Martin, Alberto J.M.; Walsh, Ian; Shields, Denis C.; Pollastri, Gianluca

doi:10.1007/978-1-61779-176-5_21

Part of the book series: Methods in Molecular Biology (MIMB, volume 760)

Abstract

A wealth of in silico tools is available for protein motif discovery and structural analysis. The aim of this chapter is to collect some of the most common and useful tools and to guide the biologist in their use. A detailed explanation is provided for the use of Distill, a suite of web servers for the prediction of protein structural features and of full-atom 3D models from a protein sequence. We also provide pointers to many other tools for motif discovery and for secondary and tertiary structure prediction from a primary amino acid sequence. The prediction of protein intrinsic disorder, functional sites and short linear motifs (SLiMs) is briefly discussed. Because user queries vary greatly in size, scope and character, the trade-offs between speed, accuracy and scale need to be considered when choosing which methods to adopt.

Acknowledgements

CM is supported by Science Foundation Ireland (SFI) grant 08/IN.1/B1864. ND is supported by an EMBL Interdisciplinary Postdoc (EIPOD) fellowship. CM, GP, IW and AJMM were partly supported by SFI grant 05/RFP/CMS0029, grant RP/2005/219 from the Health Research Board of Ireland, a UCD President’s Award 2004 and UCD Seed Funding 2009 award SF371.

Author information

Authors and Affiliations

Complex and Adaptive Systems Laboratory, University College Dublin, Belfield, Dublin 4, Ireland
Catherine Mooney, Alberto J.M. Martin, Ian Walsh, Denis C. Shields & Gianluca Pollastri
Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
Catherine Mooney & Denis C. Shields
School of Medicine and Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
Catherine Mooney & Denis C. Shields
EMBL Structural and Computational Biology Unit, 69117, Heidelberg, Germany
Norman Davey
Biocomputing UP, Department of Biology, University of Padua, I-35131, Padova, Italy
Alberto J.M. Martin & Ian Walsh
School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
Alberto J.M. Martin, Ian Walsh & Gianluca Pollastri

Corresponding author

Correspondence to Catherine Mooney.

Editor information

Editors and Affiliations

Royal Prince Alfred Hospital, Dept. of Molecular & Clinical Genetics, University of Sydney, Missenden Road, Camperdown, 2050, New South Wales, Australia
Bing Yu
Royal Prince Alfred Hospital, Dept. of Molecular & Clinical Genetics, University of Sydney, Missenden Road, Camperdown, 2050, New South Wales, Australia
Marcus Hinchcliffe

About this protocol

Cite this protocol

Mooney, C., Davey, N., Martin, A.J., Walsh, I., Shields, D.C., Pollastri, G. (2011). In Silico Protein Motif Discovery and Structural Analysis. In: Yu, B., Hinchcliffe, M. (eds) In Silico Tools for Gene Discovery. Methods in Molecular Biology, vol 760. Humana Press. https://doi.org/10.1007/978-1-61779-176-5_21

DOI: https://doi.org/10.1007/978-1-61779-176-5_21
Published: 30 June 2011
Publisher Name: Humana Press
Print ISBN: 978-1-61779-175-8
Online ISBN: 978-1-61779-176-5
eBook Packages: Springer Protocols