Structural Proteomics pp 3-25

Part of the Methods in Molecular Biology™ book series (MIMB, volume 426)

Target Selection for Structural Genomics: An Overview

  • Russell L. Marsden
  • Christine A. Orengo

The success of the whole-genome sequencing projects lent considerable credence to the belief that high-throughput approaches, rather than traditional hypothesis-driven research, would be essential to structurally and functionally annotate the rapidly growing volume of available sequence data within a reasonable time frame. Such observations supported the emerging field of structural genomics, which now faces the task of providing a library of protein structures that represents the biological diversity of the protein universe. To run efficiently, structural genomics projects aim to define a set of targets that maximizes the potential of each structure discovery, whether it represents a novel structure, a novel function, or a missing evolutionary link. However, not all protein sequences make suitable structural genomics targets. Determining the structure of a protein takes considerably more effort than determining the sequence of its gene, both because the methods involved are more complex and because the behavior of targeted proteins can be extremely variable at the different stages of the structural genomics “pipeline.” Therefore, structural genomics target selection must identify and prioritize the most suitable candidate proteins for structure determination, avoiding “problematic” proteins while still serving the ultimate goals of the project.


Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Russell L. Marsden (1)
  • Christine A. Orengo (1)
  1. Biochemistry and Molecular Biology Department, University College London, London, UK
