
Target Selection for Structural Genomics: An Overview

  • Protocol
Structural Proteomics

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 426)

The success of the whole-genome sequencing projects lent considerable credence to the belief that high-throughput approaches, rather than traditional hypothesis-driven research, would be essential for structurally and functionally annotating the rapidly growing body of sequence data within a reasonable time frame. This view supported the emerging field of structural genomics, which now faces the task of providing a library of protein structures that represents the biological diversity of the protein universe. To run efficiently, structural genomics projects aim to define a set of targets that maximizes the value of each new structure, whether it reveals a novel structure, a novel function, or a missing evolutionary link. However, not all protein sequences make suitable structural genomics targets. Determining the structure of a protein takes considerably more effort than sequencing its gene, both because the methods involved are more complex and because targeted proteins can behave very differently at the various stages of the structural genomics “pipeline.” Structural genomics target selection must therefore identify and prioritize the most suitable candidate proteins for structure determination, avoiding “problematic” proteins while ensuring that the ultimate goals of the project are met.
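The filter-then-prioritize logic of target selection can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' method: the features (predicted transmembrane helices, disorder and low-complexity fractions, whether a close homolog already has a solved structure, family size), the thresholds, and the names `Candidate`, `is_problematic`, `priority`, and `select_targets` are all hypothetical, and the feature values are assumed to have been computed beforehand by separate prediction tools.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds chosen for illustration only; real projects tune their own.
MAX_DISORDER_FRACTION = 0.3
MAX_LOW_COMPLEXITY_FRACTION = 0.2


@dataclass
class Candidate:
    """A candidate protein with precomputed (hypothetical) sequence-derived features."""
    name: str
    tm_helices: int                 # number of predicted transmembrane helices
    disorder_fraction: float        # fraction of residues predicted to be disordered
    low_complexity_fraction: float  # fraction of residues in low-complexity regions
    has_solved_homolog: bool        # True if a close relative already has a known structure
    family_size: int                # number of sequences in the candidate's family


def is_problematic(c: Candidate) -> bool:
    """Flag candidates whose features suggest they may fail in the pipeline."""
    return (
        c.tm_helices > 0
        or c.disorder_fraction > MAX_DISORDER_FRACTION
        or c.low_complexity_fraction > MAX_LOW_COMPLEXITY_FRACTION
    )


def priority(c: Candidate) -> Tuple[bool, int]:
    """Sort key: structurally novel candidates first, then larger families."""
    return (not c.has_solved_homolog, c.family_size)


def select_targets(candidates: List[Candidate]) -> List[str]:
    """Drop problematic proteins, then rank the rest by priority (highest first)."""
    suitable = [c for c in candidates if not is_problematic(c)]
    suitable.sort(key=priority, reverse=True)
    return [c.name for c in suitable]


# Hypothetical example candidates.
candidates = [
    Candidate("A", 0, 0.05, 0.00, False, 120),
    Candidate("B", 3, 0.00, 0.00, False, 500),  # predicted membrane protein: filtered out
    Candidate("C", 0, 0.10, 0.05, True, 900),   # close homolog already solved: ranked last
    Candidate("D", 0, 0.45, 0.00, False, 300),  # largely disordered: filtered out
    Candidate("E", 0, 0.00, 0.10, False, 400),
]
print(select_targets(candidates))  # ['E', 'A', 'C']
```

Sorting on a tuple key places candidates without a solved homolog ahead of those with one and, within each group, favors larger families, whose members a new structure could help annotate. A real project would weigh many more factors than this sketch does.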


Copyright information

© 2008 Humana Press, a part of Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Marsden, R.L., Orengo, C.A. (2008). Target Selection for Structural Genomics: An Overview. In: Kobe, B., Guss, M., Huber, T. (eds) Structural Proteomics. Methods in Molecular Biology™, vol 426. Humana Press. https://doi.org/10.1007/978-1-60327-058-8_1


  • DOI: https://doi.org/10.1007/978-1-60327-058-8_1

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-809-6

  • Online ISBN: 978-1-60327-058-8

  • eBook Packages: Springer Protocols
