Structural Genomics of Minimal Organisms: Pipeline and Results

  • Sung-Hou Kim
  • Dong-Hae Shin
  • Rosalind Kim
  • Paul Adams
  • John-Marc Chandonia
Part of the Methods in Molecular Biology™ book series (MIMB, volume 426)

The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes, the latter fewer than 700. A semiautomated structural genomics pipeline was set up, covering target selection, cloning, expression, purification, and, ultimately, structure determination. At the time of this writing, structural information is available for more than 93% of all soluble proteins of M. genitalium. This chapter summarizes the approaches taken by the authors' center.





This work is supported by grants from the NIH (1-P50-GM62412 and 1-R01-GM073109). The authors are grateful to a large number of colleagues who participated in various aspects of BSGC's PSI-1 program, such as high throughput cloning (H. Yokota and B. Gold) and expression (M. Henriquez and B. Martinez), large-scale production and characterization of proteins (C. Huang, Y. Lou, N. Oganesyan, and A. DeGiovanni), crystallization (J. Jancarik, I. Ankoudinova, and H. Hyun) and structure determination (D. Das, J. Liu, V. Oganesyan, and Q. Xian), and structural space mapping (S. Jun, G. Sims, J. Hou, and I.-G. Choi), with the guidance of S. Brenner, D. Wemmer, T. Earnest, D. McKay, and C. Hutchison, Jr.



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Sung-Hou Kim (1)
  • Dong-Hae Shin (1)
  • Rosalind Kim (1)
  • Paul Adams (1)
  • John-Marc Chandonia (1)
  1. Berkeley Structural Genomics Center, Lawrence Berkeley National Laboratory, and Department of Chemistry, University of California, Berkeley, USA
