High-throughput computational structure-based characterization of protein families: START domains and implications for structural genomics

  • Hunjoong Lee
  • Zhaohui Li
  • Antonina Silkov
  • Markus Fischer
  • Donald Petrey
  • Barry Honig
  • Diana Murray


SkyLine, a high-throughput homology modeling pipeline, detects and models true sequence homologs of a given protein structure. Structures and models are stored in SkyBase with links to computational function annotation, as calculated by MarkUs. The SkyLine/SkyBase/MarkUs technology represents a novel structure-based approach that is more objective and versatile than other protein classification resources. This structure-centric strategy provides a multi-dimensional organization and coverage of protein space at the levels of family, function, and genome. The concept of “modelability”, the ability to model sequences on related structures, provides a reliable criterion for membership in a protein family (“leverage”) and underlies the unique success of this approach. The overall procedure is illustrated by its application to START domains, which comprise a Biomedical Theme for the Northeast Structural Genomics Consortium as part of the Protein Structure Initiative. START domains are typically involved in the non-vesicular transport of lipids. Although 19 experimentally determined structures are available, the family is highly diverse in sequence, its evolutionary hierarchy is not well determined, and the ligand-binding potential of many family members is unknown. The SkyLine/SkyBase/MarkUs approach provides significant insights and predicts: (1) many more family members (~4,000) than any other resource; (2) the function of a large number of unannotated proteins; (3) instances of START domains in genomes from which they were thought to be absent; and (4) the existence of two types of novel proteins: those containing dual START domains and those containing N-terminal START domains.


Homology modeling; Structural genomics; Bioinformatics; Protein function annotation; START domain; Arabidopsis thaliana



  • Birch antigen
  • Classical START domain
  • Metastatic lymph node 64 protein
  • National Center for Biotechnology Information
  • Northeast Structural Genomics Consortium
  • StAR-related lipid transfer
  • Phosphatidylcholine transfer protein
  • Protein data bank
  • A score derived from the log-transformed length-normalized integration over the residue by residue ProsaII structure evaluation profile
  • Protein structure initiative



B. H. acknowledges the support of National Institutes of Health Grants GM030518, GM074958, and CA121852. D. M. acknowledges the support of National Institutes of Health Grants GM074958 and GM071700 and of National Science Foundation Grant NSF0738311.


Copyright information

© Springer Science+Business Media B.V. 2010

Authors and Affiliations

  • Hunjoong Lee (1)
  • Zhaohui Li (1)
  • Antonina Silkov (1)
  • Markus Fischer (2)
  • Donald Petrey (2)
  • Barry Honig (2)
  • Diana Murray (1)

  1. Department of Pharmacology, College of Physicians and Surgeons of Columbia University, Center for Computational Biology and Bioinformatics, New York, USA
  2. Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Columbia University, New York, USA