High-throughput crystallization-to-structure pipeline at RIKEN SPring-8 Center
A high-throughput crystallization-to-structure pipeline for structural genomics was recently developed by the Advanced Protein Crystallography Research Group of the RIKEN SPring-8 Center in Japan. The pipeline incorporates three newly developed technologies for automating X-ray protein crystallography: the automated crystallization and observation robot system “TERA”, the SPring-8 Precise Automatic Cryosample Exchanger “SPACE” for automated data collection, and the Package of Expert Researcher’s Operation Network “PERON” for automated crystallographic computation from phasing to model checking. During the 5 years from April 2002, seven researchers used this pipeline to determine 138 independent crystal structures (from 437 purified proteins, 234 cryoloop-mountable crystals, and 175 diffraction data sets). This paper describes the protocols used in the high-throughput pipeline.
Keywords: Structural genomics; High-throughput; X-ray diffraction; Crystal structure
Abbreviations: APCR-group, Advanced Protein Crystallography Research Group; PDB, Protein Data Bank
Knowledge of the three-dimensional (3-D) structures of proteins at the atomic level is essential for understanding their detailed molecular mechanisms. For example, 3-D structural information around the active site is an indispensable prerequisite for structure-based drug design. X-ray crystallography is one of the most powerful techniques for determining the 3-D structures of proteins. Recently, several structural genomics initiatives have developed high-throughput pipelines for determining protein crystal structures [1, 2, 3, 4, 5, 6, 7]. In Japan, a 5-year structural genomics project called the “National Project on Protein Structural and Functional Analyses” (also known as the “Protein 3000 project”), funded by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), was conducted from April 2002 to March 2007. On behalf of the RIKEN Structural Genomics Initiative (RSGI; http://www.rsgi.riken.go.jp), the RIKEN SPring-8 Center contributed to the Protein 3000 project through X-ray protein crystallography using synchrotron radiation at SPring-8, Japan. The RIKEN SPring-8 Center has recently developed several high-throughput facilities for protein crystallography, including the automated crystallization and observation robot system “TERA”, the SPring-8 Precise Automatic Cryosample Exchanger “SPACE” at beamline BL26 for automated data collection, the heavy-atom searching program “HATODAS” for heavy-atom derivatization of protein crystals (http://hatodas.harima.riken.go.jp), a versatile cryoprotectant and its application to heavy-atom derivatization, microporous zeolite as a hetero-epitaxic nucleant for protein crystallization, and the automated crystallographic computation system “PERON” (Package of Expert Researcher’s Operation Network; Asada et al., in preparation).
The efficiency of a structure determination pipeline depends greatly on the protocols by which the individual elemental technologies are integrated. Here, we introduce the high-throughput crystallization-to-structure pipeline of the Advanced Protein Crystallography Research Group (the APCR-group, formerly known as the Highthroughput Factory) of the RIKEN SPring-8 Center.
Target selection in structural genomics
Table 1 Cumulative summary of protein structure determination in the APCR-group (entries for Thermus thermophilus HB8 and Pyrococcus horikoshii OT3; measures include the percent of solved proteins and the percent of solved crystals)
The purified recombinant protein samples for crystallization were supplied mainly by the RIKEN Genomic Science Center and the Structurome Research Group. The samples were crystallized by the oil-microbatch method implemented in the automated crystallization and observation robot system TERA. Sparse-matrix crystallization screening with TERA used both the original kit (144 conditions for initial screening and 3456 (144 × 24) conditions for optimization) and the following commercially available kits: Crystal Screen I/II, PEG/Ion, Crystal Screen Cryo, and Crystal Screen Lite from Hampton Research, and Wizard I/II and Cryo I/II from deCODE Genetics. Oil-microbatch crystallization was carried out at 291 K in 72-well Nunc HLA crystal plates (Nalge Nunc International), each assigned a unique barcode. A 1.0 μl crystallization drop was set up by mixing protein solution and precipitant solution 1:1 in a well of the plate, which was then covered with 15 μl of paraffin oil. Using the automatic observation function of TERA, an image of each well was taken and recorded once a week for two months, and the images were accessible through a web-based user interface. The drop images were scored manually on an integer scale: 0 for a clear drop; 1 for a grainless, heavy precipitate; 2 for a fine, white precipitate; 3 for a granulated, transparent precipitate; 4 for a coarse, granulated precipitate; 5 for microcrystals with dimensions less than 0.05 mm; 6 for a mountable needle or plate crystal 0.01–0.03 mm thick; 7 for a mountable cluster of crystals; 8 for a mountable single crystal thicker than 0.03 mm with dimensions less than 0.2 mm; 9 for a mountable single crystal with dimensions greater than 0.2 mm; and −1 for a drop that had dried up completely.
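The scoring scale above can be captured as a small lookup type, which makes later filtering steps explicit. The following Python sketch is illustrative only: the class and function names are hypothetical and not part of TERA, and "mountable" is taken directly from the scale's own wording (scores 6 to 9).

```python
from enum import IntEnum


class DropScore(IntEnum):
    """Manual drop-scoring scale described in the text (names are hypothetical)."""
    DRIED_UP = -1              # dried-up drop, no solution left
    CLEAR = 0                  # clear drop
    HEAVY_PRECIPITATE = 1      # grainless, heavy precipitate
    FINE_PRECIPITATE = 2       # fine, white precipitate
    GRANULAR_TRANSPARENT = 3   # granulated, transparent precipitate
    COARSE_GRANULAR = 4        # coarse, granulated precipitate
    MICROCRYSTAL = 5           # microcrystals smaller than 0.05 mm
    NEEDLE_OR_PLATE = 6        # mountable needle/plate, 0.01-0.03 mm thick
    CLUSTER = 7                # mountable cluster of crystals
    SMALL_SINGLE = 8           # mountable single crystal, >0.03 mm thick, <0.2 mm
    LARGE_SINGLE = 9           # mountable single crystal, >0.2 mm


def is_mountable(score: DropScore) -> bool:
    """Scores 6-9 are the 'mountable' categories on this scale."""
    return score >= DropScore.NEEDLE_OR_PLATE


def mountable_wells(scores: dict) -> list:
    """Return well IDs holding mountable crystals, best-scored first."""
    hits = [(well, s) for well, s in scores.items() if is_mountable(s)]
    return [well for well, _ in sorted(hits, key=lambda ws: ws[1], reverse=True)]
```

For example, `mountable_wells({"A1": DropScore.CLEAR, "A2": DropScore.CLUSTER, "B3": DropScore.LARGE_SINGLE})` returns `["B3", "A2"]`. Using an `IntEnum` keeps the scores comparable as integers while giving each category a readable name.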
Computerized evaluation and scoring of crystallization drops is a key technology required for fully automated crystallization experiments. Although this technology is not yet available, it is being developed using image classification software [28, 29]. Crystals from the initial screening with scores from 5 to 7 were optimized using a 24-well grid screen around the initial condition, varying the buffer pH and the precipitant/salt concentrations.
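The optimization step can be sketched as generating a 24-condition grid around an initial hit. The text states only that buffer pH and precipitant/salt concentration were varied over 24 wells; the 4 × 6 layout, the step sizes, and the function names below are assumptions chosen for illustration.

```python
from itertools import product


def needs_optimization(score: int) -> bool:
    """Initial-screen drops scored 5-7 were carried forward to optimization."""
    return 5 <= score <= 7


def optimization_grid(ph0, conc0,
                      ph_offsets=(-0.3, -0.1, 0.1, 0.3),
                      conc_factors=(0.7, 0.8, 0.9, 1.0, 1.1, 1.2)):
    """Return (pH, precipitant concentration) pairs around an initial hit.

    Assumed layout: 4 pH values x 6 concentrations = 24 wells. The offsets
    and scaling factors are illustrative, not the published protocol.
    """
    return [(round(ph0 + dph, 2), round(conc0 * f, 3))
            for dph, f in product(ph_offsets, conc_factors)]


# Hypothetical hit: 20% (w/v) precipitant at pH 7.5.
grid = optimization_grid(ph0=7.5, conc0=20.0)
```

With 24 variants per condition, applying such a grid to all 144 initial conditions gives 144 × 24 = 3456 conditions, the size of the original optimization kit.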
The TERA system can set up 40 crystal plates (2880 drops) of crystallization screening per day and record observation images for 200 crystal plates (14,400 drops) per day. During the 5-year course of the Protein 3000 project, 234 (53.5%) of the 437 purified independent protein samples in the APCR-group yielded cryoloop-mountable crystals, where an “independent protein” is defined as one sharing less than 90% amino-acid sequence identity with any protein of known structure in the PDB (Table 1). Of the 138 PDB structures in Table 1, 67 were derived from optimization screening, indicating that more than 48% of the structures were determined using crystals from optimized crystallization conditions. The sitting-drop and hanging-drop vapor diffusion methods are the most commonly used techniques for protein crystallization screening; for example, the sitting-drop vapor diffusion method is popular in the European structural genomics project “SPINE” (Structural Proteomics In Europe). Currently available nanolitre-scale sitting-drop systems, such as the T2K crystallization system, can process 20 nl of protein solution per drop. However, we adopted the oil-microbatch method for TERA, which uses 0.5 μl of protein solution per drop, because it is mechanically the simplest technique, an important factor for the robustness of automated systems. More than a million crystallization conditions have been screened with the TERA system since 2002, demonstrating its high reliability and durability when sufficient quantities of protein are available. However, in current X-ray crystallography, the crystallization of proteins with poor crystallizability is becoming a major problem that cannot be resolved by further reducing sample amounts.
Therefore, to overcome this limitation, new methods are needed to improve protein crystallizability, such as nucleant-mediated crystallization using mineral substances, porous silicon, mesoporous bioactive gel-glass, and microporous zeolite.
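The throughput and sample-consumption figures discussed above follow from simple arithmetic, sketched below in Python. The per-screen protein volume assumes one drop per condition in the 144-condition initial screen; that assumption and the helper names are illustrative.

```python
WELLS_PER_PLATE = 72        # 72-well Nunc HLA plate used with TERA
INITIAL_CONDITIONS = 144    # original TERA initial-screening kit


def drops(plates: int, wells_per_plate: int = WELLS_PER_PLATE) -> int:
    """Number of drops handled for a given number of plates."""
    return plates * wells_per_plate


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def protein_per_screen_ul(protein_per_drop_ul: float,
                          n_conditions: int = INITIAL_CONDITIONS) -> float:
    """Protein solution consumed by one initial screen (one drop per condition)."""
    return round(protein_per_drop_ul * n_conditions, 2)


print(drops(40), drops(200))                   # 2880 14400
print(percent(234, 437), percent(67, 138))     # 53.5 48.6
print(protein_per_screen_ul(0.5))              # 72.0  (oil microbatch, TERA)
print(protein_per_screen_ul(0.020))            # 2.88  (20 nl per drop, T2K)
```

Under this assumption, an oil-microbatch initial screen uses about 25 times more protein than a 20 nl-per-drop system, which is the cost accepted in exchange for the mechanical simplicity of oil microbatch.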
Robotic screening for diffraction quality of crystals
The diffraction experiments were carried out using an in-house Cu Kα diffractometer equipped with a Rigaku R-AXIS VII imaging-plate detector and the SPring-8 Precise Automatic Cryosample Exchanger SPACE. Protein crystals with a score of 6 or higher were manually mounted, one at a time, in SPACE cryoloops. An advantage of the screw-thread pin of SPACE is that the sample position is reproduced across multiple mounting trials, whereas sample changers at other synchrotron facilities, such as the EMBL/ESRF/BM13 robotic sample changer (SC3) and the LBL-ALS automount robot, use magnetic-base sample pins. The crystals were then flash-frozen in a nitrogen gas stream at 100 K and automatically stored by SPACE in a sample storage tray at cryogenic temperature. The bar-coded storage tray was a metal block with 52 cylindrical wells, each holding one cryoloop. At the initial stage of the diffraction experiments, each protein crystal, identified by a unique identification number, was treated with the versatile cryoprotectant, which accommodates the wide variety of crystals produced by TERA. Two diffraction images were collected at 100 K from two orthogonal crystal orientations, with an oscillation angle of 1° and an exposure time of 3 min per frame. With this procedure, in-house diffraction checking of 52 crystals (one SPACE tray block) was completed within 10 hours. Crystal mosaicity and resolution limit were evaluated using the program HKL2000. Based on the results of the diffraction experiments, the crystallization and cryoprotectant conditions were optimized to obtain the best-quality crystals. For heavy-atom derivatization, protein crystals were treated with the versatile cryoprotectant containing heavy-atom reagents.
The trial order of heavy-atom reagents was determined from the suggestions of the heavy-atom searching program HATODAS (http://hatodas.harima.riken.go.jp). Combining HATODAS with the versatile cryoprotectant containing heavy-atom reagents enabled efficient heavy-atom derivatization screening with the automated SPACE system. After diffraction checking, crystals that diffracted well were kept in liquid nitrogen in the storage tray until data collection. In-house robotic screening of diffraction quality allowed us to use synchrotron beamtime more efficiently. However, harvesting crystals from the crystallization plate still requires human intervention and is therefore the most time-consuming step, and its throughput depends on the operator’s skill in crystal handling. Unfortunately, current robotic technology is not yet accurate enough to replace manual handling for this step; further advances in robotics should resolve this in the near future.
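The in-house diffraction-checking throughput (52 crystals per SPACE tray, two 3-min frames each, within 10 hours) implies a time budget that can be worked out as below. Treating everything other than exposure (mounting, centring, detector readout) as a single per-crystal overhead is a simplifying assumption, and the function name is hypothetical.

```python
def tray_time_budget(n_crystals: int = 52, frames_per_crystal: int = 2,
                     exposure_min: float = 3.0, total_h: float = 10.0):
    """Split the time for one SPACE tray into exposure and other overhead.

    Returns (total exposure in hours, maximum non-exposure minutes per crystal).
    Because 10 h is an upper bound ("within 10 hours"), the overhead is a maximum.
    """
    exposure_h = n_crystals * frames_per_crystal * exposure_min / 60.0
    overhead_min = (total_h - exposure_h) * 60.0 / n_crystals
    return exposure_h, overhead_min


exposure_h, overhead_min = tray_time_budget()
print(f"exposure {exposure_h:.1f} h, overhead <= {overhead_min:.1f} min/crystal")
# exposure 5.2 h, overhead <= 5.5 min/crystal
```

Roughly half of the tray time is X-ray exposure, so shortening exposures alone would at most halve the screening time per tray.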
Data collection and processing
Table 2 Resolution limit of diffraction data in the APCR-group (data sets grouped by resolution limit d: 3.0 Å ≤ d; 2.5 Å ≤ d < 3.0 Å; 2.0 Å ≤ d < 2.5 Å; 1.5 Å ≤ d < 2.0 Å; d < 1.5 Å; entries for Thermus thermophilus HB8 and Pyrococcus horikoshii OT3)
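The resolution shells used in Table 2 can be made explicit with a small classifier; note that a smaller d means higher resolution. The function names and the example limits below are hypothetical, not data from the pipeline.

```python
from collections import Counter

# Resolution shells of Table 2 (d = resolution limit in angstroms), lowest
# resolution first. Each shell is half-open: lo <= d < hi.
SHELLS = [
    ("3.0 <= d", 3.0, float("inf")),
    ("2.5 <= d < 3.0", 2.5, 3.0),
    ("2.0 <= d < 2.5", 2.0, 2.5),
    ("1.5 <= d < 2.0", 1.5, 2.0),
    ("d < 1.5", 0.0, 1.5),
]


def resolution_shell(d: float) -> str:
    """Return the Table 2 shell label for a resolution limit d."""
    for label, lo, hi in SHELLS:
        if lo <= d < hi:
            return label
    raise ValueError(f"invalid resolution limit: {d}")


def tabulate(limits):
    """Count data sets per shell, in table order (empty shells give 0)."""
    counts = Counter(resolution_shell(d) for d in limits)
    return [(label, counts[label]) for label, _, _ in SHELLS]
```

For instance, `tabulate([1.4, 2.2, 2.2, 3.1])` places one data set in each of the outer shells and two in the 2.0–2.5 Å shell. Data sets "worse than 2.0 Å" correspond to the first three shells.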
High-throughput determination of crystal structure
One of the major challenges in structural genomics is communicating complex specifications from an expert crystallographer to the person in charge of structural analysis, who is usually inexperienced in protein crystallography. A novice user is likely to make mistakes that the expert must later correct. The program system PERON (Package of Expert Researcher’s Operation Network; Asada et al., in preparation) enables an inexperienced user to perform the entire crystallographic computation in an automated and controlled manner. PERON integrates commonly available analysis programs with exhaustive combinations of parameters: SOLVE for experimental phasing, Molrep and EPMR for molecular replacement (MR) phasing, and RESOLVE and ARP/wARP for model building. PERON simplified project communication: if PERON calculations proved unsuccessful, experts could simply recommend trying other crystals. The input to the PERON web server is a scaled reflection file from HKL2000 or d*TREK (Rigaku/MSC). The efficiency of the subsequent manual model building depends strongly on the quality of the initial model from PERON. PERON’s systematic calculation protocol screens parameters that are critical for model quality, including the resolution range, the solvent content, and the number of protein molecules in the asymmetric unit. These calculations with varying parameters were run in parallel on a PC cluster with 20 CPUs. When PERON auto-phasing failed, manual phase calculation was attempted with phasing programs such as SHARP, SHELXD, and SnB [54, 55]. Poor MR phases from a search model sharing low amino-acid sequence identity (less than 30%) with the target protein were not suitable for automatic model building by RESOLVE or ARP/wARP and tended to make model rebuilding difficult.
In such cases, experimental phasing using selenomethionyl proteins and other heavy-atom derivatives was tried in parallel. Of the 149 independent crystal structures phased in the APCR-group, more than 35% were solved by MAD, SAD, MIR, or SIR phasing, suggesting that experimental phasing was effective for high-throughput structure determination (Table 1).
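PERON’s internal parameter choices are not described here, but the kind of exhaustive parallel screen described above can be sketched. The Python sketch below derives candidate numbers of molecules per asymmetric unit, and the corresponding solvent contents, from the standard Matthews relation (V_M = V_cell / (Z × n × M), solvent fraction ≈ 1 − 1.23/V_M, with Z the number of asymmetric units in the cell and M the molecular mass in Da). It then crosses these candidates with a few resolution cutoffs and dispatches the combinations to 20 workers, mirroring the 20-CPU cluster. The V_M window, the cutoffs, and `run_phasing_job` (a hypothetical placeholder for a call to SOLVE, Molrep, EPMR, RESOLVE, or ARP/wARP) are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def matthews_candidates(cell_volume_a3, n_asym_units, mw_da, vm_window=(1.6, 4.0)):
    """Plausible (n_mol, V_M, solvent fraction) triples per asymmetric unit.

    V_M = V_cell / (Z * n * M) in A^3/Da; solvent fraction = 1 - 1.23 / V_M.
    The V_M window is an assumed plausibility range.
    """
    candidates = []
    n = 1
    while True:
        vm = cell_volume_a3 / (n_asym_units * n * mw_da)
        if vm < vm_window[0]:
            break                      # more copies would not fit in the cell
        if vm <= vm_window[1]:
            candidates.append((n, round(vm, 2), round(1.0 - 1.23 / vm, 3)))
        n += 1
    return candidates


def run_phasing_job(params):
    """Hypothetical placeholder: a real job would run an external phasing or
    model-building program (e.g. via subprocess) and return figures of merit."""
    n_mol, solvent, d_min = params
    return {"n_mol": n_mol, "solvent": solvent, "d_min": d_min}


def screen(cell_volume_a3, n_asym_units, mw_da, d_min_cutoffs, workers=20):
    """Run every (n_mol, solvent, resolution cutoff) combination in parallel."""
    cands = matthews_candidates(cell_volume_a3, n_asym_units, mw_da)
    jobs = [(n, solvent, d) for (n, _vm, solvent), d in product(cands, d_min_cutoffs)]
    # Threads suffice because each real job would be an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_phasing_job, jobs))
```

For a hypothetical 100 × 100 × 100 Å cell in P2₁2₁2₁ (Z = 4) and a 25 kDa protein, `matthews_candidates(1e6, 4, 25000)` keeps 3 to 6 molecules per asymmetric unit (V_M from 3.33 to 1.67 Å³/Da), and `screen(1e6, 4, 25000, [3.0, 2.5])` launches eight jobs.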
Structure refinement, validation and PDB deposition
After PERON automated model building, we performed manual model revision with QUANTA2000 (Accelrys Inc.) and refinement with CNS. High-resolution data sets allowed the automated model-building programs RESOLVE, ARP/wARP, and LAFIRE to be used. However, despite extensive optimization of crystallization conditions to obtain the best-quality crystals, data sets with resolution worse than 2.0 Å accounted for about 53% of all data sets in our pipeline (Table 2). Improving the quality of protein crystals is therefore a pivotal task for the present pipeline. The stereochemical quality of refined models was evaluated with PROCHECK. Manual checking and revision of the refined structure before PDB deposition are also time-consuming and require considerable expertise in protein crystallography. To facilitate this process, we incorporated PERON’s automated model-checking function (Asada et al., in preparation) into our pipeline. In PERON, the fit of the model to the electron density map and non-covalent interactions, including hydrogen-bond networks and unfavorable contacts, were checked and semi-automatically revised. During the 5-year course of the Protein 3000 project, seven researchers in the APCR-group deposited 307 protein structures in the PDB from 725 purified samples (Table 1), corresponding to about nine structures per researcher per year. As of January 2008, structural genomics centres worldwide had purified 26,930 soluble targets, of which 18.9% (5082) were crystallized and diffracted and 14.4% (3872) yielded crystal structures. In our pipeline, the success rates from purified protein to crystal and from purified protein to structure were 53.5% (234/437) and 34.1% (149/437), respectively.
These rates are significantly higher than those reported by structural genomics centres worldwide, although this may be largely attributable to the higher thermostability of the sample proteins used in the APCR-group. For the independent proteins, 46.5% of the purified proteins failed to crystallize and 25.2% of the crystallized proteins did not yield diffraction data. Crystallization had the lowest success rate, with only about half of the purified samples crystallizing, suggesting that it is the most difficult step in structure determination. X-ray data collection had the second lowest success rate, emphasizing the importance of improving crystal quality. Finally, the ratio of structure-solved proteins to purified or crystallized proteins tends to decrease year by year (data not shown), indicating that obtaining high-quality protein samples is becoming increasingly difficult. The next stage of structural genomics clearly requires further development of elemental technologies for preparing crystallizable samples and improving crystal quality.
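The stage-wise success rates behind this discussion can be recomputed from the counts given in the text. The Python sketch below treats the pipeline as a funnel and reports each stage relative to the previous one, which makes the ranking explicit: crystallization is the weakest step and data collection the second weakest. Only the counts come from the text; the helper names are illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Independent-protein counts for the APCR-group, taken from the text.
FUNNEL = [
    ("purified protein", 437),
    ("mountable crystal", 234),
    ("diffraction data set", 175),
    ("phased structure", 149),
    ("deposited in PDB", 138),
]


def stage_rates(funnel):
    """Success rate (%) of each stage relative to the stage before it."""
    return {stage: percent(n, prev_n)
            for (_, prev_n), (stage, n) in zip(funnel, funnel[1:])}


print(stage_rates(FUNNEL))
# {'mountable crystal': 53.5, 'diffraction data set': 74.8, 'phased structure': 85.1, 'deposited in PDB': 92.6}
print(percent(149, 437))                            # 34.1  purified -> structure
print(percent(5082, 26930), percent(3872, 26930))   # 18.9 14.4  worldwide
```

The complements of the first two stage rates give the fractions of purified proteins that failed to crystallize and of crystallized proteins that did not yield a data set.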
Protein structure determination for functional analysis is an important component of our pipeline, because a protein’s structure can help reveal its biological function. First, a DALI search was performed to check whether structurally similar proteins existed in the PDB; any structural similarity to proteins of known function helps infer the function of the target protein. In addition, BLAST and PSI-BLAST were routinely used to predict the protein’s function. From these results, candidate functional ligands were selected, protein-ligand complexes were prepared by cocrystallization or soaking, and the crystal structures of the liganded forms were determined. Different crystal forms of the same protein, with different crystal packing, were also determined to analyze the flexibility of the protein molecule. Where required, mutant proteins were also purified and crystallized for functional analysis. In total, the APCR-group determined 199 structures for functional analysis, 169 of which were deposited in the PDB (Table 1). Careful analysis of the structural changes between crystal forms proved quite useful for understanding the structure-function relationships of the proteins. However, structural changes observed by crystallography are not necessarily large, so only proteins showing obvious structural changes have generally been studied. To analyze subtle structural changes, we developed “the multiple superposition method”, in which the structural changes of an oligomeric protein are described as a combination of two components: an overall rearrangement of each subunit, referred to as the rigid-body shift, and an intra-subunit local deformation, referred to as the local shift.
This elemental technology was successfully applied to analyze the half-of-the-sites reactivity of the tetrameric phenylacetate degradation protein PaaI, elucidating its structure-function relationship from the 3-D structure. Although the conformational analysis procedure is convenient, successes like PaaI remain rare. With current structure-based drug design technology, only high-resolution 3-D structures are useful for drug development. In our pipeline, the fraction of data sets with resolution better than 1.5 Å is less than 7% for independent proteins and 5% for functional-analysis proteins (Table 2). Obtaining high-resolution data is therefore an urgent priority.
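The idea of splitting a subunit’s displacement into a rigid-body shift and a local shift can be illustrated with a least-squares (Kabsch) superposition. This is a minimal sketch of the decomposition concept, not the published multiple superposition method. It assumes NumPy, assumes the two states of a subunit (matched atoms, N × 3 arrays) have already been placed in a common frame by superposing the whole oligomer, and uses hypothetical function names.

```python
import numpy as np


def kabsch(X, Y):
    """Least-squares rotation R and translation t such that Y ~= X @ R.T + t."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    H = (X - cx).T @ (Y - cy)                  # 3x3 cross-covariance matrix
    U, _s, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = cy - cx @ R.T
    return R, t


def decompose_shift(X, Y):
    """Split per-atom displacement Y - X of one subunit into two parts.

    rigid: displacement explained by moving the subunit as a rigid body.
    local: residual intra-subunit deformation after the best rigid fit.
    """
    R, t = kabsch(X, Y)
    fitted = X @ R.T + t
    rigid = fitted - X
    local = Y - fitted
    return rigid, local
```

By construction, `rigid + local` equals the total displacement `Y - X`. The RMS of `local` measures the intra-subunit deformation, while `rigid` captures how the subunit moves as a whole within the oligomer.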
Of the authors, Michi. S. contributed principally to this work, solving structures and writing the paper; YA solved structures and contributed to the automated structure determination using PERON; KS solved structures and contributed to the automated diffraction experiment using SPACE; HY solved structures and contributed to large-scale protein production; NKL, HM, and BB solved structures; YM, MT, YK, NO, YM, YT, HS, and TN contributed to the construction of the structure determination pipeline; Mitsu. S. contributed to automated crystallization using TERA; MY contributed to the automated diffraction experiment using SPACE; NK supervised this work and wrote the paper. The authors would like to thank the staff of the RIKEN Genomic Science Center and the Structurome Research Group for providing plasmids, the technical staff of the RIKEN SPring-8 Center for assistance in the large-scale protein production, the beamline staff for assistance during data collection at beamlines BL26B1/B2 of SPring-8, and Drs M. Miyano, T. Iizuka, S. Yokoyama, and T. Ishikawa for their direction of the APCR-group. This work was supported by the “National Project on Protein Structural and Functional Analyses” funded by the Ministry of Education, Culture, Sports, Science and Technology of Japan.
- 12. Sugahara M, Asada Y, Ayama H, Ukawa M, Taka H, Kunishima N (2005) Acta Crystallogr D61:1302–1305
- 13. Sugahara M, Kunishima N (2006) Acta Crystallogr D62:520–526
- 14. Sugahara M, Asada Y, Morikawa Y, Kageyama Y, Kunishima N (2008) Acta Crystallogr D64:686–695
- 15. Lokanath NK, Shiromizu I, Ohshima N, Nodake Y, Sugahara Mitsuaki, Yokoyama S, Kuramitsu S, Miyano M, Kunishima N (2004) Acta Crystallogr D60:1816–1823
- 16. Asada Y, Sawano M, Ogasahara K, Nakamura J, Ota M, Kuroishi C, Sugahara Mitsuaki, Yutani K, Kunishima N (2005) J Biochem (Tokyo) 138:343–353
- 17. Sugahara M, Ohshima N, Ukita Y, Sugahara Mitsuaki, Kunishima N (2005) Acta Crystallogr D61:1500–1507
- 28. Kawabata K, Takahashi M, Saitoh K, Asama H, Mishima T, Sugahara Mitsuaki, Miyano M (2006) Acta Crystallogr D62:239–245
- 29. Kawabata K, Saitoh K, Takahashi M, Sugahara Mitsuaki, Asama H, Mishima T, Miyano M (2006) Acta Crystallogr D62:1066–1072
- 30. Berry IM, Dym O, Esnouf RM, Harlos K, Meged R, Perrakis A et al (2006) Acta Crystallogr D62:1137–1149
- 36. Cipriani F, Felisaz F, Launer L, Aksoy JS, Caserotto H, Cusack S et al (2006) Acta Crystallogr D62:1251–1259
- 40. Kim KM, Yi EC, Baker D, Zhang KYJ (2001) Acta Crystallogr D57:759–762
- 43. Czepas J, Devedjiev Y, Krowarsch D, Derewenda U, Otlewski J, Derewenda ZS (2004) Acta Crystallogr D60:275–280
- 44. Cooper DR, Boczek T, Grelewska K, Pinkowska M, Sikorska M, Zawadzki M et al (2007) Acta Crystallogr D63:636–645
- 45. Heras B, Martin JL (2005) Acta Crystallogr D61:1173–1180
- 46. Newman J (2006) Acta Crystallogr D62:27–31
- 47. Terwilliger TC, Berendzen J (1999) Acta Crystallogr D55:849–861
- 48. Vagin A, Teplyakov A (2000) Acta Crystallogr D56:1622–1624
- 49. Kissinger CR, Gehlhaar DK, Fogel DB (1999) Acta Crystallogr D55:484–491
- 50. Terwilliger TC (1999) Acta Crystallogr D55:1863–1871
- 53. Schneider TR, Sheldrick GM (2002) Acta Crystallogr D58:1772–1779
- 55. Weeks CM, Miller R (1999) Acta Crystallogr D55:492–500
- 57. Brünger AT, Adams PD, Clore GM, Delano WL, Gros P, Grosse-Kunstleve RW et al (1998) Acta Crystallogr D54:905–921
- 58. Yao M, Zhou Y, Tanaka I (2006) Acta Crystallogr D62:189–196