NMR spectroscopy is well suited for rapid and, in favorable cases, largely automated structure determination of small (<125 residues) proteins [22, 36]. While backbone assignments for such proteins are routinely obtained in a largely automated fashion [1, 17, 23, 39], assignment of sidechain resonances can often be a bottleneck for the process of structure determination. Automated sidechain assignment methods are, however, evolving and beginning to have an important impact on the field [1, 13]. Recently, we described an approach for solving protein NMR structures using Rosetta conformational energy calculations together with only a limited amount of experimental NMR data, including backbone resonance assignments, residual dipolar coupling data, and some manually-assigned long-range backbone-backbone NOEs [26]. This approach was demonstrated to provide accurate backbone structures (chain folds) for proteins of up to 25 kDa, with reasonably accurate core sidechain packing.

Several years ago, we described a strategy for rapid automatic determination of small (<100 residue) protein structures using only the sparse constraints that can be obtained using a perdeuterated protein [38]. Our strategy for rapid fold determination derives from ideas that were originally introduced for determining NMR structures of larger proteins [9, 10, 12], using [2H,13C,15N]-enriched protein samples with protonated sidechain methyl groups (13CH3). Data collection includes acquiring NMR spectra for determining assignments of backbone and sidechain 15N, HN resonances, and sidechain 13CH3 methyl resonances. Backbone resonance assignments and NOESY cross peaks are then determined automatically, and 3D structures generated using CNS [5, 20]. This strategy provides reliable backbone chain folds for small (<100 residue) proteins, which are useful for certain applications, and good starting points for further refinement to high precision and accuracy using additional NMR data.

This “sparse constraint” approach exploits the fact that perdeuteration generally improves spectral quality and interpretability even of smaller proteins. Although protein deuteration is not generally required for small protein structure determination, it is valuable for improving sensitivity of many amide or methyl proton-detected heteronuclear NMR experiments [2, 14] even for proteins in the 7–12 kDa range. As the gyromagnetic ratio of 2H is ~6.5-fold smaller than that of 1H, the dipolar interaction between 13C or 15N and the directly bonded hydrogen isotope is greatly reduced when 1H is replaced by 2H. Therefore, the transverse relaxation times T2 of 13C and 15N nuclei are increased, providing narrower linewidths and higher signal-to-noise ratios (S/N). Constant-time NMR experiments, which may have poor S/N with fully protonated proteins, can be recorded with higher sensitivity due to the reduced transverse relaxation rates of 13C and 15N obtained for perdeuterated proteins. We also observe better performance of automated resonance assignment software for backbone resonance assignments (e.g. AutoAssign [39]) because of the improved resolution and sensitivity of amide HN-detected triple resonance experiments on the perdeuterated protein samples. Another advantage of the longer transverse relaxation times and the reduction in spin-diffusion pathways is that longer NOESY mixing times can be used, permitting the detection of weaker NOEs that might not otherwise be observed. Some poor NMR signals resulting from exchange broadening and limited protein solubility can also be improved by perdeuteration. These advantages of deuterium incorporation are well known for studies of larger (15–50 kDa) proteins, but they also provide improved performance and improved S/N for smaller (<70–100 residue) proteins.
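The size of this effect can be estimated with a short calculation. The sketch below assumes only that the heteronuclear dipolar relaxation contribution from an attached spin S scales as γ_S² S(S+1), and it uses standard tabulated gyromagnetic ratios. It ignores differences in spectral densities, bond lengths, and other relaxation mechanisms, so the result is an order-of-magnitude guide rather than a prediction for any particular protein.

```python
# Gyromagnetic ratios in rad s^-1 T^-1 (standard tabulated values).
GAMMA_1H = 267.522e6
GAMMA_2H = 41.066e6


def dipolar_weight(gamma: float, spin: float) -> float:
    """Factor gamma_S^2 * S(S+1) contributed by the attached spin S
    to the heteronuclear dipolar relaxation rate of 13C or 15N."""
    return gamma ** 2 * spin * (spin + 1.0)


# Ratio of gyromagnetic ratios quoted in the text (~6.5).
gamma_ratio = GAMMA_1H / GAMMA_2H

# 1H is spin-1/2, 2H is spin-1; the S(S+1) term partly offsets gamma^2.
reduction = dipolar_weight(GAMMA_1H, 0.5) / dipolar_weight(GAMMA_2H, 1.0)

print(f"gamma(1H)/gamma(2H) = {gamma_ratio:.2f}")
print(f"approximate reduction of the dipolar contribution: {reduction:.1f}-fold")
```

Under these assumptions the dipolar contribution of a directly bonded 2H is more than tenfold weaker than that of a directly bonded 1H, which is the origin of the longer T2 values discussed above.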

While the idea of rapid, fully automated structure determination of small perdeuterated proteins is attractive and innovative, two drawbacks have hindered the routine application of this method for high-throughput NMR protein structure determination. First, producing perdeuterated proteins by conventional expression methods is expensive, and secondly, only backbone chain folds are reliably determined using sparse constraints and CNS refinement [38]; the details of the resulting structures are not particularly good.

Here, we combine the fully automated sparse constraint approach for small proteins, first outlined by Zheng et al. [38], with two recent innovations. First, we have adopted recently developed condensed-phase single protein production (cSPP) methods [29, 33–35] to allow bacterial expression in 10- to 40-fold condensed-phase fermentations without reduction in protein expression per cell, allowing significantly less expensive production of 2H, 13C, 15N-enriched proteins. In the cSPP system, MazF, an mRNA interferase functioning as an ACA-specific endoribonuclease, is co-expressed with the target protein. The expression of MazF eliminates almost all cellular mRNAs containing ACA sequences. The target gene is selectively expressed by engineering it to contain no ACA sequences, without altering the amino acid sequence of the protein encoded by the resulting mRNA. Second, we replace CNS refinement with the recently introduced CS-Rosetta method [31] for small protein structure analysis. The CS-Rosetta program provides a powerful approach for NMR structure determination of small proteins using only 1H, 13C, and 15N backbone and 13Cβ resonance assignments [31]. Exploiting these recent innovations, we have extended the approach originally described by Zheng et al. [38]. Using a 2H, 15N, 13C-enriched sample of the 86-residue E. coli cold shock protein (CspA) as an example, we demonstrate a general process for determining accurate small protein structures that requires only a few days of NMR data collection, a specific data collection protocol, and largely automated data analysis.
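The idea of an ACA-less gene can be illustrated with a small sketch. The code below is not the design tool used for the cSPP system; it is a minimal, greedy illustration that removes every ACA triplet (in any reading frame) by synonymous codon substitution using the standard genetic code. The function names and the example sequence are hypothetical, and a practical design would also weigh codon usage and other sequence features.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence (length divisible by 3)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_aca(dna: str) -> int:
    """Count ACA triplets in any frame, including overlapping ones."""
    return sum(dna.startswith("ACA", i) for i in range(len(dna) - 2))


def synonyms(codon: str) -> list:
    """Other codons encoding the same amino acid."""
    aa = CODON_TABLE[codon]
    return [c for c, a in CODON_TABLE.items() if a == aa and c != codon]


def remove_aca(dna: str) -> str:
    """Greedily replace codons with synonyms until no ACA remains."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    while True:
        seq = "".join(codons)
        pos = seq.find("ACA")
        if pos == -1:
            return seq
        before = count_aca(seq)
        # Try the one or two codons that overlap this ACA occurrence.
        for k in sorted({pos // 3, (pos + 2) // 3}):
            better = [alt for alt in synonyms(codons[k])
                      if count_aca("".join(codons[:k] + [alt] + codons[k + 1:])) < before]
            if better:
                codons[k] = better[0]
                break
        else:
            return seq  # no synonymous fix found; left for manual design


# Hypothetical toy sequence (Met-Thr-Asn-Thr-His-stop) with four ACA triplets.
gene = "ATGACAAACACACATTAA"
aca_less = remove_aca(gene)
print(aca_less, translate(aca_less))
```

Because only synonymous codons are substituted, the encoded amino acid sequence is unchanged while the mRNA no longer presents ACA sites for MazF cleavage.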

Methods and materials

Preparation of [1H-13C]-I(δ1)LV, 13C, 15N, 2H-CspA for structural studies

Competent E. coli BL21(DE3) cells containing the pACYCmazF [35] plasmid were transformed with the pColdI(SP-4) [33] plasmid (Takara Bioscience, Inc) containing the ACA-less cspA gene. The resulting construct includes a 16-residue N-terminal tag, consisting of a translation enhancing element (TEE), a His6 tag, and a Factor Xa cleavage site. Protein expression was performed essentially as described by Schneider et al. [29], with the following details: single colonies were selected and used to inoculate 2.5 ml of LB medium, which was incubated at 37°C for 6 h. Then 2 ml of the LB culture was inoculated into 100 ml of MJ9 minimal medium and grown at 37°C overnight. When OD600 reached 1.8–2.0 units, the culture was centrifuged at 3,000 × g for 15 min at 4°C. The cell pellet was resuspended in 1 l of fresh MJ9 medium and cells were grown at 37°C until OD600 reached 0.5. At this point the culture was chilled on ice for 5 min and shifted to 15°C for 45 min to acclimate the cells to cold shock conditions. Target protein (CspA) was then expressed along with MazF for 1.5 h by addition of 1 mM isopropyl-β-d-thiogalactoside (IPTG) prior to expression in isotope-enriched medium. Cultures were then centrifuged at 3,000 × g for 15 min at 4°C, resuspended at 2.5% of the original volume (40× condensed) in deuterated (2H2O) wash solution [7.0 g/l Na2HPO4; 3.0 g/l KH2PO4; 0.5 g/l NaCl; pH 7.4], centrifuged again, and resuspended in 25 ml of deuterated MJ9 minimal medium containing 1 g/l 15NH4Cl; 4 g/l 13C, 2H-glucose; 50 mg/l α-13C-ketobutyric acid; 100 mg/l α-13C-ketoisovaleric acid; and 1 mM IPTG. Protein expression continued at 15°C for 24 h. Cells were harvested by centrifugation as described above and cell pellets were stored at −80°C. All isotopes were purchased from Cambridge Isotope Laboratories.
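The saving in isotope reagents follows directly from the volumes given above. The following sketch simply multiplies the stated concentrations by the 25 ml condensed culture volume and compares the result with an uncondensed 1 l culture; the variable names are illustrative, and the calculation assumes the same medium composition would be used at either volume.

```python
# Volumes taken from the protocol above, in litres.
INITIAL_CULTURE_L = 1.0
CONDENSED_CULTURE_L = 0.025

# Isotope-enriched reagent concentrations from the protocol, per litre.
REAGENTS_PER_LITRE = {
    "15NH4Cl (g)": 1.0,
    "13C,2H-glucose (g)": 4.0,
    "alpha-13C-ketobutyric acid (mg)": 50.0,
    "alpha-13C-ketoisovaleric acid (mg)": 100.0,
}

condensation_factor = INITIAL_CULTURE_L / CONDENSED_CULTURE_L

# Amount of each reagent actually consumed in the condensed culture.
amounts_needed = {name: conc * CONDENSED_CULTURE_L
                  for name, conc in REAGENTS_PER_LITRE.items()}

print(f"condensation factor: {condensation_factor:.0f}x")
for name, amount in amounts_needed.items():
    full_scale = REAGENTS_PER_LITRE[name] * INITIAL_CULTURE_L
    print(f"{name}: {amount:g} instead of {full_scale:g}")
```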

CspA purification and concentration

Cell pellets were resuspended in 40 ml of lysis buffer [50 mM Na2HPO4-NaH2PO4; 300 mM NaCl; 5 mM imidazole; 5 mM 2-mercaptoethanol; with 1 EDTA-free protease inhibitor tablet (Roche Cat. # 11 873 580 001) per 50 ml at pH 8.0] and sonicated to lyse the cells. Lysed cells were then centrifuged at 4°C for 1 h at 16,000 rpm in a Sorvall SS-34 rotor. The protein was purified from the soluble extract by binding to Ni–NTA agarose, at 40 ml of soluble extract per 1 ml of resin bed, at 4°C overnight. Resin was washed twice with 25 ml of Wash Buffer [50 mM Na2HPO4-NaH2PO4; 300 mM NaCl; 25 mM imidazole; 5 mM 2-mercaptoethanol, pH 8.0], and protein was eluted in 8 ml of Elution Buffer [50 mM Na2HPO4-NaH2PO4; 300 mM NaCl; 300 mM imidazole; 5 mM 2-mercaptoethanol, pH 8.0]. The protein solution was then dialyzed overnight at 4°C into NMR Buffer (50 mM KH2PO4, 1 mM NaN3, pH 6.0, with 10% 2H2O) and concentrated to a final concentration of ~0.2 mM.

NMR spectroscopy

Backbone resonance assignments for [1H-13C]-I(δ1)LV, 13C, 15N, 2H-enriched CspA were determined using conventional triple resonance NMR experiments [37] including HNCO and deuterium-decoupled pulse sequences HN(ca)CO; HNCA; HN(co)CA; HNCACB and HN(co)CACB. The carrier positions were set to 118.0 ppm for 15N, 176 ppm for 13CO, 54 ppm for 13Cα and 39 ppm for 13Cα/13Cβ. Key parameters of data collection are summarized in Table 1. The total data collection time for all of these triple resonance experiments was about 3.5 days.

Table 1 800 MHz triple resonance data used for determining backbone resonance assignments

In addition, 3D 13C-edited NOESY (mixing time of 350 ms) and 15N-edited NOESY (mixing time of 175 ms) spectra were collected on a 600 MHz Bruker spectrometer with a TXI probe. The matrix sizes of these spectra were 1,024 × 32 × 220 total data points for the 13C-edited NOESY, and 1,024 × 64 × 256 total data points for the 15N-edited NOESY. For the 13C-edited NOESY, the spectral widths in the 1H, 13C and indirectly detected 1H dimensions were set to 14, 16 and 12 ppm, respectively, and the carrier positions were set to 4.7 ppm for the 1H and 16 ppm for the 13C dimensions. For the 15N-edited NOESY, the spectral widths in the 1H, 15N and indirectly detected 1H dimensions were set to 14, 28 and 11.5 ppm, respectively, and the carrier positions were set to 4.7 ppm for the 1H and 118 ppm for the 15N dimensions. The total data collection time for the 13C-edited and 15N-edited NOESY spectra was approximately 2.5 days.

In all NMR experiments, FIDs were processed with linear prediction and zero filling, and weighted by a sine bell function in all directly and indirectly detected dimensions. All NMR spectra were processed and examined with the NMRPipe and NMRDraw software packages [7]. The program SPARKY [11] was used for data visualization and analysis. Proton chemical shifts were referenced to external DSS. 13C and 15N chemical shifts were referenced indirectly based on the proton referencing.
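Indirect referencing of this kind is commonly done with fixed frequency ratios (Ξ) relative to the 1H frequency of DSS at 0 ppm. The sketch below assumes the widely used recommended ratios for 13C (DSS scale) and 15N (liquid ammonia scale); the source does not state which values were used, and the 800.0 MHz proton frequency is a placeholder rather than a measured value.

```python
# Assumed frequency ratios (Xi) relative to 1H of DSS at 0 ppm.
XI = {
    "13C": 0.251449530,  # 13C, DSS scale
    "15N": 0.101329118,  # 15N, liquid NH3 scale
}


def zero_ppm_mhz(h_zero_mhz: float, nucleus: str) -> float:
    """Absolute frequency (MHz) corresponding to 0 ppm for the heteronucleus."""
    return h_zero_mhz * XI[nucleus]


def to_ppm(freq_mhz: float, zero_mhz: float) -> float:
    """Chemical shift (ppm) of an absolute frequency given the 0 ppm frequency."""
    return (freq_mhz - zero_mhz) / zero_mhz * 1e6


# Placeholder: measured 1H frequency of the DSS methyl signal, in MHz.
h_zero = 800.0
c_zero = zero_ppm_mhz(h_zero, "13C")
n_zero = zero_ppm_mhz(h_zero, "15N")
print(f"13C 0 ppm at {c_zero:.6f} MHz, 15N 0 ppm at {n_zero:.6f} MHz")
```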

Analysis of resonance assignments

AutoAssign [23] software was used for automated analysis of backbone and sidechain 13Cβ resonance assignments for CspA. The peak list from the [15N-1HN]-HSQC spectrum and the peak lists from the triple resonance experiments, including 3D HNCO; HN(ca)CO; HNCA; HN(co)CA; HNCACB and HN(co)CACB, were generated automatically using the ‘restrictive peak picking’ function of the SPARKY [11] software. In order to improve the performance of AutoAssign for backbone assignments, these peak lists were manually refined prior to input into AutoAssign [23, 39] for automated analysis of backbone resonance assignments. Cleaning up the peak lists required only 2–3 h. Sidechain 13C and 1H methyl resonances of Leu, Val and Ile (δ1) were determined subsequently by interactive spectral analysis using [13C–1H]-HSQC, 3D 13C-edited NOESY, and 3D 15N-edited NOESY spectra. These methyl sidechain assignments were used in the “conventional 3D structure calculations”, but not in the CS-Rosetta calculations.

Sparse-constraint 3D structure calculations

Sparse-constraint 3D structure calculations were performed using the AutoStructure [15, 16] software ver. 2.2.1-CND for automated analysis of NOESY cross peak assignments, implemented together with the program CYANA ver. 2.1 for structure generation. The input for AutoStructure analysis consisted of (1) a list of backbone and 13C-1H methyl sidechain assignments; (2) manually edited NOESY peak lists, including chemical shifts and peak heights, generated from 13C-edited and 15N-edited NOESY spectra; (3) sites of slowly exchanging amide hydrogens based on published amide 1H/2H exchange data for CspA [8, 24]; and (4) broad ϕ, ψ angle constraints (±40° and ±50°, respectively) derived from chemical shift data (after correction for the 2H isotope-shift effect) using the program TALOS [6]. The 10 lowest-energy structures of the 56 generated in the final cycle of AutoStructure were refined by restrained molecular dynamics in an explicit water bath using CNS 1.1 [5, 20].

Chemical-shift based protein structure prediction by ROSETTA (CS-ROSETTA)

Chemical shift information, including backbone 13Cα, 15N, 13C′, 1HN and sidechain 13Cβ assignments, was used as input for CS-ROSETTA. Details of the process of generating the CS-ROSETTA protein structure are described in Shen et al. [31]. Three key steps are involved. First, based on the chemical shift values (which did not include backbone 1Hα shifts) and the protein sequence, peptide fragments were selected from a protein structure database using the MFR module [7, 18] of the NMRPipe software package. All proteins with a PSI-BLAST E-value <0.05 relative to E. coli CspA were removed from the database. Second, a standard ROSETTA [27] protocol was used for de novo structure generation. Third, ROSETTA all-atom models resulting from the above procedure were evaluated based on how well backbone chemical shifts predicted for the models using SPARTA [30] agree with the experimental chemical shifts. If the lowest-energy models cluster within ~2 Å of the model with the lowest energy, the structure prediction is considered successful and the lowest-energy models are considered converged. A total of 10,000 all-atom Rosetta models were generated from the MFR-selected peptide fragments, using a cluster of 20 CPUs. These CS-Rosetta runs required approximately 3 days. The 1,000 lowest-energy models were chosen and their all-atom ROSETTA energies were rescored to include their agreement with the experimental chemical shift values. The lowest-energy models were judged to be converged because their Cα RMSD values relative to the lowest-energy model are less than ~2 Å. The 10 lowest-energy models were selected to represent the 3D structure of CspA. The CS-ROSETTA package used in this work may be downloaded from
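The convergence test described above can be sketched in a few lines. This is only an illustration of the selection logic, not CS-ROSETTA code: it assumes that each model already carries a rescored energy and a Cα RMSD to the lowest-energy model (computed elsewhere after superposition), and the function name and example numbers are hypothetical.

```python
def lowest_energy_converged(models, n_lowest=10, cutoff=2.0):
    """Select the n_lowest models by energy and test convergence.

    models: list of (rescored_energy, ca_rmsd_to_lowest_energy_model) pairs,
            where the RMSD values (in Angstrom) are precomputed.
    Returns (converged, selected_models).
    """
    ranked = sorted(models, key=lambda m: m[0])
    selected = ranked[:n_lowest]
    converged = all(rmsd < cutoff for _, rmsd in selected)
    return converged, selected


# Hypothetical rescored energies and RMSDs for a handful of models.
example_models = [
    (-152.3, 0.0), (-150.9, 0.8), (-149.7, 1.1), (-148.2, 1.4),
    (-140.5, 6.3), (-147.0, 0.9), (-139.8, 7.1),
]
converged, selected = lowest_energy_converged(example_models, n_lowest=5)
print("converged:", converged)
print("selected energies:", [energy for energy, _ in selected])
```

In this toy input the five lowest-energy models all lie within 2 Å of the best model, so the run would be treated as converged; a model with a large RMSD among the selected set would flag the run as unconverged.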

Structure quality assessment

Global structure quality factors for the ensembles of CspA structures generated using sparse NMR constraints with conventional data analysis methods or by CS-Rosetta were determined using the Protein Structure Validation Suite (PSVS) software package [3]. The output of PSVS includes raw scores and normalized statistical Z-scores [3] for metrics assessed by the Verify 3D [4], Prosa II [32], PROCHECK [19], and MolProbity [21] software packages.
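A normalized Z-score of this kind expresses a raw score as the number of standard deviations from the mean of a reference set of structures. The sketch below shows only the generic normalization; the reference mean and standard deviation are hypothetical placeholders, not the calibration values used by PSVS, and sign conventions for individual metrics may differ.

```python
def z_score(raw: float, ref_mean: float, ref_sd: float) -> float:
    """Standard score of a raw quality metric relative to a reference set."""
    return (raw - ref_mean) / ref_sd


def is_acceptable(z: float, threshold: float = -5.0) -> bool:
    """Flag scores at or above an acceptance threshold (more negative is worse)."""
    return z >= threshold


# Hypothetical raw score and reference statistics, for illustration only.
z = z_score(raw=0.30, ref_mean=0.45, ref_sd=0.10)
print(f"Z = {z:.2f}, acceptable: {is_acceptable(z)}")
```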


Rapid resonance assignments with perdeuterated CspA sample prepared by cSPP

An 86-residue construct of E. coli CspA was produced in 40-fold condensed 2H, 13C, 15N-enriched media using the cSPP system [29]. A 0.2 mM sample of perdeuterated, ILV methyl-protonated CspA was used for collection of deuterium-decoupled triple resonance experiments. The complete list of experiments includes HNCO, HN(ca)CO, HNCA, HN(co)CA, HNCACB, and HN(co)CACB, collected over a period of 3.5 days at 800 MHz. The experiments executed and the corresponding key parameters of the data collection are summarized in Table 1. Following automatic peak picking and some manual refinement of the peak lists with the SPARKY program [11], the program AutoAssign [39] was used for automatic analysis of backbone HN, 15N, 13Cα, 13C′, and sidechain 13Cβ resonance assignments. The resulting 13Cα, 13Cβ and 13C′ connectivity map, documented in Fig. 1, is essentially complete, indicating high reliability of the assignments. The reliability of the automatically determined backbone resonance assignments was subsequently validated by manual analysis of these same data by interactive spectral analysis with SPARKY [11]. Except for the N-terminal tag (in the region of 6 consecutive His residues), a complete set of backbone and 13Cβ assignments was obtained for CspA. The automated backbone resonance assignments are consistent with the published assignments for CspA (BMRB accession number 4296), which have been validated by self-consistent analysis of NOESY data and 3D structure calculations. Perdeuterated protein samples produced with the cSPP system thus provide high-quality NMR spectra suitable for rapid automated analysis of backbone resonance assignments.

Fig. 1

Summary of backbone and 13Cβ resonance assignments for CspA derived from triple resonance NMR experiments. Red bars and yellow bars underneath the amino acid sequence represent the intraresidue and sequential connectivities, respectively. These data were obtained by analyzing six 2D and 3D NMR spectra, summarized in Table 1. Slowly exchanging backbone amides identified by 1H/2H exchange measurements, which were used in the conventional structure analysis but not in the CS-Rosetta analysis, are represented by filled circles. Secondary structures of the β-barrel found in the final structure are indicated by arrows along the amino acid sequence

Protein structure determination using sparse NMR constraints

As a further example of the novel utility of such perdeuterated samples produced with the cSPP system, we next demonstrated rapid analysis of the 3D structure of [1H-13C]-I(δ1)LV, 13C, 15N, 2H-enriched CspA using conventional sparse NOESY-based methods. Additional 3D 15N-edited NOESY and 3D 13C-edited NOESY data were acquired and used to assign sidechain methyl resonances, and NOESY cross peaks were assigned in order to determine the 3D structure by conventional automated methods with energy refinement. AutoStructure [15, 16] was used to determine NOESY cross peak assignments and to generate distance constraints, structure generation calculations were carried out using these constraints as input to CYANA, and CNS refinement was done by restrained molecular dynamics in explicit water [5, 20]. Table 2 summarizes the NOE-based distance constraints, hydrogen bonds, and dihedral angle constraints identified by AutoStructure. AutoStructure identified a total of 131 distance constraints, including 61 long-range constraints. Based on characteristic NOE-based contact patterns and slow amide hydrogen/deuterium (H/D) exchange data, AutoStructure also identified a total of 22 hydrogen bond upper/lower constraints (11 hydrogen bonds), 20 of which are long-range hydrogen bond constraints. Identification of hydrogen bonds by AutoStructure is critical for proper registration of β-strands and folding of β-sheet structures derived from sparse constraint data. In each of these calculations, 56 structures were generated from extended conformations, and the 10 with the lowest values of the target function were selected to represent the solution NMR structure of CspA. The resulting ensemble of sparse-constraint CspA structures, and its comparison with the crystal structure (PDB ID: 1mjc) [28], are shown in Fig. 2. In the remainder of the text, we use 1mjc to designate the crystal structure of CspA. Structural statistics of the sparse-constraint structures are also listed in Table 2. These structures exhibit good structural convergence and few residual constraint violations. The average backbone RMSD within the ensemble for the ordered residue regions is 1.2 Å. For the well-defined core residues, the average backbone RMSD relative to 1mjc is ~1.6 Å. These results show that the backbone structure generated with these sparse-constraint automated analysis methods can be reasonably accurate, as discussed in detail by Zheng et al. [38].
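Counts such as "long-range constraints" depend on how sequence separation is binned. The sketch below uses a common convention (sequential for |i − j| = 1, medium-range for 2 to 4, long-range for 5 or more); the source does not state the cutoffs used by AutoStructure or in Table 2, so these thresholds are an assumption, and the residue pairs are hypothetical.

```python
from collections import Counter


def classify_constraint(res_i: int, res_j: int) -> str:
    """Classify a distance constraint by sequence separation |i - j|."""
    separation = abs(res_i - res_j)
    if separation == 0:
        return "intraresidue"
    if separation == 1:
        return "sequential"
    if separation < 5:
        return "medium-range"
    return "long-range"


# Hypothetical residue pairs, for illustration only.
residue_pairs = [(9, 9), (9, 10), (21, 24), (9, 32), (30, 51), (45, 67)]
summary = Counter(classify_constraint(i, j) for i, j in residue_pairs)
for category, count in summary.items():
    print(f"{category}: {count}")
```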

Table 2 Summary of structural statistics for E. coli CspA NMR structures
Fig. 2

Stereoview of the superimposition of AutoStructure-CNS structure for [1H-13C]-I(δ1)LV, 13C, 15N, 2H-enriched CspA determined by conventional automated analysis methods (blue) with 1mjc (red). a Backbone line representations of the 10 lowest energy conformers obtained from AutoStructure-CNS structure compared with 1mjc. b Ribbon diagram of the lowest energy conformer of AutoStructure-CNS structure versus 1mjc. c The packing of the hydrophobic residues (viz, V9, I21, V30, V32, I37, L45, V51, F53, A64, and V67) for the lowest energy conformer of AutoStructure-CNS structure versus 1mjc. The disordered N-terminal hexaHis expression tag is excluded from the analysis

CS-Rosetta structure generation for perdeuterated CspA

The recently introduced CS-Rosetta method [31] provides an alternative approach for small protein structure analysis using only backbone and 13Cβ chemical shift data. CS-Rosetta calculations were carried out using the resonance assignments determined with <4 days of data collection on the perdeuterated CspA sample produced with the cSPP system [29]. In this work, we tested the performance of CS-Rosetta with and without hydrogen–deuterium isotope chemical shift corrections. The isotope chemical shift corrections for backbone 15N and 13Cα nuclei and sidechain 13Cβ nuclei were made using values proposed by Gardner et al. [10]. In our experience, the isotope chemical shift corrections did not impact the quality of the resulting CS-Rosetta structure. The resulting ensemble of 10 structures generated using no isotope shift correction, shown in Fig. 3, exhibits excellent structure quality scores (Table 2). The CS-Rosetta structure is also in excellent agreement with the 1mjc crystal structure [28], with a backbone RMSD of 0.5 Å and an all-atom RMSD of 1.2 Å for well-converged regions, and an RMSD of 1.1 Å for core, non-solvent-exposed sidechain atoms, relative to 1mjc. Additional key structural statistics for the CS-Rosetta structure are listed in Table 2. Also included in Table 2 are structural statistics for the conventional NMR structure of CspA (PDB ID: 3mef) determined several years ago with extensive analysis of sidechain assignments and NOEs [8]. In the remainder of the text, we use 3mef to designate the conventionally determined NMR structure with full sidechain assignment. The Ramachandran statistics and global quality scores for the CS-Rosetta structure are significantly better than those for 3mef or for the sparse-constraint structure.

Fig. 3

Stereoview of the superimposition of the CS-Rosetta structure for 2H,13C,15N-enriched CspA (blue) with 1mjc (red). a Backbone line representations of the 10 lowest energy conformers obtained from CS-Rosetta structure compared with 1mjc. b Ribbon diagram of the lowest energy conformer of CS-Rosetta structure versus 1mjc. c The packing of the core hydrophobic residues (viz, V9, I21, V30, V32, I37, L45, V51, F53, A64, and V67) for the lowest energy conformer of CS-Rosetta structure versus 1mjc. The disordered N-terminal expression tag is excluded from the analysis

A comparison of the CS-Rosetta structure of Fig. 3 with the NOESY constraint list used to generate the sparse-constraint NMR structure of Fig. 2 (i.e. the data obtained for 2H, 15N, 13C-enriched 13CH3 methyl-protonated CspA) is also summarized statistically in Table 2. This analysis reveals only a few distance violations >0.5 Å (the largest being 1.7 Å) across the ensemble of 10 CS-Rosetta structures, cross-validating the high accuracy of the CS-Rosetta structure. Comparison with the more extensive NOESY constraint list used to determine 3mef [8] reveals some additional constraint violations by the CS-Rosetta CspA structure; however, that work was performed using a different CspA construct, and the overall structure quality scores (Table 2) for this published “full blown” NMR structure 3mef are much poorer than those of either the CS-Rosetta structure or 1mjc. Indeed, structure quality scores for the published NMR structure (Table 2), particularly the Procheck (all dihedral) and Molprobity Clash scores, are well below the threshold (Z = −5) considered to be acceptable for a good quality NMR structure [3]. Based on its closer agreement with 1mjc, particularly for core sidechain atom positions, and its better overall structure quality scores, it appears that the CS-Rosetta NMR structure of CspA (Fig. 3) is in fact more accurate than the previously published “full blown” NMR structure 3mef [8].
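This kind of cross-validation amounts to measuring each constrained distance in every model and counting cases that exceed the upper bound by more than 0.5 Å. The sketch below is a simplified illustration with hypothetical atom names, coordinates, and bounds; it treats each constraint as a single atom pair and does not handle ambiguous or pseudo-atom constraints, which a real analysis would need to address.

```python
import math


def count_violations(ensemble, constraints, tolerance=0.5):
    """Count upper-bound violations larger than `tolerance` (Angstrom).

    ensemble:    list of models, each a dict mapping atom id -> (x, y, z)
    constraints: list of (atom_a, atom_b, upper_bound) tuples
    Returns (number_of_violations, largest_violation).
    """
    n_violations = 0
    largest = 0.0
    for model in ensemble:
        for atom_a, atom_b, upper in constraints:
            excess = math.dist(model[atom_a], model[atom_b]) - upper
            if excess > tolerance:
                n_violations += 1
            largest = max(largest, excess)
    return n_violations, largest


# Hypothetical two-model ensemble and a single methyl-amide constraint.
toy_ensemble = [
    {"V9.HG1": (0.0, 0.0, 0.0), "I21.HN": (6.0, 0.0, 0.0)},
    {"V9.HG1": (0.0, 0.0, 0.0), "I21.HN": (5.2, 0.0, 0.0)},
]
toy_constraints = [("V9.HG1", "I21.HN", 5.0)]
n_viol, max_viol = count_violations(toy_ensemble, toy_constraints)
print(f"{n_viol} violation(s) > 0.5 A; largest excess {max_viol:.1f} A")
```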


Our results demonstrate a general, rapid, and simple approach for determining high quality 3D structures of small (<10 kDa) proteins, in fully automated fashion, with accuracies rivaling structures determined using more extensive NMR methods. In particular, the core sidechain packing, determined by the Rosetta potential energy function, is quite accurate based on comparison with the crystal structure, despite the fact that no sidechain constraints are used in these calculations. Similar results were observed in CS-RDC-Rosetta calculations with larger proteins [26].

The time required for CS-Rosetta runs depends on the number of Rosetta models generated and the number of CPUs used for structure generation. In our study, we generated 10,000 models using 20 CPUs, which took about 3 days. The time saved for structure determination using our proposed methods relative to conventional methods includes the time required to collect the spectra needed for sidechain assignments and NOESY analysis, the time required to process and analyze these spectra, and the time required for structure calculations and refinement; these are the time-limiting steps of NMR structure determination. Our proposed approach requires only triple resonance NMR experiments for backbone assignments followed by automated analysis of backbone resonance assignments. Once most of the backbone resonance assignments are determined, these chemical shift data are submitted to CS-Rosetta. This approach, which is largely automated, not only saves time in data collection and analysis but can also generate a high-quality protein structure.
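The figures quoted above imply a rough per-model cost that can be used to estimate run times for other cluster sizes. The sketch below assumes perfectly parallel scaling (independent models, no scheduling overhead) and comparable CPUs, so it gives only a first estimate; the function name is illustrative.

```python
# Figures reported in the text.
N_MODELS = 10_000
N_CPUS = 20
WALL_DAYS = 3.0

MINUTES_PER_DAY = 24 * 60

# Average CPU time consumed per model, in minutes.
cpu_minutes_per_model = WALL_DAYS * MINUTES_PER_DAY * N_CPUS / N_MODELS


def estimated_wall_days(n_models: int, n_cpus: int) -> float:
    """Wall-clock days for n_models on n_cpus, assuming ideal scaling."""
    return n_models * cpu_minutes_per_model / n_cpus / MINUTES_PER_DAY


print(f"about {cpu_minutes_per_model:.2f} CPU-minutes per model")
print(f"10,000 models on 40 CPUs: {estimated_wall_days(10_000, 40):.1f} days")
```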

NOESY data and protein ILV methyl protonation are not required in the strategy proposed in this paper for small protein structure determination. NOESY data on the ILV-labeled sample were used only for cross-validation of the CS-Rosetta structure. However, CS-Rosetta calculations do not always converge, even for small protein structures. If the chemical shift data alone do not provide a converged structure, NOESY data for the perdeuterated, ILV methyl-protonated protein sample can be used together with CS-Rosetta.

Our work further demonstrates that 2H, 13C, 15N-enriched protein samples, made by the cSPP system at a drastically reduced cost and purified by single-step Ni–NTA affinity chromatography, allow data collection and automated analysis of backbone 1HN, 15N, 13Cα, 13C′, as well as sidechain 13Cβ, assignments in only a few days. In related work, we have demonstrated the combined use of CS-Rosetta and automated NOESY analysis to provide more accurate NOESY cross peak assignments, beginning with extensive backbone and sidechain assignments [25], and the use of CS-RDC-Rosetta with manually assigned NOESY-based constraints to generate good quality structures of larger (10–25 kDa) proteins [26]. The present study is the first example of applying CS-Rosetta for rapid, fully automated NMR structure determination of small proteins, an application that provides a new and general approach for obtaining 3D structures of small proteins. The CspA structure obtained rivals the best NMR structures available to date for CspA using conventional methods, even those utilizing extensive sidechain proton assignments [8]. This approach has tremendous value in preparing protein samples and generating assignments and structural information for small molecule screening studies, as well as in high-throughput structural and functional genomics studies.