Particle segmentation algorithm for flexible single particle reconstruction
As single particle cryo-electron microscopy has entered a new era of atomic resolution, sample heterogeneity still imposes a major limit on the resolution of many macromolecular complexes, especially those with continuous conformational flexibility. Here, we describe a particle segmentation algorithm for solving structures of molecules composed of several parts that are flexible relative to each other. In this algorithm, the different parts of a target molecule are segmented from raw images according to alignment information obtained from a preliminary 3D reconstruction, and they are then subjected to single particle processing in an iterative manner. We tested the algorithm on both simulated and experimental data, and each segmented part of the molecule was reconstructed at higher resolution than the entire molecule.
Keywords: Single particle reconstruction; Cryo-EM; Particle segmentation; Local reconstruction
Single particle cryo-electron microscopy (cryo-EM) is a powerful structural biology tool that has been developed over the past several decades and has matured considerably in recent years (Bai et al. 2015a; Carazo et al. 2015; Cheng 2015; Cheng et al. 2015; Nogales and Scheres 2015). By rapidly freezing biological macromolecules in a thin film of vitreous ice, cryo-EM preserves the molecules as they existed in solution immediately before freezing. This gives cryo-EM the unique advantage of revealing molecular structures in close-to-native states and the possibility of examining structures in action. Recent developments in direct electron detection devices and image processing algorithms have dramatically boosted the capability of this technique, so that three-dimensional (3D) structures of biological macromolecules can be solved to near-atomic resolution by averaging many individual images, without crystallization (Bai et al. 2013; Liao et al. 2013; Bartesaghi et al. 2015). This has led to a resolution revolution in cryo-EM and is transforming the field of structural biology (Kuhlbrandt 2014).
Despite this major technical progress, compositional and conformational heterogeneity still poses a major obstacle to high-resolution structure determination by single particle cryo-EM. Unlike in crystallography, where macromolecules are constrained within a crystal lattice, single molecules in solution change their tertiary and quaternary structures more freely, which may cause conformational or compositional heterogeneity among the molecules. When the heterogeneity is relatively subtle and localized, the single particle 3D reconstruction of a macromolecular complex is an averaged structure in which the region common to all molecules is well resolved but the flexible region is resolved only at low resolution. Algorithms based on multivariate statistical analysis were developed to classify molecules into different states (van Heel and Frank 1981). The maximum likelihood algorithm was developed to classify molecule images with a low signal-to-noise ratio (Scheres et al. 2007). Methods such as random conical tilt and orthogonal tilt reconstruction were developed to obtain 3D models of different molecular states (Radermacher et al. 1987; Leschziner and Nogales 2006). Using statistical classification, these algorithms sort heterogeneous particle images into classes based on their mutual similarity and treat each class of images as a homogeneous set of molecules. Classification thus generates multiple structures, each reflecting a different state of the biological sample in vitreous ice. All of these methods assume a common structure within each class of molecules. Although they have proved very successful in structural studies of many macromolecular complexes and have revealed important mechanistic insights into the conformational switching of molecular machines, many complexes with more complicated conformational heterogeneity still cannot be studied easily.
In cases of severe conformational heterogeneity, such as a global variation within the molecule or a large-scale continuous domain–domain movement, a correct 3D reconstruction cannot even be obtained with the conventional classification approach.
Several algorithms that do not rely on classification have been introduced for single particle analysis of macromolecular complexes with continuous conformational changes. These include normal-mode analysis (Ma and Karplus 1997; Brink et al. 2004; Ma 2005; Jin et al. 2014), energy landscape analysis and manifold embedding (Dashti et al. 2014; Frank and Ourmazd 2016), 3D variance analysis (Penczek et al. 2006; Zhang et al. 2008), covariance analysis (Anden et al. 2015; Katsevich et al. 2015; Liao et al. 2015), and eigenanalysis-based methods (Penczek et al. 2011; Tagare et al. 2015). These algorithms can provide a quantitative description of the modes of conformational variation in a complex to guide further processing of the dataset. More recently, local masking techniques have been used to reconstruct rigid bodies within a complex or to further classify subtle local conformational heterogeneity in a focused region of the molecule. This approach has significantly improved the local resolution of different rigid portions of complexes (Amunts et al. 2014; Brown et al. 2014; Chang et al. 2015; Yan et al. 2015).
Algorithms that can separate the relatively mobile parts of a flexible molecule and reconstruct the different parts separately would be even more useful. Because an electron micrograph of a molecule is a 2D projection of the molecule along the direction of the electron beam, different parts of the complex superimpose in the 2D image. Therefore, simply masking the 2D image or the 3D model does not eliminate the influence of the mobile portion's signal on the 3D reconstruction. A cleaner approach is to remove the signal of the mobile portion from the 2D image entirely, so that the part of interest can be reconstructed with greater fidelity. Such separation has been realized in Fourier–Bessel space for the reconstruction of a double-layered helical assembly of tubulin (Wang and Nogales 2005). Recently, the genome structure of icosahedral viruses was separated from the capsid structure and reconstructed by subtracting the capsid signal from raw images of virus particles (Liu and Cheng 2015; Zhang et al. 2015). In our most recent work, we developed a segmentation algorithm that separates the SNAP–SNARE structure from the 20S particle by subtracting the hexameric NSF complex from raw images of 20S particles, thereby overcoming the symmetry mismatch and severe conformational heterogeneity of the 20S particle. This allowed us to reconstruct the SNAP–SNARE complex at higher resolution than was possible with whole-particle images (Zhou et al. 2015). At nearly the same time, Bai et al. (2015b), Ilca et al. (2015), and Shan et al. (2016) independently developed similar algorithms. A recent development in the RELION software (Scheres 2012a, b) makes it possible to subtract selected portions of a complex from raw 2D images without introducing major artifacts.
This allowed much better classification of the portion of interest, so that the heterogeneous particle images could be sorted further and yield even higher resolution than the overall average (Bai et al. 2015b).
In this work, we extend the particle segmentation algorithm that we developed for the analysis of 20S particles to other samples. Its successful application to different systems with conformational heterogeneity indicates its generality. We also implemented image subtraction at the micrograph level. This not only avoids potential artifacts from interpolation and from the contrast transfer function (CTF) but, more importantly, also provides new opportunities to analyze micrographs with crowded particles.
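Micrograph-level subtraction matters when neighboring particles overlap a particle's box: subtracting only that particle's own model after windowing leaves the neighbor's signal behind. The following minimal NumPy sketch assumes that each particle's model image (for example, a CTF-modulated projection of the part to be removed) has already been computed and that its box center in the micrograph is known. `paste`, `window`, and `subtract_at_micrograph_level` are hypothetical helpers, not functions from the authors' repository, and boxes are assumed to lie fully inside the micrograph.

```python
import numpy as np


def paste(shape, models, centers_xy):
    """Sum square model images into a blank canvas at their (x, y) box centers."""
    canvas = np.zeros(shape)
    for model, (cx, cy) in zip(models, centers_xy):
        b = model.shape[0]
        x0, y0 = int(round(cx)) - b // 2, int(round(cy)) - b // 2
        # Overlapping boxes simply add, so neighbors' contributions are kept too.
        canvas[y0:y0 + b, x0:x0 + b] += model
    return canvas


def subtract_at_micrograph_level(micrograph, models, centers_xy):
    """Remove every particle's model from the whole micrograph in one pass."""
    return micrograph - paste(micrograph.shape, models, centers_xy)


def window(micrograph, center_xy, box):
    """Crop a box x box window (x = column, y = row) around a rounded center."""
    cx, cy = center_xy
    x0, y0 = int(round(cx)) - box // 2, int(round(cy)) - box // 2
    return micrograph[y0:y0 + box, x0:x0 + box]


# Hypothetical toy micrograph with two overlapping particles (box 16).
rng = np.random.default_rng(1)
shape, box = (64, 64), 16
centers = [(30, 30), (38, 34)]
remove = [rng.normal(size=(box, box)) for _ in centers]  # part to subtract
keep = [rng.normal(size=(box, box)) for _ in centers]    # part of interest
keep_only = paste(shape, keep, centers)
micrograph = keep_only + paste(shape, remove, centers)

# Micrograph level: subtract all models first, then window particle 1.
box_mic = window(subtract_at_micrograph_level(micrograph, remove, centers), centers[0], box)
# Box level: window particle 1 first, then subtract only its own model.
box_only = window(micrograph, centers[0], box) - remove[0]
```

Here `box_mic` matches the kept signal alone, whereas `box_only` still contains the overlapping part of the neighbor's removed component. This is one way to read the "crowded particles" advantage, not necessarily how the published code is organized.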
Theory and algorithm
The different combinations of E1 and E2 define the conformational heterogeneity among the molecules. Our goal is to determine high-resolution structures of the two rigid bodies, V1 and V2. In the process, we should also be able to reveal all the E1 and E2 combinations and therefore the conformational distribution within the specimen.
More specifically, we can first subtract V2 and generate images of V1. We then obtain an updated volume and Euler matrix for V1, with which we can generate images of V2. These steps can be iterated between V1 and V2 for several rounds until convergence (Fig. 1B).
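The alternating scheme can be illustrated with a deliberately simplified 1D toy. Each "particle" is the sum of two parts, each shifted independently. Integer translations stand in for E1 and E2, and fixed references stand in for V1 and V2. Each round subtracts one part at its current placement and aligns the residual to the other part. This is a sketch under stated assumptions, not the published implementation: the references are held fixed here, whereas the actual procedure re-refines V1 and V2 in every round, and all function names are hypothetical.

```python
import numpy as np


def best_shift(signal, ref, max_shift):
    """Integer shift k (|k| <= max_shift) for which np.roll(ref, k) best matches signal."""
    n = len(signal)
    # Circular cross-correlation via FFT: cc[k] = sum_j signal[j] * ref[j - k]
    cc = np.fft.ifft(np.fft.fft(signal) * np.conj(np.fft.fft(ref))).real
    shifts = np.arange(-max_shift, max_shift + 1)  # local search window
    return int(shifts[np.argmax(cc[shifts % n])])


def alternate_segmentation(images, ref1, ref2, max_shift=4, n_rounds=2):
    """Toy 1D analogue of the V1/V2 iteration; references are held fixed."""
    # Preliminary alignment of the whole particle against ref1 + ref2:
    # both parts start from the same consensus shift.
    t1 = np.array([best_shift(img, ref1 + ref2, max_shift) for img in images])
    t2 = t1.copy()
    for _ in range(n_rounds):
        # Subtract part 2 at its current shift, then align the residual to part 1.
        t1 = np.array([best_shift(img - np.roll(ref2, s), ref1, max_shift)
                       for img, s in zip(images, t2)])
        # Subtract part 1 at its updated shift, then align the residual to part 2.
        t2 = np.array([best_shift(img - np.roll(ref1, s), ref2, max_shift)
                       for img, s in zip(images, t1)])
    return t1, t2


# Hypothetical data: two Gaussian "parts" shifted independently per particle.
x = np.arange(64)
ref1 = np.exp(-0.5 * ((x - 20) / 2.0) ** 2)
ref2 = 0.6 * np.exp(-0.5 * ((x - 44) / 3.0) ** 2)
rng = np.random.default_rng(0)
true1 = rng.integers(-3, 4, size=20)
true2 = rng.integers(-3, 4, size=20)
images = np.array([np.roll(ref1, a) + np.roll(ref2, b) for a, b in zip(true1, true2)])
est1, est2 = alternate_segmentation(images, ref1, ref2)
```

With these noise-free, spatially separated parts, `est1` and `est2` recover `true1` and `true2`. In real 2D projections the parts overlap, and that overlap is what makes the subtraction step essential.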
Segmentation algorithm improves the resolution of simulated 20S particle dataset
Table 1 Parameters for micrograph simulation
Rows: B factor (Å²); pixel size (Å/pixel); σ of translation between SS and DD (pixel); σ of Euler angle difference between SS and DD (°)
Values: −1 to −3; 1000 ± 50; 50 ± 2
Table 2 Summary of 3D reconstruction (columns: resolution after post-processing; box size (pixel))
- Whole volume of simulated particles
- Whole volume of simulated particles with SS mask
- Whole volume of simulated particles with DD mask
- SS sub-particles generated with relion_project
- DD sub-particles generated with relion_project
- Segmented SS particles round I
- Segmented DD particles (box size 160)
- Segmented DD particles (box size 256)
- Segmented SS particles round II
- Whole volume of 70S ribosome
- Whole volume of 70S ribosome with 50S mask
- Whole volume of 70S ribosome with 30S mask
- 50S ribosome generated with relion_project
- 30S ribosome generated with relion_project
- Segmented 30S subunit
- Segmented 50S subunit
- Whole volume of influenza RdRP tetramer
- Whole volume of influenza RdRP tetramer with dimer mask
- Segmented influenza RdRP dimer
- Influenza RdRP dimer generated with relion_project
Notably, the box size of the windowed particle images affects the reconstruction resolution of the DD particles. The 3D reconstructions of segmented DD particles with box sizes of 160 and 256 pixels reached 3.52 Å and 3.41 Å, respectively (Fig. 3G, I, J, Table 2). Because the particle signal is proportional to the molecular weight while the noise is proportional to the box size (Rosenthal and Henderson 2003), too large a box decreases the signal-to-noise ratio of the particles. On the other hand, too small a box results in too large a reciprocal pixel size (that is, too coarse a sampling in Fourier space), which may limit CTF correction and interpolation in Fourier space (Penczek et al. 2014). The optimal box size for 3D reconstruction may therefore vary for particles of different sizes and/or symmetries.
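One common way to make the "too large reciprocal pixel size" argument concrete is to count how many Fourier samples fall within one CTF oscillation at a given resolution. Fewer than about two samples per oscillation means that the CTF is aliased. The sketch below assumes the 1.32 Å pixel size from Materials and methods, a hypothetical defocus of 3 µm, 300 kV, and a target resolution of 3.5 Å, and it neglects spherical aberration. It is a back-of-the-envelope heuristic, not an analysis from the paper.

```python
import math


def electron_wavelength_A(voltage_V):
    """Relativistic electron wavelength in Å for an accelerating voltage in volts."""
    return 12.2643 / math.sqrt(voltage_V * (1.0 + 0.978466e-6 * voltage_V))


def ctf_samples_per_oscillation(box_px, pixel_A, defocus_A, resolution_A, voltage_V=300e3):
    """Fourier-space samples per CTF oscillation at a given resolution (Cs neglected)."""
    lam = electron_wavelength_A(voltage_V)
    k = 1.0 / resolution_A                      # spatial frequency, 1/Å
    # With chi(k) ~ pi * lam * defocus * k**2, one 2*pi cycle of chi spans
    # a frequency interval of about 1 / (lam * defocus * k).
    period = 1.0 / (lam * defocus_A * k)
    spacing = 1.0 / (box_px * pixel_A)          # Fourier sampling interval, 1/Å
    return period / spacing


ratios = {box: ctf_samples_per_oscillation(box, 1.32, 30000.0, 3.5) for box in (160, 256)}
# ratios[160] is about 1.25 and ratios[256] about 2.0 samples per oscillation
```

Under these assumed settings, the 160-pixel box undersamples the CTF near 3.5 Å and the 256-pixel box sits at the two-sample limit. This may be one contributor to the difference between the two DD reconstructions.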
Segmentation algorithm improves the reconstruction quality of influenza RdRP
Segmentation algorithm calculates conformational flexible distribution of 70S ribosome
Because we used segmented reconstruction, we could calculate the relative rotation angle between the 30S and 50S subunits of each individual particle by comparing their Euler angles after the reconstructions. The distribution of rotation angles showed two peaks, in agreement with the two major populations of conformers in the ratchet switch of the 70S ribosome (Fig. 5F). When we aligned the 3D reconstructions of the two classes of 70S ribosome on the 50S subunit, the 30S subunit showed a rotation of about 3.8° (Fig. 5G).
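The following minimal sketch computes the rotation angle between the 30S and 50S orientations of one particle from its two sets of Euler angles. It assumes rot/tilt/psi in degrees, composed as Rz(psi)·Ry(tilt)·Rz(rot) and intended to follow RELION's ZYZ convention; verify this against your software. The angle is unchanged if the matrices are transposed, provided both orientations use the same convention. It also assumes that both subunit maps share the frame of the consensus 70S reconstruction. This is not the authors' CompareDataStars_data.py, and the function names are hypothetical.

```python
import numpy as np


def _rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])


def _ry(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]])


def euler_to_matrix(rot, tilt, psi):
    """ZYZ Euler angles in degrees -> 3x3 rotation matrix (assumed convention)."""
    r, t, p = np.radians([rot, tilt, psi])
    return _rz(p) @ _ry(t) @ _rz(r)


def relative_rotation_deg(angles_a, angles_b):
    """Angle (degrees) of the rotation relating orientation a to orientation b."""
    ra = euler_to_matrix(*angles_a)
    rb = euler_to_matrix(*angles_b)
    # ||Ra - Rb||_F = 2*sqrt(2)*sin(theta/2): better conditioned for small
    # angles than arccos((trace(Ra^T Rb) - 1) / 2).
    s = np.linalg.norm(ra - rb) / (2.0 * np.sqrt(2.0))
    return float(np.degrees(2.0 * np.arcsin(np.clip(s, 0.0, 1.0))))


# Hypothetical orientations of one particle's two subunits.
angle = relative_rotation_deg((0.0, 0.0, 0.0), (0.0, 0.0, 3.8))
```

Applied per particle to the angles refined for the two segmented subunits, the resulting list of angles can be histogrammed to obtain a distribution like the one described above.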
Direct segmentation of particle images from raw micrographs
Sample heterogeneity is still a major technical obstacle in single particle cryo-EM 3D reconstruction. Sources of heterogeneity include, but are not limited to, compositional diversity and conformational flexibility. The conformational variation that molecules undergo can be continuous or discrete. Compositional heterogeneity and conformational heterogeneity with discrete states usually lead to a finite number of classes, which current 3D classification algorithms handle reasonably well. In contrast, continuous conformational change within a molecule leads to an effectively infinite number of classes.
3D refinement and reconstruction with an adaptive local mask around the relatively rigid portion of the molecule has been successful in some cases in solving high-resolution structures of parts of the whole molecule. In most cases, however, the overlapping structures in 2D projections interfere with correct alignment of the common portion of the molecule. Using the particle segmentation algorithm, we can separate the relatively mobile portions within a molecule image and then perform single particle analysis of the separated portions without mutual interference. Segmented images carry much cleaner signal for more precise alignment and further analysis. Our analysis of the 20S particle in this work shows the particular advantage of the segmentation algorithm for complexes with internal symmetry mismatch. Further refinement with local angular searches may introduce artifacts in some cases. In the simulated 20S particle example, the asymmetric feature of the SS part was lost after local angular searching, but it was well recovered by the segmentation algorithm.
In our segmentation algorithm, after the partial 3D density is projected, it is critical to subtract the projection from the raw particles correctly. There have been several attempts (Wang and Sigworth 2009; Bai et al. 2015b; Ilca et al. 2015; Liu and Cheng 2015; Zhang et al. 2015) to subtract the projection of a 3D reconstruction or 3D model from raw particles. We found that the absolute gray scale of 3D reconstructions in RELION makes the subtraction easy and intuitive. This operation removes most of the low-frequency signal of one part of the macromolecule from the raw particle images and immediately allows more precise alignment of the other part. This is supported by the observation that reference-free 2D class averages of segmented particles show more detailed features than those of entire particles, yet are free of contaminating features from the subtracted references. Furthermore, although the iterative approach (Fig. 1B) can be used to improve the segmentation and alignment of each portion of the molecule, in practice at most two iterations are sufficient to reach convergence (Table 2). This indicates that our approximation in Eq. 7 is reasonable for practical purposes.
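"Subtracting correctly" means that the projection of the part to be removed must be placed with the particle's refined shift and modulated by that particle's CTF before it is subtracted. The following minimal NumPy sketch makes several assumptions. The projection is given as a 2D array that is already rotated into the particle's orientation, and random arrays stand in for real projections. The CTF is astigmatism-free and uses an assumed sign convention with default Cs = 2.7 mm and 10% amplitude contrast. The relative scale is fixed at 1, following the absolute gray-scale point above and exposed as a hypothetical `scale` parameter. None of this is RELION's own subtraction code.

```python
import numpy as np


def electron_wavelength_A(voltage_V):
    return 12.2643 / np.sqrt(voltage_V * (1.0 + 0.978466e-6 * voltage_V))


def ctf_2d(box, pixel_A, defocus_A, voltage_V=300e3, cs_mm=2.7, amp_contrast=0.1):
    """Astigmatism-free CTF sampled on the unshifted FFT grid (assumed convention)."""
    lam = electron_wavelength_A(voltage_V)
    f = np.fft.fftfreq(box, d=pixel_A)                 # spatial frequency, 1/Å
    k2 = f[:, None] ** 2 + f[None, :] ** 2
    cs_A = cs_mm * 1e7                                 # mm -> Å
    chi = np.pi * lam * defocus_A * k2 - 0.5 * np.pi * cs_A * lam ** 3 * k2 ** 2
    return -(np.sqrt(1.0 - amp_contrast ** 2) * np.sin(chi) + amp_contrast * np.cos(chi))


def subtract_projection(particle, projection, ctf, shift_yx=(0.0, 0.0), scale=1.0):
    """particle - scale * (projection shifted by shift_yx pixels, then CTF-modulated)."""
    ny, nx = projection.shape
    ky = np.fft.fftfreq(ny)[:, None]                   # cycles/pixel along axis 0
    kx = np.fft.fftfreq(nx)[None, :]                   # cycles/pixel along axis 1
    dy, dx = shift_yx
    phase = np.exp(-2j * np.pi * (ky * dy + kx * dx))  # Fourier shift theorem
    model = np.fft.ifft2(np.fft.fft2(projection) * ctf * phase).real
    return particle - scale * model


def apply_ctf(img, ctf):
    return np.fft.ifft2(np.fft.fft2(img) * ctf).real


# Hypothetical particle: CTF-modulated sum of a shifted part A and a part B.
rng = np.random.default_rng(0)
box = 32
ctf = ctf_2d(box, 1.32, 20000.0)
part_a = rng.normal(size=(box, box))   # stands in for the projection to remove
part_b = rng.normal(size=(box, box))   # stands in for the part of interest
particle = apply_ctf(np.roll(part_a, (3, -2), axis=(0, 1)), ctf) + apply_ctf(part_b, ctf)
residual = subtract_projection(particle, part_a, ctf, shift_yx=(3, -2))
expected = apply_ctf(part_b, ctf)
```

Applying the shift as a Fourier phase avoids real-space interpolation; for the integer shift used here it reproduces `np.roll` exactly, so `residual` equals the CTF-modulated part B.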
Besides solving high-resolution structures of the rigid components of a complex, the segmentation algorithm provides additional information on the spatial relationship between the rigid parts within each individual particle image. Although the examples in this work mainly focus on molecules made of two rigid components, the concept can be extended to molecules composed of three or more rigid bodies that move relative to each other. Such information from the whole dataset can then be summarized in a statistical analysis that reflects the distribution of conformational states of the flexible molecule. This conformational distribution carries biological information beyond what a static structure can provide, and it exploits a unique strength of single particle analysis.
Materials and methods
Generation of simulated dataset
Previous work (Zhao et al. 2015; Zhou et al. 2015) showed that the human 20S particle, which functions in membrane fusion in eukaryotic cells, is composed of two parts that are flexible relative to each other: the SS (SNAP–SNARE) complex with pseudo four-fold symmetry and the hexameric NSF complex (DD). We used the 20S particle as a test model to generate a simulated dataset. For simplicity, we built a model of the SS complex without symmetry and a hexameric model of DD with imposed C6 symmetry using the Modeller software package (Eswar et al. 2006). The two atomic models were converted to MRC format with e2pdb2mrc.py in the EMAN2 package (Tang et al. 2007). The two MRC volumes, with a voxel size of 1.32 Å and representing the SS and DD portions of the 20S particle, were then assembled to resemble the overall architecture of the 20S particle. Heterogeneous conformational states were generated by randomly tilting the two portions independently, with a standard deviation of 10° for all three Euler angles, and by translating the two parts with a standard deviation of 2 pixels in each coordinate. We then used the full set of simulated 3D MRC volumes to generate simulated electron micrographs with genRandomImage.py, a program written with the EMAN2 package. A total of 48 simulated micrographs were generated, each containing 150 particle images at random orientations and locations. In each micrograph, CTF-independent Gaussian white noise was superimposed, and CTF-dependent water noise was generated by randomizing the Fourier phases of an atomic model of water molecules simulated with NAMD and VMD (Humphrey et al. 1996). The noise level and CTF parameters of the simulated micrographs were chosen to mimic real micrographs recorded with a Gatan K2-Summit electron counting camera on a Titan Krios microscope operated at 300 kV. Further simulation parameters are listed in Table 1.
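Because SS and DD are perturbed independently, the spread of their *difference* is larger than the per-part spread: independent variances add, so each difference has a standard deviation of √2 times the per-part value. The following minimal sketch of the perturbation step assumes Gaussian offsets drawn separately for each of the three Euler angles and for x, y, and z. The function name is hypothetical, and this is not the authors' genRandomImage.py.

```python
import numpy as np


def sample_part_perturbations(n, sigma_deg=10.0, sigma_px=2.0, seed=0):
    """Draw independent Gaussian perturbations for the SS and DD parts of n particles.

    Returns (angles_ss, angles_dd, shifts_ss, shifts_dd), each of shape (n, 3):
    offsets of the three Euler angles (degrees) and of x, y, z (pixels).
    """
    rng = np.random.default_rng(seed)
    angles_ss = rng.normal(0.0, sigma_deg, size=(n, 3))
    angles_dd = rng.normal(0.0, sigma_deg, size=(n, 3))
    shifts_ss = rng.normal(0.0, sigma_px, size=(n, 3))
    shifts_dd = rng.normal(0.0, sigma_px, size=(n, 3))
    return angles_ss, angles_dd, shifts_ss, shifts_dd


angles_ss, angles_dd, shifts_ss, shifts_dd = sample_part_perturbations(200_000)
# Differences between the two parts: variances of independent draws add,
# so the expected standard deviation is sqrt(2) times the per-part value.
angle_diff_std = (angles_ss - angles_dd).std(axis=0)   # about 14.1 degrees each
shift_diff_std = (shifts_ss - shifts_dd).std(axis=0)   # about 2.83 pixels each
```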
Processing of simulated dataset
A total of 7200 particle images, each containing both SS and DD, were extracted from the simulated micrographs with a box size of 256 pixels. These particle images were first refined in 3D with RELION 1.3 against an initial model of the 20S particle low-pass filtered to 60 Å. As a control, we refined the 3D reconstruction with a local angular search range of 30° while applying an SS or DD mask, resulting in an SS or DD volume, respectively. As another control, we generated SS or DD sub-particles with relion_project and performed 3D auto-refinement of these sub-particles with a local angular search range of 30°. With our segmentation algorithm, in contrast, the SS particles were segmented by subtracting the DD density from the whole particle images. The segmented SS particles, re-windowed with a box size of 160 pixels, were subjected to 2D classification to select good SS particle images for further 3D refinement in RELION 1.3. After 3D refinement of the segmented SS particles, DD particles were segmented and re-windowed from the whole particle images by subtracting the SS density calculated from the new SS 3D volume. The DD particle images were then subjected to 2D classification and 3D refinement, resulting in an updated DD 3D volume, which was used for the next cycle of SS segmentation and 3D reconstruction.
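Re-windowing a segmented part into a smaller box requires knowing where that part's center falls in the original particle image. The text does not spell this out. One plausible way, shown here as a hypothetical sketch, projects the part's 3D offset from the reference origin with the particle's rotation matrix and adds it to the 2D position where the reference origin lands. The sketch assumes that `rot_matrix` maps reference coordinates into the image frame and that `center_xy` already includes the refined in-plane shift. Sign and axis conventions differ between packages and must be checked.

```python
import numpy as np


def subparticle_center(center_xy, rot_matrix, offset_xyz):
    """2D (x, y) center of a sub-part in the original particle image (assumed convention).

    center_xy: where the reference origin projects in the image (shift included).
    rot_matrix: 3x3 rotation taking reference coordinates into the image frame.
    offset_xyz: 3D position of the sub-part center relative to the reference origin.
    """
    projected = np.asarray(rot_matrix, float) @ np.asarray(offset_xyz, float)
    # Projection along the beam (z) keeps only the in-plane x, y components.
    return np.asarray(center_xy, float) + projected[:2]


def rewindow(image, center_xy, box):
    """Crop a box x box window around the rounded center (x = column, y = row)."""
    cx, cy = np.rint(center_xy).astype(int)
    x0, y0 = cx - box // 2, cy - box // 2
    if x0 < 0 or y0 < 0 or x0 + box > image.shape[1] or y0 + box > image.shape[0]:
        raise ValueError("window falls outside the image")
    return image[y0:y0 + box, x0:x0 + box]


# Hypothetical example: 256-pixel particle, sub-part offset by (10, -5, 3) pixels.
center = subparticle_center((128, 128), np.eye(3), (10, -5, 3))
image = np.arange(256 * 256).reshape(256, 256)
sub = rewindow(image, center, 160)
```

The fractional part of the rounded center would have to be carried over as the new sub-particle's origin offset. Per the paragraph above, the re-windowed images then go through 2D classification and 3D refinement.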
Processing of influenza RdRP
The 3D reconstructions of the influenza RdRP tetramer and dimer were described previously (Chang et al. 2015), and the RdRP dataset from that work was used in this study. Each raw particle image containing a tetramer has a pixel size of 1.32 Å and a box size of 256 pixels. Two RdRP dimer particles were segmented and re-windowed from each raw tetramer particle image with a box size of 180 pixels, so segmentation doubled the number of dimer particles relative to the number of tetramers. The segmented RdRP dimer particles were then used for 2D classification and 3D refinement. As a control, we also generated dimer sub-particles with relion_project and performed 3D auto-refinement with all of them.
Processing of 70S ribosome
We used a cryo-EM dataset of the 70S ribosome, provided by Prof. Ning Gao’s group, comprising 68,543 particle images with a box size of 280 pixels and a pixel size of 1.32 Å. The micrographs were recorded on a Titan Krios microscope equipped with a Gatan K2-Summit electron counting camera. We first reconstructed a 3D volume of the entire 70S ribosome using the conventional workflow. This reconstruction was further refined with a local angular search range of 15° while applying a 30S or 50S mask, resulting in the 3D map of the 30S or 50S subunit, respectively. We then segmented the 30S subunit from the dataset with a box size of 280 pixels by subtracting the 50S subunit with the segmentation algorithm. The segmented 30S particles were subjected to 2D classification to select good particles for further 3D auto-refinement. The 50S subunit was subsequently segmented from the 70S ribosome images by subtracting the 30S signal with the segmentation algorithm, and the segmented 50S subunit images were refined to reconstruct a 3D volume. As a control, we also generated 30S or 50S sub-particles with relion_project and performed 3D auto-refinement with them. The rotation angles between the segmented 30S and 50S subunits were calculated with CompareDataStars_data.py, a program written with the EMAN2 package.
Micrographs of the 20S particle were obtained as described in our previous paper (Zhou et al. 2015). 2D classification, 3D reconstruction, and auto-refinement were performed with RELION 1.3. CTF parameters were determined with CTFFIND3 (Mindell and Grigorieff 2003). Reconstruction resolution was estimated with the gold-standard FSC corrected by high-resolution noise substitution (Scheres and Chen 2012; Chen et al. 2013). Local resolution was calculated with ResMap (Kucukelbir et al. 2014). Unless otherwise indicated, the corresponding masks were also applied during 3D auto-refinement of the segmented particles. 3D volume segmentation and atomic model docking were performed with UCSF Chimera (Pettersen et al. 2004). The 3D refinements described above are summarized in Table 2.
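The gold-standard FSC compares two independently refined half maps shell by shell in Fourier space. The noise-substitution correction of Chen et al. (2013) then combines the FSC of the masked maps (FSC_t) with the FSC of masked maps whose high-resolution phases were randomized (FSC_n) as FSC_true = (FSC_t − FSC_n)/(1 − FSC_n). The following minimal NumPy sketch covers both ingredients. It assumes cubic maps and shells defined by the rounded radius in Fourier pixels, and it omits masking and phase randomization. It is not RELION's implementation, and the function names are hypothetical.

```python
import numpy as np


def fsc_curve(half1, half2, pixel_A):
    """Fourier shell correlation between two cubic half maps.

    Returns (spatial frequency in 1/Å, FSC) for shells 0 .. n//2 - 1.
    """
    n = half1.shape[0]
    f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)
    freq = np.fft.fftfreq(n)                               # cycles/voxel
    fx, fy, fz = np.meshgrid(freq, freq, freq, indexing="ij")
    # Shell index of each Fourier voxel: rounded radius in Fourier pixels.
    shell = np.rint(np.sqrt(fx**2 + fy**2 + fz**2) * n).astype(int).ravel()
    n_shells = n // 2
    cross = np.bincount(shell, weights=(f1 * np.conj(f2)).real.ravel())
    p1 = np.bincount(shell, weights=(np.abs(f1) ** 2).ravel())
    p2 = np.bincount(shell, weights=(np.abs(f2) ** 2).ravel())
    curve = cross[:n_shells] / np.sqrt(p1[:n_shells] * p2[:n_shells])
    spatial_freq = np.arange(n_shells) / (n * pixel_A)
    return spatial_freq, curve


def noise_substituted_fsc(fsc_t, fsc_n):
    """Chen et al. (2013) correction: (FSC_t - FSC_n) / (1 - FSC_n)."""
    return (fsc_t - fsc_n) / (1.0 - fsc_n)


# Sanity check: identical "half maps" correlate perfectly in every shell.
rng = np.random.default_rng(0)
half = rng.normal(size=(16, 16, 16))
freq, curve = fsc_curve(half, half, 1.32)
```

The resolution is then read off where the corrected curve crosses the 0.143 threshold.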
Open access. The software and scripts used in the work can be accessed via https://github.com/zhouqiang00/Particle-Segmentation. We thank Prof. X. Li, S.-F. Sui for helpful discussions, Dr. D.P. Sun and Dr. J. Wang for kindly providing the RdRP dataset, and Prof. N. Gao and Dr. Y.X. Zhang for kindly providing the ribosome dataset. This work was supported by Grant (2016YFA0501100 to H.W.) from the Ministry of Science and Technology of China and Grant (Z161100000116034 to H.W.) from the Beijing Municipal Science & Technology Commission. Q.Z. was supported by CLS Postdoctoral Fellowship Foundation.
Compliance with ethical standards
Conflict of interest
Qiang Zhou, Niyun Zhou, and Hongwei Wang declare that they have no conflict of interest.
Human and animal rights and informed consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
- Anden J, Katsevich E, Singer A (2015) Covariance estimation using conjugate gradient for 3D classification in cryo-EM. Proc IEEE Int Symp Biomed Imaging 2015:200–204
- Chen S, McMullan G, Faruqi AR, Murshudov GN, Short JM, Scheres SH, Henderson R (2013) High-resolution noise substitution to measure overfitting and validate resolution in 3D structure determination by single particle electron cryomicroscopy. Ultramicroscopy 135:24–35
- Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A (2006) Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics Chapter 5:Unit 5.6
- Humphrey W, Dalke A, Schulten K (1996) VMD: visual molecular dynamics. J Mol Graph 14(1):33–38
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.