phenix.mr_rosetta: molecular replacement and model rebuilding with Phenix and Rosetta
- Terwilliger, T.C., DiMaio, F., Read, R.J. et al. J Struct Funct Genomics (2012) 13: 81. doi:10.1007/s10969-012-9129-3
The combination of algorithms from the structure-modeling field with those of crystallographic structure determination can broaden the range of templates that are useful for structure determination by the method of molecular replacement. Automated tools in phenix.mr_rosetta simplify the application of these combined approaches by integrating Phenix crystallographic algorithms and Rosetta structure-modeling algorithms and by systematically generating and evaluating models with a combination of these methods. The phenix.mr_rosetta algorithms can be used to automatically determine challenging structures. The approaches used in phenix.mr_rosetta are described along with examples that show roles that structure-modeling can play in molecular replacement.
Keywords: Molecular replacement, Automation, Macromolecular crystallography, Rosetta, Phenix
Molecular replacement is an exceptionally powerful technique for the determination of structures of macromolecules. In molecular replacement a template structure serves as an initial model for the structure to be determined. The orientation and location of the template in the crystallographic cell are found by optimizing the agreement between measured structure factors and those calculated from the placement of the template. Then the placed template is used to estimate the crystallographic phases, allowing calculation of a preliminary electron density map. A new model is then built using this map as a guide.
Molecular replacement accounts for over 70% of the structures in the Protein Data Bank (PDB). Despite this success, molecular replacement is limited to situations where a suitable template structure is available. The template must normally represent a large fraction (usually more than 50%) of the structure and have a core whose atomic coordinates are superimposable within approximately 1.5–2 Å root mean square deviation (rmsd) of the target structure.
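The rmsd criterion can be made concrete with a minimal Python sketch. It assumes that the two structures are already superimposed and that atoms are paired by index. The function names and the default thresholds (taken from the rule-of-thumb values quoted above) are illustrative only and are not part of Phenix or Rosetta.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two lists of (x, y, z) points.

    Assumes the structures are already superimposed and that the i-th
    point in coords_a corresponds to the i-th point in coords_b.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def is_plausible_template(core_rmsd, fraction_covered,
                          max_rmsd=2.0, min_fraction=0.5):
    """Rule-of-thumb check only; the default thresholds are assumptions
    based on the approximate limits quoted in the text."""
    return core_rmsd <= max_rmsd and fraction_covered > min_fraction
```

In practice the rmsd to the target is not known in advance, so a check of this kind is useful mainly for retrospective analysis of why a template did or did not work.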
There are two steps in molecular replacement where the availability of a sufficiently similar template is crucial. The first is at the stage of finding the orientation and location of the template structure in the asymmetric unit of the structure to be determined. If the template is too different from the structure to be determined, the correct location and orientation may not be identifiable.
The second step that requires a template sufficiently similar to the structure to be determined is the rebuilding of a correctly-placed model. It is not uncommon for molecular replacement to yield a solution that is unambiguous in its placement yet leads to an electron density map that does not give any useful clues as to how to improve the model. In such cases it is again not feasible to proceed with structure determination.
These restrictions on the divergence between template and structure to be determined, along with the wide use of molecular replacement, mean that any improvements in the starting templates for molecular replacement, in methods for finding the location and orientation of the template, in methods for obtaining accurate phases from a preliminary model, or in methods for rebuilding molecular replacement models can substantially increase the number of structures that can be determined by molecular replacement.
There have recently been many important advances in all these areas. Improved starting templates for molecular replacement have been obtained by judicious pruning of parts of models that are less likely to be correct [5, 6, 7], by creating ensembles of templates [8, 9], using normal mode analysis [10, 11], and by systematic searches using many or all of the proteins in the Protein Data Bank [12, 13]. Improved methods for finding the placement of the template include the use of likelihood in scoring of placements and the development of approximations to the likelihood function that are accurate yet much more rapid. Improvements in methods for obtaining phase information from a preliminary model include developments in algorithms for creating maps that optimally show unmodeled density and developments in density modification procedures that reduce model bias. Improvements in model-building algorithms include the use of iteration between model-building, refinement and map calculation or density modification [17, 18, 19] and the development of methods that can be used at resolutions lower than 3 Å [20, 21, 22, 23, 24, 25, 26, 27].
A recent approach to obtaining improved templates for molecular replacement is to apply tools from the structure modeling field before or after placing the template in the crystallographic cell [28, 6, 29, 30]. The key idea in this approach is that crystallographic model-building and structure modeling use fundamentally different sources of information so that combining them can yield a more powerful approach to model-building than either alone.
Complementarity of model-building in macromolecular crystallography and in structure-modeling

Crystallographic model-building (Phenix) | Structure modeling (Rosetta)
Interpretation of patterns of density | Creating physically plausible models
Density search for secondary structure | Ab initio modeling or homology modeling
3-residue fragment library | 3- and 9-residue libraries
Fit to density | Rosetta force field (optional density term)
 | Rosetta force field (optional density term)
Crystallographic model-building also makes use of force fields. After model-building, crystallographic structures are refined using a combination of the agreement with crystallographic data and a simple set of geometric restraints. However, the restraints used in crystallographic model-building are normally much less sophisticated than the force fields used in the structure modeling field; for example, they often do not include electrostatic or hydrogen bonding interactions. In contrast to refinement with force fields used in structure modeling, refinement of a structure with geometric restraints in the absence of crystallographic data is highly unlikely to converge to near-native conformations.
Qian et al., Ramelot et al., DiMaio et al. and Mao et al. have shown that Rosetta structure modeling can be used to improve homology models to make them more useful for finding their locations in a crystallographic cell, the first step in molecular replacement. Qian et al. have shown that in some cases ab initio models created with Rosetta from sequence information alone can be sufficiently accurate to be useful in this step. DiMaio et al. have shown further that augmentation of Rosetta structure modeling with pseudo-energy terms representing fit of model to electron density can greatly improve the rebuilding of models in the second key step of molecular replacement.
The procedures used by DiMaio et al. for combining Rosetta structure modeling and crystallographic model-building require considerable manipulations and familiarity with both crystallographic and structure-modeling tools. To make the use of these procedures more accessible to a broader range of structural biologists, we developed software in the Phenix crystallographic computing environment that provides simultaneous access to Rosetta structure modeling and Phenix crystallographic model-building. This phenix.mr_rosetta software allows a user to identify suitable templates for molecular replacement available in the PDB, edit them to match the target sequence, optionally refine their structures with Rosetta prior to molecular replacement, carry out molecular replacement, and rebuild the resulting models with Rosetta and Phenix autobuilding algorithms. Alternatively the same software can begin with a partial or complete model already placed in a crystallographic cell and rebuild the model with Rosetta and Phenix autobuilding approaches. These procedures can be carried out using simple keyworded scripts that specify the input data and the procedures to be used. Here we describe the methods used in phenix.mr_rosetta and present examples that help show how the approach works.
Steps in molecular replacement and model rebuilding by phenix.mr_rosetta
The basic data required to run the phenix.mr_rosetta procedure consist of the sequence of the structure to be determined and the measured crystallographic structure-factor amplitudes for this structure. Additionally, either a file from the hhpred server (http://toolkit.tuebingen.mpg.de/hhpred) listing similar proteins in the PDB and their alignments, or one or more templates edited to match the target sequence, is required. For loop-building in Rosetta, files containing 3-residue and 9-residue fragments from the PDB tailored for the target protein are also required. These fragments can be obtained from the Robetta fragment server (http://robetta.bakerlab.org/fragmentsubmit.jsp).
The overall procedure used in phenix.mr_rosetta consists of six steps. These are (1) downloading suitable templates and editing them to match the target sequence, (2) optional optimization of the models with Rosetta without using X-ray data, (3) placement of templates using molecular replacement, (4) refinement and calculation of density-modified electron density maps, (5) model rebuilding with Rosetta including density information, and (6) model rebuilding with phenix.autobuild.
Once the entire cycle of 6 steps has been carried out, a partially or completely built model may be obtained. If all chains in the model are found in the molecular replacement step (step 3) but the model is not fully rebuilt after carrying out these steps, then steps (4–6) of this procedure can be iterated to complete and further improve the resulting models. Alternatively, if some chains in the model are not found in molecular replacement, those that are found can be rebuilt in steps (4–6). Then the resulting model can be used as a fixed model for another molecular replacement attempt, and the resulting model can be rebuilt as before. The six steps are described in more detail below.
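The decision logic in the preceding paragraph can be summarized in a short Python sketch. The step numbers follow the six steps listed above; the function name and its two boolean inputs are hypothetical and do not correspond to actual phenix.mr_rosetta keywords.

```python
# Step numbers as listed in the text (labels are paraphrased).
STEPS = {
    1: "download and edit templates",
    2: "optional Rosetta optimization without X-ray data",
    3: "molecular replacement",
    4: "refinement and density-modified map",
    5: "Rosetta rebuilding with density",
    6: "phenix.autobuild rebuilding",
}


def next_steps(all_chains_placed, model_fully_rebuilt):
    """Return the step numbers to run after one full pass of steps 1-6.

    Hypothetical helper illustrating the iteration described in the text.
    """
    if all_chains_placed and model_fully_rebuilt:
        return []                      # nothing left to do
    if all_chains_placed:
        return [4, 5, 6]               # iterate rebuilding only
    # Some chains missing: rebuild what was found, use it as a fixed
    # model in a further molecular replacement search, then rebuild again.
    return [4, 5, 6, 3, 4, 5, 6]
```

For example, `next_steps(True, False)` returns `[4, 5, 6]`, the rebuilding-only iteration.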
Downloading suitable templates and editing them to match the target sequence
The simplest starting point for phenix.mr_rosetta is a list of proteins in the PDB that are likely to have similar structures to the target protein. The hhpred server (http://toolkit.tuebingen.mpg.de/hhpred) provides a rapid analysis of homologous sequences that are present in the PDB and lists these PDB entries along with their sequence alignments to the target structure. If the resulting summary file is supplied, phenix.mr_rosetta will use the tool phenix.mr_model_preparation to download a specified number of these PDB entries and edit them to match the sequence of the target protein. These edited templates can then either be the starting points for structure optimization by Rosetta or serve as search models for molecular replacement.
This simple procedure is limited to structures that can be represented by a single template from the PDB. Normally this means that it is suitable for structures with a single type of polypeptide chain. Structures that contain several different chains or chains that require several templates to be represented can be built with phenix.mr_rosetta but the initial molecular replacement steps must be carried out separately. The tool phenix.mr_model_preparation can be used to download and edit multiple templates and the molecular replacement tool phenix.automr can then be used to carry out molecular replacement with Phaser to place and combine these templates. Then any number of the resulting potential molecular replacement solutions (placed models) can be used as the starting point for phenix.mr_rosetta beginning in step (4) below.
Optional optimization of the models with Rosetta
Once a template structure is available, Rosetta modeling tools can optionally be applied to remodel the template. The information that is available for this remodeling is the sequence alignment between the template and the target molecule and the starting structure of the template. Rosetta can be used to rebuild the template, making its structure more compatible with the sequence of the target molecule and creating new chains for any gaps where the template did not match the target sequence. This process is carried out without reference to any crystallographic data. Normally 1,000–2,000 Rosetta models are created and the top-scoring model (based on the standard Rosetta energy function) is used as a search model in the molecular replacement step.
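Selecting the search model from the pool of Rosetta models amounts to a simple minimization, sketched here in Python. The sketch assumes that each model is represented as a (name, energy) pair and that a lower Rosetta energy indicates a better model; the function name and the example values are hypothetical.

```python
def pick_search_model(scored_models):
    """Return the (name, energy) pair with the lowest Rosetta energy.

    scored_models: iterable of (model_name, rosetta_energy) pairs.
    Assumes the usual Rosetta convention that lower energy is better.
    """
    scored_models = list(scored_models)
    if not scored_models:
        raise ValueError("no Rosetta models supplied")
    return min(scored_models, key=lambda pair: pair[1])


# Hypothetical example values, for illustration only.
example_pool = [("model_0001", -215.3), ("model_0002", -231.8), ("model_0003", -198.4)]
best_name, best_energy = pick_search_model(example_pool)
```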
Placement of search models using molecular replacement
Once search models are available, molecular replacement is carried out using the crystallographic data along with each search model in turn. In cases where the size of the asymmetric unit of the crystal can accommodate more than one copy of the search model, the number of copies of the search model to be found can be specified, or phenix.mr_rosetta can try all plausible numbers of copies. If the number of copies to be found is a multiple of the number of copies of the template in its original crystallographic asymmetric unit, then the corresponding multimer of the template is tested in molecular replacement as well as the monomer. For example, if the template was a dimer in its original crystal form and four copies of the molecule can fit in the asymmetric unit of the target structure, then both the monomer and dimer of this template would be considered in separate runs of molecular replacement by phenix.mr_rosetta.
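The rule for deciding whether a multimer of the template is also tested can be written as a short Python sketch. The function name and the tuple layout are illustrative assumptions, not the internal representation used by phenix.mr_rosetta.

```python
def search_configurations(copies_to_find, template_copies_in_asu):
    """List the molecular replacement runs implied by the rule in the text.

    Each entry is (label, chains_per_search_unit, units_to_place).
    The monomer is always tried; the template multimer is added when the
    number of copies to find is a multiple of the number of copies of the
    template in its original asymmetric unit.
    """
    if copies_to_find < 1 or template_copies_in_asu < 1:
        raise ValueError("copy numbers must be positive")
    configs = [("monomer", 1, copies_to_find)]
    if (template_copies_in_asu > 1
            and copies_to_find % template_copies_in_asu == 0):
        configs.append(("multimer", template_copies_in_asu,
                        copies_to_find // template_copies_in_asu))
    return configs
```

For the example in the text (a dimeric template and four copies in the target asymmetric unit), this gives one run placing four monomers and one run placing two dimers.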
As there may be several search models and several numbers of copies to be tested, the entire molecular replacement step can produce a number of possible models. These models are rescored with the Phaser log-likelihood scoring procedure using a fixed value of the estimated rmsd between template and target structure (typically using the smallest value of the estimated rmsd for all the search models considered). The best-scoring model or models are then considered as starting points for map calculation and Rosetta rebuilding.
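A minimal Python sketch of this rescoring step follows. The scoring function is passed in as a callable because the actual Phaser log-likelihood calculation is not reproduced here; the dictionary keys, function names and the `keep` parameter are hypothetical.

```python
def rescore_and_rank(solutions, llg_at_rmsd, keep=1):
    """Rescore all solutions at one common rmsd and return the best ones.

    solutions:    list of dicts, each with an 'estimated_rmsd' entry.
    llg_at_rmsd:  hypothetical callable (solution, rmsd) -> log-likelihood
                  score, standing in for the Phaser rescoring.
    keep:         number of top solutions to return (an assumption).
    """
    if not solutions:
        raise ValueError("no molecular replacement solutions supplied")
    # Use the smallest estimated rmsd of all search models for every solution,
    # so that the scores are on a common footing.
    common_rmsd = min(s["estimated_rmsd"] for s in solutions)
    ranked = sorted(solutions,
                    key=lambda s: llg_at_rmsd(s, common_rmsd),
                    reverse=True)  # higher log-likelihood is better
    return common_rmsd, ranked[:keep]
```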
Refinement and calculation of density-modified electron density maps
Once a potential molecular replacement solution is obtained, it is refined with phenix.refine and the resulting model is used along with the experimental data to create a model-based density-modified electron density map with Resolve density modification. If more than one copy of the template is present in the molecular replacement model, then non-crystallographic symmetry is included in the density modification procedure.
If the starting point for the entire procedure is a model already placed in the crystallographic cell, then this model is refined and a density-modified map is created in the same way. In this case the model can consist of any number of copies of any number of different chains. This allows the application of later steps in phenix.mr_rosetta to structures that are more complicated than those that can be described with a single sequence.
Model rebuilding with Rosetta including density information
Once a model has been placed in the crystallographic cell and a density map has been created, a Rosetta modeling procedure is carried out in which the Rosetta energy function is augmented with a term describing the fit of the model to the density [37, 28]. This Rosetta modeling procedure can rebuild existing segments of the model as well as build short loops (typically up to 8 residues in length) in gaps of the model. There can still be segments that are missing in the model, however. The resulting models with the best Phaser likelihood scores are then refined with phenix.refine and used to create a new set of density-modified maps. These maps are averaged to yield a single averaged density-modified map. The refined Rosetta models are then rebuilt one more time with Rosetta using the fit to this averaged map in scoring and the best-scoring models are refined with phenix.refine and used as the starting point for phenix.autobuild automated model rebuilding.
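The map-averaging step can be illustrated with a small Python sketch. It assumes that each map is a flat list of density values sampled on the same grid in the same order, and that the maps are given equal weight; whether the actual implementation weights maps differently is not stated in the text. The function name is hypothetical.

```python
def average_maps(maps):
    """Point-by-point unweighted average of several density maps.

    maps: list of equally long flat lists of density values, all sampled
          on the same grid (an assumption of this sketch).
    """
    if not maps:
        raise ValueError("at least one map is required")
    n_points = len(maps[0])
    if any(len(m) != n_points for m in maps):
        raise ValueError("all maps must be sampled on the same grid")
    return [sum(values) / len(maps) for values in zip(*maps)]
```

Averaging maps derived from several independently rebuilt models tends to reinforce features common to all of them and to dampen features specific to any single model.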
In cases where more than one copy of a chain is present in the model, a single copy is supplied to Rosetta along with the density map corresponding to that chain. Then the resulting Rosetta model is copied to the locations of each of the copies in the original model to form a new Rosetta-based model with idealized non-crystallographic symmetry. In cases where more than one type of chain is present, one copy of each type of chain is supplied at a time to Rosetta. In this way any number of copies of any number of types of chains can be rebuilt with Rosetta including a density term.
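Copying one rebuilt chain onto every non-crystallographic symmetry (NCS) position is a matter of applying a rotation and a translation for each copy, as in the Python sketch below. The representation of an operator as a (3x3 rotation, translation) pair and the function names are assumptions made for illustration.

```python
def apply_operator(coords, rotation, translation):
    """Apply x' = R x + t to each (x, y, z) point.

    rotation:    3x3 matrix given as a list of three rows.
    translation: sequence of three numbers.
    """
    moved = []
    for x, y, z in coords:
        moved.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)
        ))
    return moved


def rebuild_ncs_copies(rebuilt_chain, operators):
    """Place the single rebuilt chain at every NCS position.

    operators: list of (rotation, translation) pairs, one per copy,
               each mapping the rebuilt chain onto that copy.
    The result has exact (idealized) NCS because every copy is generated
    from the same coordinates.
    """
    return [apply_operator(rebuilt_chain, r, t) for r, t in operators]
```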
Model rebuilding with phenix.autobuild
Model rebuilding is continued using phenix.autobuild. The starting points are the models rebuilt as described above with Rosetta, including a density term in the Rosetta energy. These models are rescored using the Phaser likelihood score. The top models (typically 2) are then rebuilt with phenix.autobuild based on the crystallographic data and the sequence of the target macromolecule. This automated model-building procedure uses the starting model and any non-crystallographic symmetry to create a density-modified map in the same way as in step (4) above. The density-modified map is used as the basis for crystallographic model-building and recombination of the newly-built model with the existing model, and the resulting model is refined using the crystallographic data. The overall rebuilding procedure is iterated until the R-value comparing the crystallographic data with data expected from the model does not change substantially from cycle to cycle.
In the model-building process some polypeptide chain can be built in regions that are not represented in the Rosetta model used to start the autobuilding process. The sequences corresponding to such chains may be identified by the correspondence between the sequence of the target structure and the shapes of side chains visible in the electron density map along the polypeptide chain. However, some chains may be built that cannot automatically be assigned to sequence. These are normally discarded if further cycles of Rosetta model-building are to be carried out, as Rosetta model-building requires a knowledge of the sequence of the model to be rebuilt.
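The stopping criterion can be sketched in Python as follows. The R-value is computed with the conventional definition R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, here assuming the calculated amplitudes are already on the scale of the observed ones. The rebuilding cycle is represented by a hypothetical callable, and the tolerance and maximum number of cycles are arbitrary assumptions rather than phenix.autobuild defaults.

```python
def r_value(f_obs, f_calc):
    """Conventional crystallographic R-value for paired amplitudes.

    Assumes f_calc has already been scaled to f_obs.
    """
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return numerator / denominator


def iterate_until_converged(rebuild_cycle, model, tolerance=0.005, max_cycles=20):
    """Repeat rebuilding until the R-value stops changing substantially.

    rebuild_cycle: hypothetical callable, model -> (new_model, r_value).
    tolerance, max_cycles: assumed values chosen for illustration.
    Returns the final model and the list of R-values, one per cycle.
    """
    history = []
    previous_r = None
    for _ in range(max_cycles):
        model, r = rebuild_cycle(model)
        history.append(r)
        if previous_r is not None and abs(previous_r - r) < tolerance:
            break
        previous_r = r
    return model, history
```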
At the conclusion of autobuilding, the model with the lowest R-value and the corresponding density-modified map are saved. This model and map can be suitable for further rebuilding with semi-automated tools or re-used as the input for further cycles of Rosetta and phenix.autobuild rebuilding.
Results and discussion
Application of phenix.mr_rosetta to challenging structure determinations
Recently we have used a combination of Rosetta and Phenix to determine 13 new structures that had proven difficult or not possible to determine by a variety of other approaches. The procedures used in phenix.mr_rosetta are automated versions of the procedures used in that work. Here we describe the application of phenix.mr_rosetta to two of these structures to illustrate how the combination of structure modeling and crystallographic model-building can enhance structure determination by molecular replacement.
Structure-modeling of an NMR model prior to molecular replacement
One of the structures determined by a combination of Rosetta modeling and Phenix autobuilding was the structure of the radA intein (structure #12 in [28, 38]). X-ray diffraction data were available to a resolution of 1.7 Å, and a dimer of the molecule is present in the asymmetric unit of the crystal in space-group P212121. Additionally, an NMR model potentially suitable for use in molecular replacement was available (this NMR model was not a final model, but rather one that had been generated from NMR data using rapid automated procedures). Molecular replacement with the automatically-generated NMR model had not succeeded, but the structure could be determined by applying Rosetta structure modeling to the automatically-determined NMR model, choosing the best-scoring Rosetta model, and using that model in molecular replacement followed by Phenix autobuilding [28, 38].
Structure-modeling with density to yield critical improvements in a placed model
A structure for which Rosetta modeling substantially aided crystallographic model-building is the protease XMRV PR (structure #6 in the set of 13 structures mentioned above). Efforts to determine this structure by standard molecular replacement approaches had failed, and the structure was determined by a combination of extensive molecular replacement and Rosetta modeling with electron density restraints using X-ray data collected to a resolution of 2 Å. The starting template was a symmetric dimer created from chain A of the HIV-1 protease structure 2hs1, which has a sequence identity of 30% with the target. There is a dimer of XMRV PR in the asymmetric unit of the crystal. The location of a symmetric dimer from the template 2hs1 could be determined by molecular replacement, but the placed template was too different from the target structure to yield a useful electron density map for rebuilding. Rebuilding this model with Phenix autobuilding failed (with a free R-value of 0.57).
Application of phenix.mr_rosetta to 13 previously-solved structures
Table 2. Structure determinations with phenix.mr_rosetta. Columns include: sequence identity (%), mr_rosetta free R, autobuild free R (from mr_rosetta placed templates), and autobuild free R (from DiMaio et al. templates).
These 13 structures and their experimental data have been examined quite extensively and many different approaches for structure determination have been applied to each of them. In previous work the key question was how much information was contributed by the use of Rosetta modeling. To answer this question, the comparisons among methods all began with templates placed in the crystallographic unit cell using Phaser molecular replacement, and the effectiveness of each method in improving these placed models was examined. Those comparisons showed that for two of the structures (radA intein and pc0265), Rosetta modeling was essential for the first step in molecular replacement to succeed. For 6 additional structures (XMRV PR, thiod, pc02153, tirap, hp3342 and estan) Rosetta modeling with density after molecular replacement yielded substantially better models than the other methods tried. The next-best method for these 6 structures consisted of deformable elastic network (DEN) refinement followed by Phenix autobuilding. For the final 5 structures (fk4430, bfr258e, niko, fj6376 and cab55348) several methods, including Rosetta modeling with density, could be used to determine the structures.
Table 2 (columns G and H) lists the free R-values obtained by using phenix.autobuild (without including Rosetta structure-modeling) to rebuild the templates placed with phenix.mr_rosetta (column G) or the templates used in DiMaio et al. Rebuilding the templates used in the previous analysis with phenix.mr_rosetta (column H) gave results similar to those reported previously. In only 4 of 13 cases did autobuilding of these previously used templates yield free R-values of 0.42 or better. This shows the need for other approaches such as Rosetta modeling to improve these models before crystallographic autobuilding could be used.
Some of the template placements found in the molecular replacement step by phenix.mr_rosetta were closer to the final structures than those used in DiMaio et al. The molecular replacement searches carried out by phenix.mr_rosetta in Table 2 (column F) were in some cases quite extensive. Some used as many as 13 starting templates. Others tested various possibilities for the number of copies in the asymmetric unit or various possibilities for the number of chains from the deposited structures used as templates in the molecular replacement search. The result of the extensive search approach can be seen from column G of Table 2, in which the templates placed by phenix.mr_rosetta were used directly in autobuilding (without the use of Rosetta). Using phenix.autobuild with these templates, 7 of the 13 structures could be determined with free R-values of 0.42 or better. This result is consistent with the known utility of extensive searches with a variety of molecular replacement templates (e.g., [12, 13]).
The combination of structure modeling with Rosetta and crystallographic model-building techniques can substantially increase the range of templates that are suitable for molecular replacement. The automated tools in phenix.mr_rosetta simplify the application of these combined approaches by integrating the Phenix and Rosetta algorithms and by systematically generating and evaluating models with a combination of these methods. As demonstrated here, the phenix.mr_rosetta algorithms can be used to automatically determine some of the most challenging structures determined by manual combination of molecular replacement and Rosetta.
The Rosetta and Phenix tools available in phenix.mr_rosetta can address each of the steps in molecular replacement that can fail because of lack of a template that is close enough to the target molecule. In cases where the template is so different that it cannot be successfully placed in the crystallographic cell, phenix.mr_rosetta can use Rosetta modeling to improve the template. As shown above for the radA intein structure, this improvement can be sufficient to allow molecular replacement and the subsequent rebuilding. In cases where the template is similar enough to the target structure for placement of the model, but too different for model rebuilding, phenix.mr_rosetta can use Rosetta, along with an electron density map, to improve the placed template. This was illustrated with the XMRV PR structure determination described above. The key step in this structure determination was the slight improvement in the model obtained by Rosetta rebuilding with density. Without this improvement, the model was too poor to yield a map that is interpretable, but with it the map was improved enough to allow rebuilding. This is the essence of the combination of Rosetta modeling with crystallographic model-building. The combination allows borderline cases, which are apparently quite frequent, to be solved by incorporating some complementary information from the Rosetta modeling that moves the starting model closer to the target structure.
The approaches used in phenix.mr_rosetta are likely to be applicable not only to molecular replacement, as in the examples described here, but also to other situations where model rebuilding is challenging but the sequence of the model being built is known. For example, it is not uncommon for an experimental structure determination to lead to a mostly-complete model that is outside the range of convergence of current refinement procedures. This can occur if the resolution is low or if the quality of the experimental electron density map is too poor to build an accurate model. The sequence associated with the model might be known or a limited number of possibilities for sequence assignment might be obtained. In such cases phenix.mr_rosetta tools may be useful in rebuilding the models, bringing in information from structure-modeling to improve the quality of the models and the resulting electron density maps, and ultimately leading to more complete and accurate models.
The authors are most grateful for the use of crystallographic data supplied by Alex Wlodawer, NCI, Herb Axelrod and Debanu Das, Joint Center for Structural Genomics, Gustav Oberdorfer and Ulrike Wagner, University of Graz, Eugene Valkov, University of Cambridge, Assaf Alon and Deborah Fass, Weizmann Institute of Science, Sergey M. Vorobiev, Northeast Center for Structural Genomics, Hideo Iwai, University of Helsinki, and P. Raj Pokkuluri, Argonne National Laboratory. The authors would like to thank the NIH (PDA, TCT, RJR, DB) and the HHMI (DB) for generous support. RJR is supported by a Principal Research Fellowship from the Wellcome Trust (UK).
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.