Conservative correction policy
A major consideration in our development of the Autofix method is ensuring that proposed changes will reliably and robustly be actual improvements. We feel it is acceptable to fail to make all possibly valid corrections (false negatives), but not acceptable to suggest any significant number of changes that either we or the structural biologist would consider clearly wrong upon detailed examination (false positives). Therefore, aspects such as the cut-off levels for acceptance criteria are chosen conservatively. A significant set of proposed corrections for each residue type was visually inspected, which revealed that while outlier candidates were nearly all corrected accurately, many corrections of borderline candidates were dubious and should be rejected. The procedure described here, which only attempts to fix initial rotamer outliers and only accepts results with rotamer score >1%, other scores improved or maintained, and χ angle shifts >90°, does succeed in meeting these goals, while achieving a very useful level of corrections. That is also true for our long-established correction of Asn, Gln, and His sidechain flips in Reduce or in MolProbity. The Autofix methods will gradually be strengthened to cover more cases, as extensions are developed and tested that can do so robustly.
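The acceptance rule summarized above can be sketched as a simple predicate. This is a minimal illustration and not the Autofix implementation: the `Scores` fields, the function name `accept_fix`, and the choice of clash count and Cβ deviation as stand-ins for the "other scores" are assumptions made for the example. Only the 1% rotamer cutoff and the 90° χ shift are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Hypothetical per-residue validation scores (names are illustrative)."""
    rotamer_pct: float   # rotamer score, in percent
    clash_count: int     # number of all-atom clashes (assumed "other score")
    cbeta_dev: float     # C-beta deviation in Angstrom (assumed "other score")


def accept_fix(before: Scores, after: Scores, chi_shift_deg: float,
               outlier_cutoff: float = 1.0, min_chi_shift: float = 90.0) -> bool:
    """Return True only if a proposed correction passes every conservative test."""
    # Only residues that start as rotamer outliers are candidates at all.
    if before.rotamer_pct >= outlier_cutoff:
        return False
    # The corrected sidechain must no longer be a rotamer outlier.
    if after.rotamer_pct <= outlier_cutoff:
        return False
    # Other scores must be improved or maintained, never worsened.
    if after.clash_count > before.clash_count:
        return False
    if after.cbeta_dev > before.cbeta_dev:
        return False
    # Small chi shifts are not treated as flips.
    return abs(chi_shift_deg) > min_chi_shift


# Illustrative values only, not data from the study.
start = Scores(rotamer_pct=0.2, clash_count=2, cbeta_dev=0.20)
fixed = Scores(rotamer_pct=35.0, clash_count=0, cbeta_dev=0.05)
print(accept_fix(start, fixed, chi_shift_deg=170.0))
```

Requiring every test to pass, rather than scoring a weighted sum, mirrors the stated preference for false negatives over false positives: a single worsened score is enough to reject.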
Even with rotamer search, real-space refinement, and stringent requirements for acceptance, a small number of Autofix corrections are found to be false positives. These false positives are seen primarily for Arg residues, from two quite different causes. First, the large hydrogen-bonding capacity of arginine can sometimes stabilize a sterically unfavorable conformation, which occurs in the starting position of a handful of “corrected” Arg residues (e.g., 1ylt Arg A 256). Neither Coot’s scoring method nor our current acceptance criteria consider H-bonds, and if the starting position is bad enough to be a rotamer outlier then the protocol will be forced to choose some other alternative. Second, some surface Arg residues are fit dubiously to weak density, where neither the original nor the corrected residue provides a good answer for a sidechain that almost certainly has multiple conformations.
Additionally, some corrections show a drop rather than an increase in RSCC. This is most often seen for Leu corrections at lower resolution (e.g., 1gpz Leu B 595 at 2.9 Å or 1v4t Leu A 75 at 3.4 Å), where the truncated nubbin of density is best fit by the curled-over, backward-fit conformation, while the correct rotameric fit sticks the CD1/CD2 atoms slightly out of the density. These cases are almost certainly true corrections but cannot be fully substantiated by the sparser, low-resolution data. As density becomes contracted and less clear, Autofix is unable to accurately correct the problem, but such corrections can be done by hand. When a closely related high-resolution structure is available, it can confirm such corrections: for example, the nearly 180° backward misfit Leu 68 and 110 of both β chains in the 3.5 Å resolution 2qls hemoglobin are confirmed as standard rotamers in the 1.25 Å 2dn2.
Prevalence of systematic errors
Within our set of 945 PDB files, Table 1 shows that there are a large number of candidate misfit residues for Leu, Thr, Val, and Arg with outlier rotamer scores <1%. For Leu, there are 4,660 candidate outliers, accounting for 8.8% of the total 53,104 Leu residues in the whole set. The 2,037 corrected Leu outliers account for 3.8% of the total Leu residues in the dataset, or on average more than 2 corrected Leu residues per PDB file. A specific example is shown in Fig. 1. While fewer Val and Thr are corrected (1.3% and 1.7%, respectively), there is an average of more than one Val or Thr residue corrected per PDB file. Added to the high rate of Asn/Gln/His flips, this consistent prevalence of rotamer outliers is indicative of a widespread but largely correctable problem in deposited crystal structures. In fact, 99% of the 945 files had at least one Autofix correction. The remaining 1% (10 files) contained no outlier candidates to try correcting. They are all small, high resolution structures: 1b2a (1.7 Å), 1kr0 (1.92 Å), 1w5u (1.14 Å), 1wtf (1.60 Å), 1xyi (1.45 Å), 1ynv (1.2 Å), 1ys0 (2.00 Å), 1zgx (1.13 Å), 2blv (1.2 Å), and 2c9v (1.07 Å).
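The Leu percentages quoted above follow directly from the counts given in the text. A short check, with illustrative variable names:

```python
# Counts quoted in the text for Leu across the 945-file dataset.
total_leu = 53_104
candidate_outliers = 4_660
corrected = 2_037
n_files = 945


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


candidate_pct = percent(candidate_outliers, total_leu)
corrected_pct = percent(corrected, total_leu)
corrected_per_file = corrected / n_files

print(f"candidate Leu outliers: {candidate_pct:.1f}% of all Leu")
print(f"corrected Leu outliers: {corrected_pct:.1f}% of all Leu")
print(f"corrected Leu per file: {corrected_per_file:.2f}")
```

The per-file figure comes out just above 2, consistent with "on average more than 2 corrected Leu residues per PDB file".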
The effect of resolution, both on the number of rotamer outliers and on the success of their correction, is an important consideration, documented in Table 1. As resolution worsens, the electron density loses its distinct shape and the diffraction data used in standard crystallographic refinement carry less information. Looking at misfit Leu residues, the 364 PDB files with better than 2.0 Å resolution contain only 497 candidate outliers, for an average of 33 per 1,000 Leu residues. For 2.0–2.5 Å resolution structures, the average jumps to 85 outliers per 1,000 residues; for 2.5–3.0 Å to 112 per 1,000 residues; and for 3.0 Å and poorer all the way to 135 per 1,000 residues. Arg shows a similar pattern. Val and Thr exhibit a lower overall outlier prevalence in this dataset at high resolution, but similar significant increases at poorer than 2.5 Å resolution, with Thr increasing from 80 outlier candidates per 1,000 Thr residues at 2.0–2.5 Å resolution to 150 per 1,000 residues at poorer than 3.0 Å. The ill-defined “blobs” of density at low resolution are less effective for real-space refinement, as they are unable to offer meaningful scoring differences for different proposed rotamer states, so that the ratio of accepted/outlier Autofix corrections goes down. The combined effect of these two trends is that the overall rate of successful corrections is highest in the middle resolution ranges. Interestingly, 52.0% (23,839/45,842) of crystal structures deposited in the PDB as of 10/14/2008 fall within a middle resolution range (1.8–2.5 Å), and the majority of the remaining structures (27.9%: 12,819/45,842) are higher resolution and will have less frequent errors.
Validation by hydrogen-bonding in Thr and Arg
Systematically misfit Thr or Arg residues often have unsatisfied H-bonds, and satisfaction of those H-bonds after correction can be taken as an independent validation criterion, since H-bonds were not used in the current protocol. The 1yhq Thr sidechain shown in Fig. 2 originally had an unsatisfied H-bond as well as a serious clash with the RNA backbone. After correction, it has an equivalent fit to the density, eliminates the clash, and satisfies the H-bond. The guanidinium group at the end of Arg sidechains is also asymmetrical, so that its H-bond interactions are quite different if it is fit flipped-over, producing important disruptions to interactions at molecular interfaces and making its correction an important issue. The examined sample of Arg corrections showed improved H-bonding, often quite dramatically so.
Improvement with refinement
For optimal structure correction, a full round of refinement following Autofix correction is necessary. As shown in Arendall et al., rotameric correction as part of the refinement pipeline improves R and Rfree values and correlation scores. It is important to consider that for that study, corrections and refinement were done in a self-consistent manner, which is limited in this case, as we do not know the complete details of the refinement methods used in each of our dataset structures. We believe that the use of Autofix as part of a self-consistent refinement strategy would yield similar improvements.
Causes of rejected flips
There are a number of reasons that proposed flips are rejected. A primary problem is sidechains with insufficient electron density for valid real-space refinement. In such cases, Coot may either fail to find a changed conformation or may suggest an incorrect rotamer due to an insignificant difference in fit. The latter cases generally but not always produce more all-atom clashes with surrounding groups, larger Cβ deviations, or unfavorable Ramachandran values, so that Autofix can usually correctly reject the proposed change. To ensure that Autofix never accepts a fix without robust real-space evidence, future versions will incorporate a separately calculated real-space correlation value (used but not reported by Coot) as a criterion for acceptance.
A second problem, especially at lower resolution, is other structural errors in the vicinity of the residue of interest. Because Autofix works through candidates one at a time, if a rotamer is corrected but another residue near it is wrong, increased clashes often occur which cause a false rejection of the fix. We cannot accept such changes under our goal of doing no additional harm to the structure, since the false rejections cannot be distinguished from true ones. We plan eventually to treat such interactions combinatorially.
In Fig. 2, note the local backbone movement required to fit the flipped residue into density, which is describable as a “backrub” motion. It is needed because the backward sidechain caused refinement to distort bond angles and shift backbone in order to keep the misfit OG1 and CG2 atoms in density. This example highlights the importance of the two steps of real-space refinement in the Coot component of the Autofix protocol (see Methods), which allowed the necessary motion in this and many cases. For branched-Cβ sidechains in general, even the pre-refinement step does not always improve the direction of the Cα–Cβ bond enough for the correct rotamer to lie in density, so the procedure then fails to identify the flip. Future implementations may therefore incorporate more explicit backrub-type motions.
As a final comment, one should keep in mind that most but not all rotamer outliers are incorrect. About 0.5–1% of sidechains genuinely occupy somewhat strained, outlier conformations (e.g., several hydrogen bonds holding an eclipsed χ angle in a needed position) that are well supported by the electron density and should not be “fixed” by a properly conservative procedure. However, for any pair of atoms that have an all-atom steric clash ≥0.5 Å, one or both of them must be positioned incorrectly. Bond angle outliers >5σ are nearly always incorrect, and are often diagnostic of distortion produced by refinement compensating for groups trapped in the wrong local minimum conformation.
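The two geometric thresholds in this closing paragraph can be expressed as a small flagging helper. This is only a sketch: the function name `geometry_flags` and its inputs (lists of clash overlaps in Å and bond-angle deviations in units of σ for one residue) are hypothetical, and only the 0.5 Å and 5σ cutoffs come from the text.

```python
from typing import Dict, Iterable


def geometry_flags(clash_overlaps: Iterable[float],
                   angle_deviations_sigma: Iterable[float],
                   clash_cutoff: float = 0.5,
                   angle_cutoff: float = 5.0) -> Dict[str, bool]:
    """Flag a residue using the two diagnostics described in the text.

    clash_overlaps: all-atom steric overlaps in Angstrom (hypothetical input).
    angle_deviations_sigma: signed bond-angle deviations in sigma units.
    """
    return {
        # An overlap of at least 0.5 A means one or both atoms are misplaced.
        "serious_clash": any(o >= clash_cutoff for o in clash_overlaps),
        # Deviations beyond 5 sigma are nearly always incorrect.
        "angle_outlier": any(abs(d) > angle_cutoff
                             for d in angle_deviations_sigma),
    }


# Illustrative values only, not data from the study.
flags = geometry_flags(clash_overlaps=[0.12, 0.63],
                       angle_deviations_sigma=[1.4, -5.8])
print(flags)
```

A rotamer outlier that raises neither flag and sits well in density is the kind of strained but genuine conformation that a conservative procedure should leave alone.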