Although a significant number of proteins include bound metals as part of their structure, the identification of amino acid residues coordinated to non-paramagnetic metals by NMR remains a challenge. Metal ligands can stabilize the native structure and/or play critical catalytic roles in the underlying biochemistry. An atom’s chemical shift is exquisitely sensitive to its electronic environment. Chemical shift data can provide valuable insights into structural features, including metal ligation. In this study, we demonstrate that overlapped 13Cβ chemical shift distributions of Zn-ligated and non-metal-ligated cysteine residues are largely resolved by the inclusion of the corresponding 13Cα chemical shift information, together with secondary structural information. We demonstrate this with a bivariate distribution plot, and statistically with a multivariate analysis of variance (MANOVA) and hierarchical logistic regression analysis. Using 287 13Cα/13Cβ shift pairs from 79 proteins with known three-dimensional structures, including 86 13Cα and 13Cβ shifts for 43 Zn-ligated cysteine residues, along with corresponding oxidation state and secondary structure information, we have built a logistic regression model that distinguishes between oxidized cystines, reduced (non-metal ligated) cysteines, and Zn-ligated cysteines. Classifying cysteines/cystines with a statistical model incorporating all three phenomena resulted in a predictor of Zn ligation with a recall, precision and F-measure of 83.7%, and an accuracy of 95.1%. This model was applied in the analysis of Bacillus subtilis IscU, a protein involved in iron–sulfur cluster assembly. The model predicts that all three cysteines of IscU are metal ligands. We confirmed these results by (i) examining the effect of metal chelation on the NMR spectrum of IscU, and (ii) inductively coupled plasma mass spectrometry analysis.
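The reported recall, precision, F-measure and accuracy can be related to one another with a short calculation. The sketch below is illustrative only: the confusion-matrix counts are not given in the text above, and are an assumption chosen because they reproduce the stated percentages when Zn ligation is scored as a binary outcome over all 287 residues, 43 of which are Zn-ligated.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (precision, recall, F-measure, accuracy) for a binary classifier."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy


# Totals stated in the abstract.
n_total = 287   # 13C-alpha/13C-beta shift pairs
n_zn = 43       # Zn-ligated cysteine residues

# ASSUMED counts (not reported in the source): 36 Zn-ligated residues
# correctly identified, 7 missed, and 7 non-Zn residues misclassified as Zn.
tp, fn, fp = 36, 7, 7
tn = n_total - tp - fn - fp

precision, recall, f_measure, accuracy = binary_metrics(tp, fp, fn, tn)
print(f"precision = {100 * precision:.1f}%")   # 83.7%
print(f"recall    = {100 * recall:.1f}%")      # 83.7%
print(f"F-measure = {100 * f_measure:.1f}%")   # 83.7%
print(f"accuracy  = {100 * accuracy:.1f}%")    # 95.1%
```

Equal precision and recall imply equal numbers of false positives and false negatives, which is why a single value (83.7%) can describe all three measures, while the much larger pool of non-Zn residues accounts for the higher overall accuracy.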
To gain further insight into the frequency of occurrence of non-cysteine Zn ligands, we analyzed the Protein Data Bank and found that 78% of the Zn ligands are histidine and cysteine (with nearly identical frequencies), and 18% are the acidic residues aspartate and glutamate.
chemical shift; distribution analysis; logistic regression analysis; Zn-ligated cysteine