Predicting the oxidation states of Mn ions in the oxygen-evolving complex of photosystem II using supervised and unsupervised machine learning

Amin, Muhamed

doi:10.1007/s11120-022-00941-8


Original Article
Open access
Published: 27 July 2022

Photosynthesis Research, Volume 156, pages 89–100 (2023)

Muhamed Amin ORCID: orcid.org/0000-0002-3146-150X^1,2,3


Abstract

Serial femtosecond crystallography at X-ray free-electron laser (XFEL) sources has enabled imaging of the catalytic intermediates of the oxygen evolution reaction of Photosystem II (PSII). However, because the S-state transitions are incoherent, the resolved structures are a convolution of different catalytic states. Here, we train Decision Tree Classifier and K-means clustering models on Mn compounds obtained from the Cambridge Crystallographic Database to predict the S-state of X-ray, XFEL, and cryoEM structures by predicting the oxidation states of the Mn ions in the oxygen-evolving complex. The model agrees mostly with the XFEL structures in the dark S₁ state. However, significant discrepancies are observed for the excited XFEL states (S₂, S₃, and S₀) and for the dark states of the X-ray and cryoEM structures. Furthermore, there is a mismatch between the S-states predicted for the two monomers of the same dimer, mainly in the excited states. We validated our model against other metalloenzymes, the valence bond model, and Mn spin densities calculated with density functional theory for two of the mismatched PSII predictions. The results suggest that more optimized sample delivery and illumination systems are crucial to precisely resolve the geometry of the advanced S-states and to overcome the noncoherent S-state transition. In addition, significant radiation damage is observed in the X-ray and cryoEM structures, particularly at the dangler Mn center (Mn4). Our model represents a valuable tool for investigating the electronic structure of the catalytic metal cluster of PSII and hence for understanding the water-splitting mechanism.


Introduction

Throughout history, Nature has inspired humans and driven many discoveries and inventions. For scientists, Nature is a vast school in which to observe, record, learn, and find inspiration. Indeed, remarkable technologies and products, from sailing to flying to Velcro, have been inspired in one way or another by lessons learned from the environment. Similarly, understanding the machinery of the biological nano-engines responsible for Photosynthesis would help us understand how a common metal such as manganese (Mn) acts within Photosystem II (PSII) as the best catalyst for water oxidation (Fig. 1) (Brudvig 2008). A comprehensive understanding of this machinery can pave the way for developing similar artificial catalysts (Barber 2014; Nocera 2012, 2017; Orio and Pantazis 2021; Zhang and Reisner 2020). Recognizing the need to transition to renewable energy sources, Giacomo Ciamician proposed that photochemical devices could convert solar energy into fuel (Ciamician 1912). This idea was one of the early motivations for developing artificial Photosynthesis.

In natural Photosynthesis, PSII, a membrane protein complex found in cyanobacteria, algae, and higher plants, harvests solar energy to drive water oxidation, converting light energy into chemical energy and releasing molecular oxygen (O₂) as a byproduct. In 1970, Bessel Kok described biological water oxidation as a five-step reaction cycle (Kok et al. 1970). PSII carries out this reaction by coupling four-electron water oxidation at the oxygen-evolving complex (OEC) with the one-electron photochemistry occurring at the reaction center (Yano and Yachandra 2014; Vinyard and Brudvig 2017; Cox and Messinger 2013). The OEC consists of a heteronuclear Mn₄O₅Ca cluster that cycles through five intermediate S-states (S₀ to S₄), corresponding to the abstraction of four successive electrons from the OEC (Kok et al. 1970). Several cofactors, namely chlorophyll, pheophytin, quinones, non-heme iron, and a redox-active tyrosine side chain, are involved in the charge separation reaction during water oxidation (Fig. 1).

A complete understanding of the catalytic activity of PSII requires detailed information about both the geometric and the electronic structure of the Mn₄CaO₅ cluster. The atomic geometric structure was first revealed at 1.9 Å resolution in 2011 using synchrotron X-ray crystallography (Umena et al. 2011a). However, synchrotron radiation induces sample damage, which results in systematic elongation of the metal–ligand bond distances (Garman 2010; Garman and Weik 2013; Grabolle et al. 2006). In contrast, recent advances in X-ray free-electron laser (XFEL) crystallography have provided a radiation-damage-free geometric structure of the Mn₄CaO₅ cluster (Suga et al. 2019, 2015, 2017a; Ibrahim et al. 2020; Kern et al. 2018a; Hussein et al. 2021). Moreover, XFEL studies have provided structures of the dark-adapted S₁ state and of other S-states at a resolution of ~2.0 Å, which show significantly shorter Mn–ligand bond lengths. However, despite these advances in understanding the geometric structure of the Mn₄CaO₅ cluster, its electronic structure remains elusive.

Although serial femtosecond crystallography at XFEL sources provides a tool for imaging the catalytic intermediates of the Kok cycle, it is very challenging to determine whether an imaged structure corresponds to the desired S-state. This is mainly because the S-state transitions are incoherent. In addition, high resolution is required to accurately resolve the Mn–ligand positions because of the high electron density of the Mn ions. We therefore built machine learning models to predict the oxidation states of Mn in the OEC, and hence the S-state, of X-ray, XFEL, and cryoEM structures. A similar method has been used previously to predict metal oxidation states in metal–organic frameworks (Jablonka et al. 2021). The model is trained on Mn-containing small molecules obtained from the Cambridge Crystallographic Database (CCD), for which the oxidation states are already known. The models, which showed very high accuracy scores (above 95%) on the training dataset, agreed mostly with the XFEL structures in the dark-adapted state (S₁). However, significant discrepancies are observed for the X-ray and cryoEM S₁ structures, as well as for the illuminated XFEL structures (S₂, S₃, and S₀). These disagreements might result from insufficient resolution or from the incoherent S-state transitions of the OEC. The model is validated against other metalloenzymes and tested against the valence bond model and against Mn spin densities calculated with density functional theory (DFT) for two PSII structures. Our model could be used to quickly evaluate structural models of the OEC. In addition, although other experimental techniques such as X-ray emission spectroscopy (XES) can evaluate the total oxidation state of the cluster, they cannot assign an oxidation state to each Mn center. Thus, the information provided by our model may be used to derive a possible mechanism for the catalytic reaction by monitoring the change in the oxidation state of each Mn center.

Results and discussion

To predict the oxidation states of Mn in the oxygen-evolving complex (OEC) of Photosystem II, we built a prediction model from data collected for Mn compounds in the Cambridge Crystallographic Database, for which the Mn oxidation states are already known. Only small molecules with crystallographic data, an R-factor ≤ 0.075, and no errors (at the 0.05 Å level) were included in the search (Bruno et al. 2002). Furthermore, only octahedral Mn compounds with oxygen (O) and nitrogen (N) ligands were selected, because the Mn ions in the OEC are coordinated by O and N ligands. In total, the resulting database contains 1734, 835, and 107 structures corresponding to the oxidation states Mn(II), Mn(III), and Mn(IV), respectively.

The average bond lengths between Mn and its ligands differ significantly between oxidation states, being shorter for higher Mn oxidation states. Furthermore, in Mn(III) the axial ligands have significantly longer bonds because of the Jahn–Teller effect. Therefore, the prediction model was designed around two features: (1) the average bond length between the Mn and the equatorial ligands and (2) the average bond length between the Mn and the axial ligands. Figure 2a plots the average equatorial bond length (X-axis) against the average axial bond length (Y-axis). Although there are clearly distinguishable clusters for Mn(II) (cyan), Mn(III) (blue), and Mn(IV) (dark blue), there are some mislabeled elements within each cluster. There are several reasons for the mislabeled data, such as inter-ligand steric effects (Shields et al. 2000). Similar conclusions were reached when metal oxidation states were calculated with the valence bond model (Reeves et al. 2019). In addition, the asymmetry between the X and Y axes is apparent in the Mn(III) cluster because of the Jahn–Teller distortion.
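
The text does not say how the axial pair is identified in each octahedron. The sketch below shows one plausible way to compute the two features from Cartesian coordinates, assuming (hypothetically) that the six ligands are grouped into three trans pairs and that the pair with the longest mean bond is the axial, Jahn–Teller axis. The function name and the pairing rule are illustrative assumptions, not the author's code.

```python
import numpy as np


def axial_equatorial_means(mn_xyz, ligand_xyz):
    """Return (mean equatorial, mean axial) Mn-ligand distance for one octahedral Mn site.

    Hypothetical rule (not taken from the paper): the six ligands are grouped into
    three trans pairs, and the pair with the longest mean bond length is treated
    as the axial (Jahn-Teller) axis.
    """
    vecs = np.asarray(ligand_xyz, dtype=float) - np.asarray(mn_xyz, dtype=float)
    if vecs.shape != (6, 3):
        raise ValueError("expected six ligand positions for an octahedral Mn")
    dists = np.linalg.norm(vecs, axis=1)
    unit = vecs / dists[:, None]

    # Greedy pairing: each ligand is paired with the remaining ligand whose
    # Mn->ligand direction is most nearly opposite (cosine closest to -1).
    remaining = list(range(6))
    pair_means = []
    while remaining:
        i = remaining.pop(0)
        j = min(remaining, key=lambda k: float(unit[i] @ unit[k]))
        remaining.remove(j)
        pair_means.append((dists[i] + dists[j]) / 2)

    axial_idx = int(np.argmax(pair_means))
    axial = pair_means[axial_idx]
    # Both equatorial pairs contain two bonds, so averaging the two pair means
    # equals averaging the four equatorial bond lengths.
    equatorial = np.mean([m for k, m in enumerate(pair_means) if k != axial_idx])
    return float(equatorial), float(axial)


# Idealized Jahn-Teller-elongated, Mn(III)-like site (coordinates in angstrom).
ligands = [(1.95, 0, 0), (-1.95, 0, 0), (0, 1.95, 0), (0, -1.95, 0), (0, 0, 2.26), (0, 0, -2.26)]
eq, ax = axial_equatorial_means((0, 0, 0), ligands)
print(f"equatorial {eq:.2f} A, axial {ax:.2f} A")
```

The greedy pairing is adequate for near-octahedral sites; strongly distorted geometries would need a more careful assignment.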

The K-means clustering algorithm was used to relabel our data into three distinct clusters and thereby correct the mislabeled data. As shown in Fig. 2b, the algorithm converged, and the data were grouped into three clusters with the following centers (equatorial, axial): (2.18, 2.28) Å for Mn(II), (1.95, 2.26) Å for Mn(III), and (1.91, 2.05) Å for Mn(IV). The axial-to-equatorial ratios (Y/X) of the three centers are 1.05, 1.16, and 1.08 for Mn(II), Mn(III), and Mn(IV), respectively. The Mn(III) cluster clearly shows the effect of the Jahn–Teller distortion, with axial ligands significantly longer than the equatorial ones. Finally, we used the clustered data to build prediction models based on two different classifiers: Gaussian Naïve Bayes and Decision Tree.
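
A minimal sketch of the clustering step with scikit-learn's `KMeans`. Because the CSD-derived features are not reproduced here, the example draws synthetic Gaussian blobs around the three reported centers with the reported class sizes; the spread (0.03 Å) is an arbitrary assumption. Since K-means cluster ids are arbitrary, each cluster is named after the nearest reported center, and the axial/equatorial ratio is printed for each.

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster centres quoted in the text: (mean equatorial, mean axial) in angstrom.
REPORTED = {"Mn(II)": (2.18, 2.28), "Mn(III)": (1.95, 2.26), "Mn(IV)": (1.91, 2.05)}
SIZES = {"Mn(II)": 1734, "Mn(III)": 835, "Mn(IV)": 107}

# Synthetic stand-in for the CSD features (NOT the paper's data): isotropic
# Gaussian blobs (sigma = 0.03 angstrom, an arbitrary choice) around each centre.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(REPORTED[k], 0.03, size=(n, 2)) for k, n in SIZES.items()])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# K-means cluster ids are arbitrary, so name each cluster after the nearest reported centre.
names = list(REPORTED)
ref = np.array([REPORTED[k] for k in names])
cluster_to_state = {
    c: names[int(np.argmin(np.linalg.norm(ref - centre, axis=1)))]
    for c, centre in enumerate(km.cluster_centers_)
}
relabelled = [cluster_to_state[c] for c in km.labels_]

for c, (eq, ax) in enumerate(km.cluster_centers_):
    print(f"{cluster_to_state[c]}: centre ({eq:.2f}, {ax:.2f}) A, axial/equatorial = {ax / eq:.2f}")
```

The relabelled list then serves as the training target for the supervised classifiers.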

Gaussian Naïve Bayes classifier (GNB)

We chose GNB because the data can be fitted with 2D Gaussians, so we expected the model to perform well on the training dataset. The model is trained on 75% of the data, and the remaining 25% is used for testing. Before the data were processed with K-means clustering, the accuracy score of the GNB prediction model was 94%; the confusion matrix comparing predictions with the true labels is shown in Fig. 3 (upper left). We also performed tenfold cross-validation to further evaluate the model, which gave a mean accuracy score of 96% with a standard deviation of 1%.
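
The evaluation protocol described above (a 75/25 hold-out split, a confusion matrix, and tenfold cross-validation) can be sketched with scikit-learn as follows. The data are synthetic blobs around the reported cluster centers, not the CSD dataset, and the stratified split is our assumption; the printed numbers therefore say nothing about the paper's results.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic features: (mean equatorial, mean axial) around the reported centres.
rng = np.random.default_rng(1)
centres = {2: (2.18, 2.28), 3: (1.95, 2.26), 4: (1.91, 2.05)}
sizes = {2: 1734, 3: 835, 4: 107}
X = np.vstack([rng.normal(centres[s], 0.03, size=(n, 2)) for s, n in sizes.items()])
y = np.concatenate([np.full(n, s) for s, n in sizes.items()])

# 75 % training / 25 % test split (stratification is an assumption).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
gnb = GaussianNB().fit(X_tr, y_tr)
y_pred = gnb.predict(X_te)
test_acc = accuracy_score(y_te, y_pred)
cm = confusion_matrix(y_te, y_pred, labels=[2, 3, 4])  # rows: true Mn(II..IV), cols: predicted

# Tenfold cross-validation on the full dataset.
cv_scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(f"hold-out accuracy: {test_acc:.3f}")
print(cm)
print(f"10-fold CV: mean {cv_scores.mean():.3f}, std {cv_scores.std():.3f}")
```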

After clustering, the accuracy score increased to 99%, which is also reflected in the confusion matrix in Fig. 3 (upper right). Most of the wrong predictions are Mn(IV) data points classified as Mn(III) (6 out of 44). Interestingly, the means of the class-conditional Gaussians (the likelihoods) fitted by GNB precisely match the cluster centers obtained from the K-means algorithm.
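
Why the GNB means coincide with the K-means centers can be checked directly: when GNB is trained on K-means labels, each class mean (`GaussianNB.theta_`) is the average of the points assigned to that cluster, which is exactly what a converged K-means center is. The sketch below uses synthetic blobs, not the paper's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

# Synthetic features around the reported centres (illustrative data only).
rng = np.random.default_rng(2)
centres = np.array([(2.18, 2.28), (1.95, 2.26), (1.91, 2.05)])
X = np.vstack([rng.normal(c, 0.03, size=(n, 2)) for c, n in zip(centres, (1734, 835, 107))])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
y = km.labels_                      # K-means labels used as the training targets
gnb = GaussianNB().fit(X, y)

# gnb.theta_[k] is the mean of the Gaussian likelihood for class gnb.classes_[k].
max_gap = np.abs(gnb.theta_ - km.cluster_centers_[gnb.classes_]).max()
print(f"largest |GNB mean - K-means centre|: {max_gap:.2e} angstrom")
```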

Decision tree classifier (DT)

The Decision Tree (DT) classifier is based on a very different algorithm from Naïve Bayes. Each internal node of the tree tests one feature (here, the average bond length of the axial or the equatorial ligands) against a threshold, and the two branches descending from the node correspond to the two outcomes of that test. Each split is chosen to maximize the reduction in information entropy (the information gain).
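
The split criterion can be made concrete with a few lines of NumPy. This sketch computes the Shannon entropy of a label set, the information gain of a threshold split on one continuous feature, and the best threshold among midpoints between sorted values. scikit-learn's `DecisionTreeClassifier(criterion="entropy")` performs this search internally; the toy bond lengths below are invented for illustration.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels, threshold):
    """Entropy reduction obtained by splitting on feature <= threshold."""
    left, right = labels[feature <= threshold], labels[feature > threshold]
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def best_threshold(feature, labels):
    """Scan midpoints between sorted unique values; return (threshold, gain)."""
    values = np.unique(feature)
    candidates = (values[:-1] + values[1:]) / 2
    gains = [information_gain(feature, labels, t) for t in candidates]
    k = int(np.argmax(gains))
    return float(candidates[k]), float(gains[k])


# Toy data: mean equatorial bond lengths (angstrom) for three Mn(II) and three Mn(III) sites.
eq_lengths = np.array([2.20, 2.18, 2.17, 1.95, 1.96, 1.94])
labels = np.array([2, 2, 2, 3, 3, 3])
threshold, gain = best_threshold(eq_lengths, labels)
print(f"split at {threshold:.3f} A, information gain {gain:.2f} bit")
```

Here the best split separates the two classes perfectly, so the gain equals the full 1 bit of parent entropy.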

The DT model achieves a higher accuracy score than GNB both before and after clustering (Fig. 3 lower left and lower right, respectively). Cross-validation gives an accuracy score of 95% before clustering and 100% after clustering. The set of rules used to classify the Mn oxidation states is shown in Fig. 4.
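
A rule set like the one in Fig. 4 can be obtained from a fitted scikit-learn tree with `export_text`. The sketch trains an entropy-based tree on synthetic blobs around the reported centers; `max_depth=3` is an illustrative choice to keep the printed rules short, not a setting reported in the paper, so the printed thresholds will not reproduce Fig. 4.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic features around the reported centres (illustrative data only).
rng = np.random.default_rng(3)
centres = {2: (2.18, 2.28), 3: (1.95, 2.26), 4: (1.91, 2.05)}
sizes = {2: 1734, 3: 835, 4: 107}
X = np.vstack([rng.normal(centres[s], 0.03, size=(n, 2)) for s, n in sizes.items()])
y = np.concatenate([np.full(n, s) for s, n in sizes.items()])

# Entropy criterion as described in the text; max_depth is an assumed value.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
scores = cross_val_score(tree, X, y, cv=10)
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Fit on all data and print the learned if/else rules.
tree.fit(X, y)
rules = export_text(tree, feature_names=["mean_equatorial", "mean_axial"])
print(rules)
```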

Prediction of the catalytic states of the Kok cycle

The OEC contains four Mn ions and one Ca ion (Fig. 1). The Mn ions are ligated mainly by oxygen (O) atoms and by one nitrogen (N) atom. During the catalytic cycle of PSII, the Mn ions are oxidized in the transitions between S-states. According to spectroscopic studies using different techniques, the oxidation states of the Mn ions in the dark-adapted state of the OEC (S₁ state) are Mn(III, IV, IV, III) (Visser et al. 2002; Riggs et al. 1992; Bergmann et al. 1998; Cox et al. 2014). The Mn(III) ions are then oxidized in the transitions to higher S-states until all Mn are Mn(IV) in the S₃ state (Fig. 1). Our prediction model with the highest accuracy score, based on the Decision Tree Classifier, was used to predict the Mn oxidation states independently for each monomer of 38 Photosystem II structures (27 XFEL, 6 X-ray, and 5 cryoEM structures). The Decision Tree Classifier trained on the data prior to clustering was also tested; it predicted slightly more reduced structures (see supporting information). Only structures of the metastable S-states (S₁, S₂, S₃, and S₀) with a resolution of 2.5 Å or better were included in this study (Table 1). After predicting the oxidation states, we assigned the S-states according to the sum of the oxidation states of the four Mn ions; i.e., the totals for S₀, S₁, S₂, and S₃ are 13, 14, 15, and 16, respectively. In addition, we included S₋₁, S₋₂, …, S₋₅ to account for more reduced structures.
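
The S-state assignment rule stated above reduces to one subtraction: a total of 13 maps to S₀, 14 to S₁, and so on, so the S-state index is the total Mn oxidation state minus 13, and four Mn(II) ions (total 8) give S₋₅. A minimal sketch (function names are ours):

```python
def s_state_from_mn(oxidation_states):
    """Map four per-Mn oxidation states to an S-state index (total - 13).

    Totals 13, 14, 15, 16 correspond to S0, S1, S2, S3; lower totals give
    S-1 ... S-5 (all four Mn(II): total 8 -> S-5).
    """
    if len(oxidation_states) != 4:
        raise ValueError("the OEC contains four Mn ions")
    if any(s not in (2, 3, 4) for s in oxidation_states):
        raise ValueError("the model only predicts Mn(II), Mn(III) and Mn(IV)")
    return sum(oxidation_states) - 13


def s_state_label(index):
    return f"S{index}"


print(s_state_label(s_state_from_mn([3, 4, 4, 3])))  # S1, the Mn(III, IV, IV, III) dark state
print(s_state_label(s_state_from_mn([2, 2, 2, 2])))  # S-5
```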

Table 1 The reported vs. predicted S-states of the X-ray, XFEL and cryoEM structures


When applied to the small molecules, our model has a prediction accuracy of 96% before clustering and nearly 100% after clustering. To further assess the model, we predicted the Mn oxidation states in DFT-optimized structures of the OEC, where the oxidation states are already known from the spin densities (Amin et al. 2019). The model successfully predicted oxidation states that match the Mn spin densities. On the other hand, at first glance the model's accuracy in predicting the oxidation states of the OEC appears low. As shown in Fig. 5, the structures are predicted to be mostly in the S₁ state for both monomers. However, the experimentally assigned S-states, which are based on the number of flashes used to pump the sample, differ significantly from those predicted by our model. For monomer 1, the prediction matched the experimentally assigned S-state for ten of the 38 structures; all of them are assigned as S₁ except for 6dhf and 6w1v, which are S₂ and S₃, respectively (Table 1). For monomer 2, 10 structures matched the experimentally assigned S-state, all of them S₁ except for the 7rf3 structure, which is S₂. (PDB ID: 6dhp). The mismatch between the two monomers may be attributed to different turnover rates caused by different physical/chemical conditions in the crystals (Wang et al. 2021).

A closer look at the predicted oxidation states tells a different story. Among the investigated PSII PDB files, 22 structures correspond to the S₁ state (PDB: 3wu2, 4il6, 4pj0, 4ub6, 4ub8, 5b66, 5b5e, 5gth, 5h2f, 5ws5, 5zzn, 6dhe, 6jlj, 6jlm, 6w1o, 7cji, 7cou, 7rf3, 7d1t, 7d1u, 7n8o, 7rcv); eleven of these 22 were solved using XFEL data. In seven XFEL-S₁ structures, the S-states predicted by our models agree with the experimental one in both monomers. In another three XFEL-S₁ structures, the predicted S-state agrees with the experimental one in at least one of the two monomers. In only one XFEL-S₁ structure did the predicted S-state fail to match the reported one. This agreement between the XFEL structures and the predictions in the dark state (S₁) supports the view that our models accurately predict the damage-free XFEL-S₁ structures.

The other 11 structures were solved using data collected with conventional synchrotron X-ray radiation (PDB: 3wu2, 4il6, 4pj0, 5b66, 5b5e, 5h2f) or cryoEM (PDB: 5zzn, 7n8o, 7rcv, 7d1t, 7d1u). Whereas the predicted and reported states mostly agree for the XFEL S₁ structures, they mostly disagree for these 11 structures. Only one monomer among the 11 structures was predicted to be in the S₁ state as reported, while the rest are predicted to be in more reduced states (S₀, S₋₁, …, or S₋₅) (Table 1). Some of these S₁ models have been investigated theoretically and were predicted to suffer from severe radiation damage (Luber et al. 2011; Kato et al. 2021). It is worth noting that the single agreement (among the X-ray S₁ structures) comes from a dataset collected with a low dose of synchrotron radiation (Suga et al. 2015). These observations emphasize the influence of radiation damage on the electronic structure of the OEC, and hence on its geometric structure, in agreement with several studies that assess radiation damage in protein crystallography in general (Garman 2010; Garman and Weik 2013, 2011; Grabolle et al. 2006; Garman and McSweeney 2007; Garman and Nave 2009; Hendrickson 1991). Moreover, several studies have discussed radiation damage in PSII specifically, during both X-ray (Askerka et al. 2015; Yano et al. 2005) and cryoEM data collection (Kato et al. 2021).

On the other hand, agreement was relatively low for the XFEL structures of the excited S-states S₂, S₃, and S₀ (PDB: 5gti, 5tis, 5ws6, 6dhf, 6dho, 6dhp, 6jlk, 6jll, 6jln, 6jlo, 6jlp, 6w1p, 6w1v, 7cjj, 7rf3, 7rf8). Although Mn emission spectroscopy shows a spectral shift due to Mn oxidation in the illuminated samples compared with the dark state, such a shift may result from only a low population of the excited S-states. Furthermore, even if all nanocrystals undergo a coherent transition, the resolution may not be sufficient to accurately resolve the Mn–ligand positions, which may explain the discrepancy between the assigned and predicted S-states. In addition, we cannot exclude the possibility that the Ca ion in the OEC affects the cluster's geometry during light activation. The Ca ion in the OEC has been investigated intensively; it plays a critical role in substrate insertion during light activation (Boussac et al. 2004; Cox et al. 2011; Koua et al. 2013).

Overall, the predicted Mn oxidation states of the XFEL structures mostly contained no Mn(II), except for PDB 7cou, 7cji, and 6jlp, where one of the two monomers contained a Mn ion predicted to be in the Mn(II) oxidation state. Unlike the XFEL structures, most of the synchrotron and cryoEM structures contain Mn(II), and all Mn in the 7n8o and 7rcv structures are reduced to Mn(II), likely because of the high radiation dose (Gisriel et al. 2022). In addition, these structures show more reduced states (S₋₁, …, or S₋₅) that do not exist physiologically in an active PSII, indicating that radiation damage significantly influences the electronic and geometric structure of the OEC; this influence is manifested in the predictions for the S₁ structures. These results emphasize the importance of radiation-damage-free data collection for studying a functional PSII.

Interestingly, in all structures showing Mn(II) in the OEC, the Mn(II) oxidation state was always assigned to Mn4 (the dangler Mn), indicating the high vulnerability of this Mn ion. It has recently been suggested that Mn4 is a high-affinity Mn(II) site at which Mn ions are oxidized in preparation for OEC formation during assembly (Mino and Asada 2021).

According to our model, Mn2 and Mn3 are mostly in the Mn(IV) oxidation state, while Mn1 and Mn4 are mostly in the Mn(III) state (Fig. 6), in agreement with several theoretical studies proposing that Mn2 and Mn3 are oxidized in the S₁ state. The transition to S₂ likely takes place by oxidizing either Mn4 (as in 6dhf) or Mn1 (as in 6dho). According to theoretical studies supported by EPR measurements, oxidation of Mn4 in S₂ is responsible for the g = 2 multiline EPR signal, while the g = 4.1 signal may be attributed to oxidation of Mn1 or to conversion of O4 from a µ-oxo to a hydroxo bridge (Corry and O’Malley 2019). Furthermore, several studies have suggested that the transition from S₁ to S₃ involves oxidation of Mn4, which is then reduced while Mn1 is oxidized, before both Mn are oxidized in the S₃ state. Interestingly, this reaction sequence is observed in the structures resolved by Kern et al. (2018b) (6dhe, 6dhf, 6dho, 6dhp), which are predicted to be in the S₁, S₂ (Mn4(IV)), S₂ (Mn1(IV)), and S₃ states, respectively (Table 1) (Amin et al. 2019; Retegan et al. 2016; Kaur et al. 2019). Although the 6dhp structure is assigned to S₀, the presence of an additional water ligand near Mn1 supports the prediction of our model.
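
Tracking which Mn center changes between consecutive predicted states is a simple per-site comparison. The sketch below applies it to the sequence discussed above, assuming (for illustration) that Mn2 and Mn3 stay Mn(IV) throughout, as the model predicts for most structures; the per-site tuples are illustrative and not copied from Table 1.

```python
MN_SITES = ("Mn1", "Mn2", "Mn3", "Mn4")


def site_changes(before, after):
    """Compare two per-site predictions (Mn1..Mn4) and report which sites changed."""
    oxidized = [s for s, a, b in zip(MN_SITES, before, after) if b > a]
    reduced = [s for s, a, b in zip(MN_SITES, before, after) if b < a]
    return {"oxidized": oxidized, "reduced": reduced}


# Illustrative per-site oxidation states (Mn1, Mn2, Mn3, Mn4).
s1 = (3, 4, 4, 3)        # S1: Mn(III, IV, IV, III)
s2_mn4 = (3, 4, 4, 4)    # S2 with Mn4(IV)
s2_mn1 = (4, 4, 4, 3)    # S2 with Mn1(IV)
s3 = (4, 4, 4, 4)        # S3: all Mn(IV)

print(site_changes(s1, s2_mn4))      # {'oxidized': ['Mn4'], 'reduced': []}
print(site_changes(s2_mn4, s2_mn1))  # {'oxidized': ['Mn1'], 'reduced': ['Mn4']}
print(site_changes(s2_mn1, s3))      # {'oxidized': ['Mn4'], 'reduced': []}
```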

Model validation

The OEC is a unique catalytic cluster that has not been synthesized in an artificial system. Thus, the transferability of a machine learning model trained on small compounds is questionable. However, because the model predicts the oxidation state of each Mn center individually from its coordination chemistry, rather than predicting chemical properties of the whole cluster, its predictions should be chemically reasonable. This assumption is supported by the fact that our dataset contains more than 1200 structures with a single Mn, which were placed randomly in the training and test sets during the tenfold cross-validation and still produced very high accuracy scores. In other words, training on structures with mono-Mn centers yields high accuracy when predicting the oxidation states of Mn in structures with multi-Mn centers.

To further validate our model, we used three independent methods: (1) we predicted the Mn oxidation states in other protein structures solved at significantly higher resolution; (2) because the OEC is a unique cluster, we calculated the Mn spin densities for two of the mismatched structures of different S-states resolved by different groups; and (3) we validated our predictions against the valence bond model.

Prediction of Mn oxidation states in high-resolution structures of Mn superoxide dismutase and an oxidase. The structure of manganese superoxide dismutase from Sphingobacterium was solved at 1.35 Å resolution (PDB code: 5A9G). The Mn ions in the two dimers are in the Mn(II) oxidation state. The average axial and equatorial ligand bond lengths are 2.2 Å and 2.1 Å, respectively, in both monomers (unlike the PSII structures, where the two monomers differ significantly). The Mn ions clearly belong to the Mn(II) cluster (Fig. 2b), and the model predicts the correct oxidation state for Mn in both monomers. In addition, we predicted the oxidation state of Mn in the crystal structure of the R2-like ligand-binding oxidase from Saccharopolyspora erythraea, solved at 1.38 Å resolution. This Mn is assigned the Mn(III) oxidation state, which our model predicts correctly. The average axial and equatorial ligand bond lengths are 2.3 and 2.0 Å, respectively. The Jahn–Teller distortion is clearly visible, and the Mn belongs to the Mn(III) cluster in Fig. 2b.
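
As a quick sanity check of the statement that both enzymes fall into the expected clusters of Fig. 2b, the average bond lengths quoted above can be compared with the K-means centers by Euclidean distance. This nearest-center rule is a simplification we introduce for illustration; it is not the Decision Tree model used for the predictions.

```python
import math

# K-means centres quoted in the text: (mean equatorial, mean axial) in angstrom.
CENTRES = {"Mn(II)": (2.18, 2.28), "Mn(III)": (1.95, 2.26), "Mn(IV)": (1.91, 2.05)}


def nearest_centre(equatorial, axial):
    """Return the oxidation-state cluster whose centre is closest to (equatorial, axial)."""
    return min(CENTRES, key=lambda k: math.dist((equatorial, axial), CENTRES[k]))


print(nearest_centre(2.1, 2.2))  # MnSOD (5A9G): Mn(II)
print(nearest_centre(2.0, 2.3))  # R2-like ligand-binding oxidase: Mn(III)
```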

Comparing the predicted Mn oxidation states in the OEC with the Mn spin densities calculated by DFT. Because the OEC is a unique catalytic center, we validated our predictions against Mn spin densities calculated with DFT. We calculated the spin densities of the Mn centers in two mismatched structures of advanced S-states obtained by different groups (advanced S-states were chosen because good agreement was already obtained for structures in the dark state), using Gaussian09 at the B3LYP/6-31G(d) level of theory. The first structure is 6dho, which is assigned to the S₃ state but predicted by the model to be in S₂. The second is 6jlk, which is assigned to the S₂ state but predicted to be in S₁.

The calculated Mn spin densities in 6dho show that Mn4 is the most reduced center (Table 2), which our model predicts to be Mn(III). Furthermore, after DFT optimization all Mn are predicted to be Mn(IV), in agreement with the DFT spin densities. Similarly, the Mn spin densities of the 6jlk structure suggest that the appropriate state for this structure is S₁, in agreement with our model (Table 2). However, after optimization, both the DFT spin densities and our model indicate that the structure is in the S₂ state.
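
Reading an oxidation state from a Mn spin population relies on a simple electron count. High-spin Mn(II), Mn(III) and Mn(IV) are d⁵, d⁴ and d³, with 5, 4 and 3 unpaired electrons, so the oxidation state is 7 minus the number of unpaired electrons. The sketch rounds the absolute spin population, because antiferromagnetically coupled sites carry negative spin. This is a crude heuristic we add for illustration; covalency lowers computed spin populations, so borderline values need care, and the example inputs are invented, not the values in Table 2.

```python
def oxidation_from_spin(spin_density):
    """Heuristic map from a Mn atomic spin population to an oxidation state.

    High-spin Mn(II), Mn(III), Mn(IV) are d5, d4, d3 (5, 4, 3 unpaired electrons),
    so oxidation state = 7 - unpaired. The absolute value is used because
    antiferromagnetically coupled Mn centres carry negative spin.
    """
    unpaired = round(abs(spin_density))
    state = 7 - unpaired
    if state not in (2, 3, 4):
        raise ValueError(f"spin density {spin_density} does not map to Mn(II)-Mn(IV)")
    return state


# Hypothetical spin populations, for illustration only.
for rho in (2.9, -3.8, 4.7):
    print(rho, "->", oxidation_from_spin(rho))
```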

Table 2 The Mn spin densities in the XFEL and DFT optimized structures of the OEC compared to the machine learning and the valence bond models


Valence Bond Model (VBM). According to the valence bond model (the bond valence model of Brown), the oxidation state of a metal center can be calculated as the sum of the valences of its individual bonds:

$$v = \sum_{i} e^{\left(R_{0} - R_{i}\right)/B}, \tag{1}$$

where R₀ and B are empirical parameters taken from the IUCr bond-valence parameter set compiled by I. D. Brown, and R_i is the length of the bond to ligand i (Reeves et al. 2019; Brown 2009, 2017). Using this model (Eq. 1), we calculated the oxidation states of the Mn ions in the 6dho and 6jlk structures (Table 2). In general, the oxidation states calculated with the VBM are more reduced than those predicted by the ML model. However, for the DFT-optimized structures, the calculations match both the ML model and the Mn spin densities.
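
Eq. (1) translates directly into code. In this sketch, B = 0.37 Å is the conventional default softness parameter, while R₀ = 1.76 Å is an assumed Mn–O value chosen only for illustration. Real R₀ values depend on the cation, its oxidation state and the ligand element, so a site with mixed O and N donors would need a separate R₀ for each bond.

```python
import math

B = 0.37  # angstrom; conventional default softness parameter


def bond_valence_sum(bond_lengths, r0, b=B):
    """Eq. (1): v = sum_i exp((R0 - R_i) / B), with lengths and R0 in angstrom."""
    return sum(math.exp((r0 - r) / b) for r in bond_lengths)


# Illustrative only: R0 = 1.76 angstrom is an assumed Mn-O parameter, not one from the paper.
R0_ASSUMED = 1.76
site = [1.95, 1.95, 1.95, 1.95, 2.26, 2.26]   # four equatorial + two axial bonds
print(f"bond valence sum: {bond_valence_sum(site, R0_ASSUMED):.2f}")  # roughly 2.9
```

Shorter bonds contribute exponentially more valence, which is why the more oxidized Mn centers have the shorter average bond lengths that the classifiers rely on.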

Furthermore, we used the DFT structure of the S₁ state that showed good agreement with EXAFS (Luber et al. 2011) to re-estimate the empirical parameter R₀ in Eq. 1, and recalculated the Mn oxidation states in the latter structures. With the updated parameters, the calculated oxidation states agree with the ML model for all Mn ions in all structures, both before and after DFT optimization (Table 2).
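
Re-estimating R₀ from a reference site whose oxidation state v is known amounts to rearranging Eq. (1). Because v = e^{R₀/B} Σᵢ e^{−Rᵢ/B}, it follows that R₀ = B ln(v / Σᵢ e^{−Rᵢ/B}). The text does not state whether one R₀ or several were fitted; this sketch assumes a single R₀ fitted to one hypothetical reference site, with B fixed at 0.37 Å.

```python
import math

B = 0.37  # angstrom, assumed fixed softness parameter


def bond_valence_sum(bond_lengths, r0, b=B):
    """Eq. (1): v = sum_i exp((R0 - R_i) / B)."""
    return sum(math.exp((r0 - r) / b) for r in bond_lengths)


def fit_r0(bond_lengths, known_valence, b=B):
    """Solve Eq. (1) for R0: R0 = B * ln(v / sum_i exp(-R_i / B))."""
    return b * math.log(known_valence / sum(math.exp(-r / b) for r in bond_lengths))


# Hypothetical reference site treated as Mn(III); bond lengths in angstrom.
reference = [1.95, 1.95, 1.95, 1.95, 2.26, 2.26]
r0 = fit_r0(reference, 3)
print(f"fitted R0 = {r0:.3f} A")
print(f"check: BVS = {bond_valence_sum(reference, r0):.3f}")  # reproduces 3 by construction
```

The fitted R₀ is then reused in Eq. (1) for the other Mn sites.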

In addition to these validation steps, the model agrees mostly with the XFEL structures in the dark S₁ state, which provides further evidence for its validity.

Conclusion

In conclusion, we built two models, based on the Decision Tree Classifier (DT) and the Gaussian Naïve Bayes Classifier (GNB), to predict the oxidation states of the Mn ions in the OEC using the small molecules available in the Cambridge Structural Database. The DT model performed better than the GNB model; it has an accuracy of nearly 100% for small molecules and ~75% for XFEL-S₁ structures. Furthermore, the predictions for the synchrotron and cryoEM S₁ structures indicate reduced states (S₀, S₋₁, …, S₋₅), pointing to severe radiation damage, in agreement with several studies that suggested radiation damage during data collection. The cryoEM structures, in particular, showed strongly reduced states, down to S₋₅. These signs of radiation damage emphasize the importance of radiation-damage-free data collection for investigating the functional OEC of PSII. In addition, the model predicted that Mn1 and Mn4 are the ions most likely to be oxidized during the S₁ → S₂ and S₂ → S₃ transitions. Moreover, the model shows that Mn4 is the Mn ion most susceptible to radiation damage.

Although experimental methods such as XES or XANES are used to determine the oxidation state of the Mn₄CaO₅ cluster as a whole, they cannot assign the oxidation state of each Mn separately, which is important for understanding the mechanism of the water-splitting reaction. Our model provides a tool for quickly evaluating structures and for assigning the oxidation state of each Mn center in the structures of the different S-states. Furthermore, the model could be used to evaluate radiation damage in X-ray and cryoEM structures.

Methods

Data collection

We used the ConQuest software to search the Cambridge Structural Database for small molecules containing Mn ions. The initial query returned more than 15,000 small molecules; several filters were then applied to obtain a reliable dataset that resembles some features of the OEC. We first eliminated noncrystallographic structures and any structure with an R-factor ≥ 0.075. Another filter improved the precision of the bond lengths by including only error-free structures (at the 0.05 Å level) (Bruno et al. 2002). Furthermore, only Mn compounds with oxygen (O) and nitrogen (N) ligands were included. Overall, we built a database of thousands of octahedral coordination compounds, containing 1734, 835, and 107 structures corresponding to the oxidation states Mn(II), Mn(III), and Mn(IV), respectively. The data include 2795 structures that contain µ-oxo bridges.
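
Once search hits are exported to a table, the selection criteria above can be expressed as a pandas filter. All column names in this sketch (`has_3d_coords`, `r_factor`, `error_free`, `coordination_number`, `ligand_elements`, `mn_oxidation_state`) are hypothetical placeholders, not ConQuest export fields, and the toy rows are invented to exercise each criterion.

```python
import pandas as pd

ALLOWED_DONORS = {"O", "N"}


def filter_csd_hits(hits: pd.DataFrame) -> pd.DataFrame:
    """Apply the selection criteria described in the text (hypothetical column names)."""
    mask = (
        hits["has_3d_coords"]                      # crystallographic coordinates present
        & (hits["r_factor"] < 0.075)               # drop structures with R-factor >= 0.075
        & hits["error_free"]                       # no reported errors (0.05 A level)
        & (hits["coordination_number"] == 6)       # octahedral Mn only
        & hits["ligand_elements"].apply(lambda els: set(els) <= ALLOWED_DONORS)  # O/N donors only
    )
    return hits.loc[mask].copy()


# Invented toy rows, one per rejection reason plus one accepted entry.
hits = pd.DataFrame({
    "refcode": ["A", "B", "C", "D", "E"],
    "has_3d_coords": [True, True, False, True, True],
    "r_factor": [0.05, 0.09, 0.04, 0.06, 0.05],
    "error_free": [True, True, True, True, True],
    "coordination_number": [6, 6, 6, 6, 5],
    "ligand_elements": [["O"] * 6, ["O"] * 6, ["N"] * 6, ["O"] * 5 + ["Cl"], ["O"] * 5],
    "mn_oxidation_state": [2, 3, 3, 4, 2],
})
selected = filter_csd_hits(hits)
print(selected.groupby("mn_oxidation_state").size())  # structures per oxidation state
```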

Machine learning models

Initially, we used scikit-learn (sklearn) (Pedregosa et al. 2011) to build prediction models with Gaussian Naïve Bayes and Decision Tree classifiers based on several features: (1) each Mn–ligand bond length as a separate feature (six features) and (2) the element type of each atom ligating the Mn (six features). However, because both the Mn oxidation state and the ligand type affect the Mn–ligand distances, the same accuracy score is achieved with only two features: (1) the average distance from Mn to the equatorial ligands and (2) the average distance from Mn to the axial ligands.

In-house Python scripts were written to read the PDB files, extract the octahedral Mn ions, and calculate the distances between the Mn and its ligands using the Biopython package (Cock et al. 2009). A CSV file containing the data extracted from all PDB files is then created, and the pandas library is used to load the input data for the machine learning models. The K-means clustering algorithm in sklearn is used to cluster the Mn ions into three clusters based on the average bond lengths of the equatorial and axial ligands. Each cluster represents a different Mn oxidation state; the supervised learning with the Gaussian Naïve Bayes and Decision Tree classifiers is then repeated on the clustered data. The data are split randomly into training (70%) and test (30%) sets ten times using the sklearn cross-validation functions.
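
The cluster-then-classify workflow can be sketched end to end with scikit-learn. We read "split randomly 10 folds with 70% for training and 30% for testing" as ten random 70/30 splits, which is what `ShuffleSplit(n_splits=10, test_size=0.3)` provides; this interpretation, the function name and the column names are our assumptions. In practice the features would come from the CSV described above (for example via `pd.read_csv`); here a synthetic DataFrame stands in for it.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def evaluate(features: pd.DataFrame, seed: int = 0) -> dict:
    """Cluster the two bond-length features into three groups, then cross-validate both classifiers."""
    X = features[["mean_equatorial", "mean_axial"]].to_numpy()
    y = KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X)  # clustered labels
    splitter = ShuffleSplit(n_splits=10, test_size=0.3, random_state=seed)  # ten random 70/30 splits
    models = {
        "GNB": GaussianNB(),
        "DT": DecisionTreeClassifier(criterion="entropy", random_state=seed),
    }
    return {name: cross_val_score(model, X, y, cv=splitter) for name, model in models.items()}


# Synthetic stand-in for the CSV of extracted features (NOT the paper's data).
rng = np.random.default_rng(4)
centres = [(2.18, 2.28), (1.95, 2.26), (1.91, 2.05)]
data = np.vstack([rng.normal(c, 0.03, size=(n, 2)) for c, n in zip(centres, (1734, 835, 107))])
features = pd.DataFrame(data, columns=["mean_equatorial", "mean_axial"])

results = evaluate(features)
for name, scores in results.items():
    print(f"{name}: mean accuracy {scores.mean():.3f} over {len(scores)} splits")
```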

References

Amin M, Kaur D, Yang KR, Wang J, Mohamed Z, Brudvig GW, Gunner MR, Batista V (2019) Thermodynamics of the S2-to-S3 state transition of the oxygen-evolving complex of photosystem II. Phys Chem Chem Phys 21(37):20840–20848
Askerka M, Vinyard DJ, Wang J, Brudvig GW, Batista VS (2015) Analysis of the radiation-damage-free X-ray structure of photosystem II in light of EXAFS and QM/MM data. Biochemistry 54(9):1713–1716
Barber J (2014) Photosystem II: its function, structure, and implications for artificial photosynthesis. Biochem Mosc 79(3):185–196
Bergmann U, Grush MM, Horne CR, DeMarois P, Penner-Hahn JE, Yocum CF, Wright D, Dubé CE, Armstrong WH, Christou GJ (1998) Characterization of the Mn oxidation states in photosystem II by Kβ X-ray fluorescence spectroscopy. J Phys Chem B 102(42):8350–8352
Boussac A, Rappaport F, Carrier P, Verbavatz J-M, Gobin R, Kirilovsky D, Rutherford AW, Sugiura M (2004) Biosynthetic Ca²⁺/Sr²⁺ exchange in the photosystem II oxygen-evolving enzyme of Thermosynechococcus elongatus. J Biol Chem 279(22):22809–22819
Brown ID (2009) Recent developments in the methods and applications of the bond valence model. Chem Rev 109(12):6858–6919
Brown ID (2017) What is the best way to determine bond-valence parameters? IUCrJ 4(Pt 5):514–515
Brudvig GW (2008) Water oxidation chemistry of photosystem II. Philos Trans R Soc Lond Ser B 363(1494):1211–1218
Bruno IJ, Cole JC, Edgington PR, Kessler M, Macrae CF, McCabe P, Pearson J, Taylor R (2002) New software for searching the Cambridge Structural Database and visualizing crystal structures. Acta Crystallogr Sect B Struct Sci 58(3):389–397
Ciamician G (1912) The photochemistry of the future. Science 36(926):385–394
Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJ (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11):1422–1423
Corry TA, O’Malley PJ (2019) Proton isomers rationalize the high- and low-spin forms of the S2 state intermediate in the water-oxidizing reaction of photosystem II. J Phys Chem Lett 10(17):5226–5230
Cox N, Messinger J (2013) Reflections on substrate water and dioxygen formation. Biochim Biophys Acta 1827(8–9):1020–1030
Cox N, Rapatskiy L, Su J-H, Pantazis DA, Sugiura M, Kulik L, Dorlet P, Rutherford AW, Neese F, Boussac A (2011) Effect of Ca²⁺/Sr²⁺ substitution on the electronic structure of the oxygen-evolving complex of photosystem II: a combined multifrequency EPR, 55Mn-ENDOR, and DFT study of the S2 state. J Am Chem Soc 133(10):3635–3648
Cox N, Retegan M, Neese F, Pantazis DA, Boussac A, Lubitz W (2014) Electronic structure of the oxygen-evolving complex in photosystem II prior to O–O bond formation. Science 345(6198):804–808
Garman EF (2010) Radiation damage in macromolecular crystallography: what is it and why should we care? Acta Crystallogr D Biol Crystallogr 66(4):339–351
Garman EF, McSweeney SM (2007) Progress in research into radiation damage in cryo-cooled macromolecular crystals. J Synchrotron Radiat 14(1):1–3
Garman EF, Nave C (2009) Radiation damage in protein crystals examined under various conditions by different methods. J Synchrotron Radiat 16(2):129–132
Garman EF, Weik M (2011) Macromolecular crystallography radiation damage research: what’s new? J Synchrotron Radiat 18(3):313–317
Garman EF, Weik M (2013) Radiation damage to biological macromolecules: some answers and more questions. J Synchrotron Radiat 20(1):1–6
Gisriel CJ, Wang J, Liu J, Flesher DA, Reiss KM, Huang HL, Yang KR, Armstrong WH, Gunner MR, Batista VS, Debus RJ, Brudvig GW (2022) High-resolution cryo-electron microscopy structure of photosystem II from the mesophilic cyanobacterium, Synechocystis sp. PCC 6803. Proc Natl Acad Sci USA 119(1):e2116765118
Grabolle M, Haumann M, Müller C, Liebisch P, Dau H (2006) Rapid loss of structural motifs in the manganese complex of oxygenic photosynthesis by X-ray irradiation at 10–300 K. J Biol Chem 281(8):4580–4588
Hellmich J, Bommer M, Burkhardt A, Ibrahim M, Kern J, Meents A, Müh F, Dobbek H, Zouni A (2014) Native-like photosystem II superstructure at 2.44 Å resolution through detergent extraction from the protein crystal. Structure 22(11):1607–1615
Hendrickson WA (1991) Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science 254(5028):51–58
Hussein R, Ibrahim M, Bhowmick A, Simon PS, Chatterjee R, Lassalle L, Doyle M, Bogacz I, Kim I-S, Cheah MH, Gul S, de Lichtenberg C, Chernev P, Pham CC, Young ID, Carbajo S, Fuller FD, Alonso-Mori R, Batyuk A, Sutherlin KD, Brewster AS, Bolotovsky R, Mendez D, Holton JM, Moriarty NW, Adams PD, Bergmann U, Sauter NK, Dobbek H, Messinger J, Zouni A, Kern J, Yachandra VK, Yano J (2021) Structural dynamics in the water and proton channels of photosystem II during the S2 to S3 transition. Nat Commun 12(1):6531
Ibrahim M, Fransson T, Chatterjee R, Cheah MH, Hussein R, Lassalle L, Sutherlin KD, Young ID, Fuller FD, Gul S, Kim IS, Simon PS, de Lichtenberg C, Chernev P, Bogacz I, Pham CC, Orville AM, Saichek N, Northen T, Batyuk A, Carbajo S, Alonso-Mori R, Tono K, Owada S, Bhowmick A, Bolotovsky R, Mendez D, Moriarty NW, Holton JM, Dobbek H, Brewster AS, Adams PD, Sauter NK, Bergmann U, Zouni A, Messinger J, Kern J, Yachandra VK, Yano J (2020) Untangling the sequence of events during the S2 → S3 transition in photosystem II and implications for the water oxidation mechanism. Proc Natl Acad Sci USA 117(23):12624–12635
Jablonka KM, Ongari D, Moosavi SM, Smit B (2021) Using collective knowledge to assign oxidation states of metal cations in metal-organic frameworks. Nat Chem 13(8):771–777
Kato K, Miyazaki N, Hamaguchi T, Nakajima Y, Akita F, Yonekura K, Shen J-R (2021) High-resolution cryo-EM structure of photosystem II reveals damage from high-dose electron beams. Commun Biol 4(1):382
Kaur D, Szejgis W, Mao J, Amin M, Reiss KM, Askerka M, Cai X, Khaniya U, Zhang Y, Brudvig GW, Batista VS, Gunner MR (2019) Relative stability of the S2 isomers of the oxygen evolving complex of photosystem II. Photosynth Res 141(3):331–341
Kern J, Chatterjee R, Young ID, Fuller FD, Lassalle L, Ibrahim M, Gul S, Fransson T, Brewster AS, Alonso-Mori R, Hussein R, Zhang M, Douthit L, de Lichtenberg C, Cheah MH, Shevela D, Wersig J, Seuffert I, Sokaras D, Pastor E, Weninger C, Kroll T, Sierra RG, Aller P, Butryn A, Orville AM, Liang M, Batyuk A, Koglin JE, Carbajo S, Boutet S, Moriarty NW, Holton JM, Dobbek H, Adams PD, Bergmann U, Sauter NK, Zouni A, Messinger J, Yano J, Yachandra VK (2018a) Structures of the intermediates of Kok’s photosynthetic water oxidation clock. Nature 563(7731):421–425
Kern J, Chatterjee R, Young ID, Fuller FD, Lassalle L, Ibrahim M, Gul S, Fransson T, Brewster AS, Alonso-Mori R, Hussein R, Zhang M, Douthit L, de Lichtenberg C, Cheah MH, Shevela D, Wersig J, Seuffert I, Sokaras D, Pastor E, Weninger C, Kroll T, Sierra RG, Aller P, Butryn A, Orville AM, Liang M, Batyuk A, Koglin JE, Carbajo S, Boutet S, Moriarty NW, Holton JM, Dobbek H, Adams PD, Bergmann U, Sauter NK, Zouni A, Messinger J, Yano J, Yachandra VK (2018b) Structures of the intermediates of Kok’s photosynthetic water oxidation clock. Nature 563(7731):421–425
Kok B, Forbush B, McGloin M (1970) Cooperation of charges in photosynthetic O₂ evolution. 1. A linear 4-step mechanism. Photochem Photobiol 11(6):457–475
Koua FH, Umena Y, Kawakami K, Shen JR (2013) Structure of Sr-substituted photosystem II at 2.1 Å resolution and its implications in the mechanism of water oxidation. Proc Natl Acad Sci USA 110(10):3889–3894
Li H, Nakajima Y, Nomura T, Sugahara M, Yonekura S, Chan SK, Nakane T, Yamane T, Umena Y, Suzuki M, Masuda T, Motomura T, Naitow H, Matsuura Y, Kimura T, Tono K, Owada S, Joti Y, Tanaka R, Nango E, Akita F, Kubo M, Iwata S, Shen JR, Suga M (2021) Capturing structural changes of the S1 to S2 transition of photosystem II using time-resolved serial femtosecond crystallography. IUCrJ 8(Pt 3):431–443
Luber S, Rivalta I, Umena Y, Kawakami K, Shen JR, Kamiya N, Brudvig GW, Batista VS (2011) S1-state model of the O₂-evolving complex of photosystem II. Biochemistry 50(29):6308–6311
Mino H, Asada M (2021) Location of two Mn²⁺ affinity sites in photosystem II detected by pulsed electron–electron double resonance. Photosynth Res. https://doi.org/10.1007/s11120-021-00885-5
Nakajima Y, Umena Y, Nagao R, Endo K, Kobayashi K, Akita F, Suga M, Wada H, Noguchi T, Shen JR (2018) Thylakoid membrane lipid sulfoquinovosyl-diacylglycerol (SQDG) is required for full functioning of photosystem II in Thermosynechococcus elongatus. J Biol Chem 293(38):14786–14797
Nocera DG (2012) The artificial leaf. Acc Chem Res 45(5):767–776
Nocera DG (2017) Solar fuels and solar chemicals industry. Acc Chem Res 50(3):616–619
Orio M, Pantazis DA (2021) Successes, challenges, and opportunities for quantum chemistry in understanding metalloenzymes for solar fuels research. Chem Commun 57(33):3952–3974
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Reeves MG, Wood PA, Parsons S (2019) Automated oxidation-state assignment for metal sites in coordination complexes in the Cambridge Structural Database. Acta Crystallogr B Struct Sci Cryst Eng Mater 75(Pt 6):1096–1105
Retegan M, Krewald V, Mamedov F, Neese F, Lubitz W, Cox N, Pantazis DA (2016) A five-coordinate Mn(IV) intermediate in biological water oxidation: spectroscopic signature and a pivot mechanism for water binding. Chem Sci 7:72–84
Riggs PJ, Yocum CF, Penner-Hahn JE, Mei R (1992) Reduced derivatives of the manganese cluster in the photosynthetic oxygen-evolving complex. J Am Chem Soc 114(26):10650–10651
Shields GP, Raithby PR, Allen FH, Motherwell WD (2000) The assignment and validation of metal oxidation states in the Cambridge Structural Database. Acta Crystallogr B 56(Pt 3):455–465
Suga M, Akita F, Hirata K, Ueno G, Murakami H, Nakajima Y, Shimizu T, Yamashita K, Yamamoto M, Ago H, Shen JR (2015) Native structure of photosystem II at 1.95 Å resolution viewed by femtosecond X-ray pulses. Nature 517(7532):99–103
Suga M, Akita F, Sugahara M, Kubo M, Nakajima Y, Nakane T, Yamashita K, Umena Y, Nakabayashi M, Yamane T, Nakano T, Suzuki M, Masuda T, Inoue S, Kimura T, Nomura T, Yonekura S, Yu L-J, Sakamoto T, Motomura T, Chen J-H, Kato Y, Noguchi T, Tono K, Joti Y, Kameshima T, Hatsui T, Nango E, Tanaka R, Naitow H, Matsuura Y, Yamashita A, Yamamoto M, Nureki O, Yabashi M, Ishikawa T, Iwata S, Shen J-R (2017) Light-induced structural changes and the site of O=O bond formation in PSII caught by XFEL. Nature 543(7643):131–135
Suga M, Akita F, Yamashita K, Nakajima Y, Ueno G, Li H, Yamane T, Hirata K, Umena Y, Yonekura S, Yu L-J, Murakami H, Nomura T, Kimura T, Kubo M, Baba S, Kumasaka T, Tono K, Yabashi M, Isobe H, Yamaguchi K, Yamamoto M, Ago H, Shen J-R (2019) An oxyl/oxo mechanism for oxygen-oxygen coupling in PSII revealed by an x-ray free-electron laser. Science 366(6463):334–338
Tanaka A, Fukushima Y, Kamiya N (2017) Two different structures of the oxygen-evolving complex in the same polypeptide frameworks of photosystem II. J Am Chem Soc 139(5):1718–1721
Umena Y, Kawakami K, Shen J-R, Kamiya N (2011a) Crystal structure of oxygen-evolving photosystem II at a resolution of 1.9 Å. Nature 473(7345):55–60
Umena Y, Kawakami K, Shen J-R, Kamiya N (2011b) Crystal structure of oxygen-evolving photosystem II at 1.9 Å resolution. Nature 473(7345):55–60
Uto S, Kawakami K, Umena Y, Iwai M, Ikeuchi M, Shen JR, Kamiya N (2017) Mutual relationships between structural and functional changes in a PsbM-deletion mutant of photosystem II. Faraday Discuss 198:107–120
Vinyard DJ, Brudvig GW (2017) Progress toward a molecular mechanism of water oxidation in photosystem II. Annu Rev Phys Chem 68:101–116
Visser H, Dubé CE, Armstrong WH, Sauer K, Yachandra VK (2002) FTIR spectra and normal-mode analysis of a tetranuclear manganese adamantane-like complex in two electrochemically prepared oxidation states: relevance to the oxygen-evolving complex of photosystem II. J Am Chem Soc 124(37):11008–11017
Wang J, Gisriel CJ, Reiss K, Huang HL, Armstrong WH, Brudvig GW, Batista VS (2021) Heterogeneous composition of oxygen-evolving complexes in crystal structures of dark-adapted photosystem II. Biochemistry 60(45):3374–3384
Yano J, Yachandra V (2014) Mn₄Ca cluster in photosynthesis: where and how water is oxidized to dioxygen. Chem Rev 114(8):4175–4205
Yano J, Pushkar Y, Glatzel P, Lewis A, Sauer K, Messinger J, Bergmann U, Yachandra V (2005) High-resolution Mn EXAFS of the oxygen-evolving complex in photosystem II: structural implications for the Mn₄Ca cluster. J Am Chem Soc 127(43):14974–14975
Young ID, Ibrahim M, Chatterjee R, Gul S, Fuller F, Koroidov S, Brewster AS, Tran R, Alonso-Mori R, Kroll T, Michels-Clark T, Laksmono H, Sierra RG, Stan CA, Hussein R, Zhang M, Douthit L, Kubin M, de Lichtenberg C, Long Vo P, Nilsson H, Cheah MH, Shevela D, Saracini C, Bean MA, Seuffert I, Sokaras D, Weng TC, Pastor E, Weninger C, Fransson T, Lassalle L, Brauer P, Aller P, Docker PT, Andi B, Orville AM, Glownia JM, Nelson S, Sikorski M, Zhu D, Hunter MS, Lane TJ, Aquila A, Koglin JE, Robinson J, Liang M, Boutet S, Lyubimov AY, Uervirojnangkoorn M, Moriarty NW, Liebschner D, Afonine PV, Waterman DG, Evans G, Wernet P, Dobbek H, Weis WI, Brunger AT, Zwart PH, Adams PD, Zouni A, Messinger J, Bergmann U, Sauter NK, Kern J, Yachandra VK, Yano J (2016) Structure of photosystem II and substrate binding at room temperature. Nature 540(7633):453–457
Zhang JZ, Reisner E (2020) Advancing photosystem II photoelectrochemistry for semi-artificial photosynthesis. Nat Rev Chem 4(1):6–21

Acknowledgements

We thank Dr. Mohamed Ibrahim, Prof. Dr. Christian Limberg and Dr. Beatrice Cula (Institute of Chemistry, Humboldt-Universität zu Berlin) for supporting the CSD data collection. We acknowledge support from DOE Grant DE-SC0001423 (M.R.G. and V.S.B.). We thank Prof. Victor Batista, Prof. Gary Brudvig, Prof. Marilyn Gunner and Dr. Jimin Wang for useful discussions.

Author information

Authors and Affiliations

Department of Sciences, University College Groningen, University of Groningen, Hoendiepskade 23/24, 9718 BG, Groningen, The Netherlands
Muhamed Amin
Rijksuniversiteit Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, Netherlands
Muhamed Amin
Center for Free-Electron Laser Science, Deutsches Elektronen-Synchrotron DESY, Notkestrasse 85, 22607, Hamburg, Germany
Muhamed Amin

Authors

Muhamed Amin

Corresponding author

Correspondence to Muhamed Amin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The electronic supplementary material is available in the online version of this article.

Supplementary file1 (CSV 9 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Amin, M. Predicting the oxidation states of Mn ions in the oxygen-evolving complex of photosystem II using supervised and unsupervised machine learning. Photosynth Res 156, 89–100 (2023). https://doi.org/10.1007/s11120-022-00941-8

Received: 01 March 2022
Accepted: 13 July 2022
Published: 27 July 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s11120-022-00941-8

Introduction