Accelerated discovery of multi-elemental reverse water-gas shift catalysts using extrapolative machine learning approach

Wang, Gang; Mine, Shinya; Chen, Duotian; Jing, Yuan; Ting, Kah Wei; Yamaguchi, Taichi; Takao, Motoshi; Maeno, Zen; Takigawa, Ichigaku; Matsushita, Koichi; Shimizu, Ken-ichi; Toyao, Takashi

doi:10.1038/s41467-023-41341-3

Article
Open access
Published: 21 September 2023

Volume 14, article number 5861 (2023)

Abstract

Designing novel catalysts is key to solving many energy and environmental challenges. Despite the promise that data science approaches, including machine learning (ML), can accelerate the development of catalysts, truly novel catalysts have rarely been discovered through ML approaches because of one of its most common limitations and criticisms—the assumed inability to extrapolate and identify extraordinary materials. Herein, we demonstrate an extrapolative ML approach to develop new multi-elemental reverse water-gas shift catalysts. Using 45 catalysts as the initial data points and performing 44 cycles of the closed loop discovery system (ML prediction + experiment), we experimentally tested a total of 300 catalysts and identified more than 100 catalysts with superior activity compared to those of the previously reported high-performance catalysts. The composition of the optimal catalyst discovered was Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂. Notably, niobium (Nb) was not included in the original dataset, and the catalyst composition identified was not predictable even by human experts.

Introduction

The discovery of novel catalysts is essential for accelerating the transition to a sustainable future^1,2. Despite the significant progress in the development of highly efficient catalysts, heterogeneous catalysis remains largely an empirical science owing to the complexity of the underlying surface chemistry^3,4. Data and design guidelines for heterogeneous catalysis are still lacking, because the computational cost of obtaining accurate theoretical models for such complex systems is prohibitively high, while high-throughput experimental methods that have been applied successfully in related fields have not yet been thoroughly explored^5,6,7,8. Most of the important catalysts were discovered by chance or through trial-and-error processes extending over several years; the discovery of truly novel catalysts is still challenging⁹.

The recent revolution in data science is expected to accelerate the development of new catalysts significantly, and hence, impact catalysis research^{10,11,12,13,14}. Machine learning (ML) will play a central role in this paradigm shift. The application of ML-based approaches to catalysis^{15,16,17,18,19,20,21} and broader fields of chemistry and materials science has attracted considerable attention^{22,23,24,25,26,27}. Although proof-of-concept examples of reduction in time and cost of catalyst development have been demonstrated using ML-based approaches, most of the ML-based research is directed toward the resolution of benchmark problems, while truly novel compounds and materials have rarely been discovered^28,29. This is due to one of the most common limitations of ML—the assumed inability of the models to extrapolate and identify extraordinary materials beyond those present in the training dataset³⁰. In materials and catalysis informatics, we often desire to use ML models to discover an entirely new class of materials and catalysts with unprecedented combinations of elements. In this context, our group has developed a new ML approach wherein elemental features are used as input representations rather than inputting the catalyst compositions directly^31,32. Namely, each catalyst is represented as a set of elemental descriptors such as electronegativities and melting points, which are scaled by the element content, followed by aggregation into a single feature vector by a permutation-invariant readout operation (elementwise sort pooling, referred to as sorted weighted elemental descriptor (SWED))^31,32. This ML method can guide catalyst design and discovery in areas where there is limited overlap of catalyst compositions and even for elements that were previously never included in a given dataset, thereby enabling extrapolative and ambitious prediction beyond the training data. 
Other studies have also validated the possibility of such extrapolative prediction using relevant feature engineering/selection approaches³³. Despite the theoretical evidence on the possibilities of finding novel catalysts and exceptional materials through extrapolative prediction, the use of ML to identify truly new and exceptional materials has remained elusive³⁴.

In this study, we have applied the extrapolative ML approach to develop new multi-elemental catalysts based on supported Pt as an active metal and TiO₂ as a support for the low-temperature reverse water-gas shift (RWGS) reaction. This reaction was chosen because its product, CO, is an important intermediate in various well-established catalytic processes for manufacturing value-added chemicals; that is, the RWGS reaction enables highly flexible utilization of CO₂^35,36.

Results

ML-assisted discovery of RWGS catalysts

We explored M elements of up to five types for Pt(3)/M₁(X₁)-M₂(X₂)-M₃(X₃)-M₄(X₄)-M₅(X₅)/TiO₂ RWGS catalysts (3 wt% Pt, TiO₂ = P25). For M, elements with atomic number 3 (Li) through 83 (Bi), except for Be, B, C, N, O, P, S, As, Se, Tc, Te, Pm, Ta, Hg, Tl, halogens, noble gases, and platinum group metals, were used as catalyst components (50 elements in total). Each M element had a unique loading amount (X) for each catalyst. Thus, the total number of catalyst candidates easily exceeded 10⁹ even though only integer values of up to 5 wt% were considered as the loading amount of M (₅₀C₅ × 5⁵ ≈ 6.6 billion). We tested three types of ML approaches, each of which differs in the input representations of the catalysts: (i) a naive ML model, which uses only elemental compositions; (ii) an exploitative ML model, which uses both elemental compositions and elemental properties; and (iii) an explorative ML model, which uses only elemental properties. For the input representation of the elemental compositions, each catalyst was represented as a vector of the compositional fractions for all the 50 elements under consideration. On the other hand, for the input representation of the elemental properties, vectors of 8 selected elemental descriptors for each element, scaled by its composition fraction, were aggregated into a single feature vector by sum pooling. Therefore, the naive, exploitative, and explorative ML models had 50, 58, and 8 descriptor dimensions, respectively. The initial dataset consisting of 45 data points was constructed using the catalysts reported in our previous experimental study³⁷ and some catalysts fabricated in the present study (see the data directory in the GitHub repository https://github.com/shinya-mine); this dataset was set as “Iteration” = 0.
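The size of the composition grid quoted above can be checked with a few lines of Python. This is only a sketch: it assumes that "integer values of up to 5 wt%" means loadings of 1 to 5 wt% per element, and the variable names are illustrative.

```python
from math import comb

N_ELEMENTS = 50     # candidate additive elements M
MAX_ADDITIVES = 5   # at most five additive elements per catalyst
N_LOADINGS = 5      # assumed integer loadings of 1..5 wt% per element

# Catalysts with exactly five distinct additives, each with its own loading.
five_only = comb(N_ELEMENTS, MAX_ADDITIVES) * N_LOADINGS ** MAX_ADDITIVES

# Also counting catalysts with fewer than five additives (including none).
up_to_five = sum(
    comb(N_ELEMENTS, k) * N_LOADINGS ** k for k in range(MAX_ADDITIVES + 1)
)

print(f"{five_only:,}")   # 6,621,125,000
print(f"{up_to_five:,}")  # 6,767,543,376
```

Either way of counting gives several billion candidates, far more than can be screened experimentally.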
We then trained the explorative ML model based on Extra-Trees regression (ETR)³⁸ with the initial dataset (45 data points), calculated the expected improvement (EI) for all the test points in the catalyst composition grid, selected several prominent catalyst candidates considering the EI values and catalyst variety, synthesized the catalysts using the sequential impregnation method, performed the RWGS reaction, and updated the dataset to close the loop (Supplementary Fig. 1). We continued this process for 44 loops to test 300 catalysts, as shown in Fig. 1. The explorative ML model was used in the initial effort both to explore many elements and because it achieved the highest prediction accuracy among the three ML models. The exploitative ML model was used after the prediction accuracy reached a certain level (after 30 iterations). Although the naive ML model was not used for the catalyst discovery process in this study, its prediction results are given for comparison, because fractional representation in a one-hot encoding manner is known to perform as well as or better than many other featurization techniques when large datasets are used²⁹.
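The candidate-ranking step of the loop can be sketched as follows. This is a minimal illustration of the standard expected-improvement formula for a maximization target, not the authors' code. It assumes that a predictive mean and standard deviation are available for each grid point (for example, from the spread of the individual trees), and the candidate values are hypothetical.

```python
from math import erf, exp, pi, sqrt

def expected_improvement(mean: float, std: float, best: float) -> float:
    """EI for maximization: E[max(Y - best, 0)] with Y ~ Normal(mean, std**2)."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)   # standard normal density
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))     # standard normal CDF
    return (mean - best) * cdf + std * pdf

# Hypothetical candidates: (label, predicted CO formation rate, predictive std).
candidates = [("A", 3.0, 0.1), ("B", 2.8, 0.8), ("C", 2.0, 0.2)]
best_so_far = 3.1  # hypothetical best measured rate in the current dataset

ranked = sorted(
    candidates,
    key=lambda c: expected_improvement(c[1], c[2], best_so_far),
    reverse=True,
)
print([label for label, _, _ in ranked])  # ['B', 'A', 'C']
```

Candidate B ranks first even though its predicted mean is lower than that of A, because its larger uncertainty leaves more room for improvement over the current best. This is how EI trades exploration against exploitation.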

**Fig. 1: ML-assisted exploration of RWGS catalysts.**

Through experimental testing of 255 ML-predicted new catalysts corresponding to 44 cycles of the closed loop discovery system (ML prediction + experiment), we found more than 100 catalysts that showed higher activity than the previously reported high-performance catalyst (Pt(3)/Mo(10)/TiO₂)³⁷ (Fig. 1). In the early stages, the prediction accuracy of the ML model was not high; thus, finding good catalysts was difficult. However, as the amount of data increased and the prediction accuracy improved, we were able to identify good catalysts. This is widely known as the exploration–exploitation trade-off in ML, where we need to balance between “exploration” to obtain more data on uncertain parts and “exploitation” to rely on the already obtained data. Comparing the radar charts of the elemental descriptors for the best catalysts at each iteration (Fig. 1B) shows how the properties of each catalyst composition changed with successive iterations. Although our dataset is still small (300 data points) and the best prediction accuracy attained after 44 cycles (R² = 0.81) is not significantly high, the proposed design is iterative, i.e., a sequential experimental design. Thus, the focus is more on how to utilize the available data (even if the dataset is small in the statistical sense) to plan subsequent experiments and achieve better catalyst discovery. We believe that the prediction accuracies (up to R² = 0.81) achieved by a standard cross validation (CV) procedure (see the ML method section for details) would be sufficient to statistically sense promising directions for further research. It is also noteworthy that the obtained prediction accuracy (R² = 0.81) is somewhat higher than those attained in most ML studies using experimental data on heterogeneous catalysis and relevant material science topics, wherein the prediction accuracy is typically below R² = 0.8, even when experimental conditions are used as descriptors^{28,31,32,39,40,41,42}. 
The composition of the best catalyst discovered by this approach was Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂, and it exhibited the highest CO formation rate per unit catalyst mass (mmol min⁻¹ g_cat⁻¹) at temperatures below 250 °C compared with the previously reported catalysts, while retaining 100% CO selectivity (Supplementary Table 4). Commercial water-gas shift catalysts⁴³ such as Cu/ZnO/Al₂O₃ (HiFUEL® W220) and FeCrCuO_x (HiFUEL® W210) were tested and found to be ineffective in this low temperature range (Supplementary Table 5). Control studies confirmed that all the components are necessary to obtain the highest CO formation rate. All the CO formation rates were tested at least three times, and the average values are shown in Supplementary Fig. 7, along with error bars representing the data range. Notably, Nb was not included in the original dataset (Fig. 2), and the identified catalyst composition could hardly be predicted even by human experts. The compositions of the second, third, and fourth best catalysts are Pt(3)/Mo(0.8)-Ba(0.7)-Na(0.4)-Ce(0.2)/TiO₂, Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Eu(0.4)/TiO₂, and Pt(3)/Tb(2)-Sm(1.5)-Ce(1.2)-Re(1.2)-Mo(0.6)/TiO₂, respectively. Note that we tested the performance of these top-four catalysts and the catalysts highlighted in the radar charts in Fig. 1 at least three times, and the reported values are the averages of these tests.

**Fig. 2: Visualization of RWGS catalyst datasets.**

The extrapolative search is driven by our coarse-grained abstraction of the feature representations (i.e., the descriptors of catalysts) rather than the ML model architecture. Typically, each element of a catalyst represents an individual coordinate in a search space; thus, the catalyst composition is represented in a one-hot encoding manner, for example, Mo 10 or Rb 1 Ba 1 Mo 0.6 Nb 0.2. By contrast, we used the feature representations describing each catalyst by elemental descriptors^31,32, i.e., not directly representing elements as distinct symbols but representing them as continuous quantities characterized by a user-chosen set of elemental properties, such as electronegativity and density (as seen in Fig. 1B). We believe that interpolating the targeted properties over this abstracted representation can lead to some out-of-training discovery, which we refer to as “extrapolative”; this includes catalysts containing elements never used in the training dataset. In addition, the eight descriptors used in this study give an eight-dimensional representation, which is of much lower dimensionality than the direct input representation with 50 dimensions (one per element). This low dimensionality of the explorative model may have contributed to its success by narrowing the search space.
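The difference between the two representations can be illustrated with a short sketch. It rests on several assumptions: only two of the eight descriptors are shown (periodic-table group and Pauling electronegativity), the composition fractions are taken as each additive's share of the total additive loading, and the function names are hypothetical. The descriptor tables and normalization used in the study may differ.

```python
# Illustrative subset of descriptors: (group number, Pauling electronegativity).
DESCRIPTORS = {
    "Rb": (1, 0.82),
    "Ba": (2, 0.89),
    "Nb": (5, 1.60),
    "Mo": (6, 2.16),
}
ELEMENT_ORDER = sorted(DESCRIPTORS)  # fixed column order for composition vectors

def fractions(loadings):
    """Share of each additive in the total additive loading (assumed normalization)."""
    total = sum(loadings.values())
    return {el: w / total for el, w in loadings.items()}

def composition_vector(loadings):
    """One coordinate per element; unseen elements are columns of zeros."""
    frac = fractions(loadings)
    return [frac.get(el, 0.0) for el in ELEMENT_ORDER]

def pooled_descriptors(loadings):
    """Sum pooling of elemental descriptors weighted by composition fraction."""
    frac = fractions(loadings)
    n_desc = len(next(iter(DESCRIPTORS.values())))
    return [sum(frac[el] * DESCRIPTORS[el][i] for el in frac) for i in range(n_desc)]

old_best = {"Mo": 10.0}                                   # Pt(3)/Mo(10)/TiO2
new_best = {"Rb": 1.0, "Ba": 1.0, "Mo": 0.6, "Nb": 0.2}   # best catalyst in this study

print(composition_vector(old_best), pooled_descriptors(old_best))
print(composition_vector(new_best), pooled_descriptors(new_best))
```

In the composition vector, Nb occupies its own column, which is zero for every catalyst in a dataset that never contained Nb, so a model has nothing to learn from it. In the pooled representation, an Nb-containing catalyst is simply another point in the same descriptor coordinates as the Nb-free ones.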

For ML models, we primarily used tree-ensemble models that are equivalent to a histogram over data-dependent partitions. The tree-ensemble models make conservative predictions in the out-of-training regions (it is a histogram approximation, and any predicted values are the local averages of the training samples, even in the out-of-training regions). In that sense, our approach is based on highly safe/conservative predictions; nevertheless, it successfully found some catalysts containing elements not in the training data, which is worth emphasizing. Namely, our ML method can extrapolate from the perspective of materials science as it can identify new elements by moving across the periodic table, while it interpolates from a data science perspective within the elemental descriptor representations. The essential operation of ML prediction is grounded in the interpolation of the given data points; thus, no ML model architecture can directly make extrapolative predictions without further encoding any physics or data-independent hypotheses.
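The conservative behavior of tree ensembles outside the training range can be seen in a small one-dimensional sketch. This is a deliberately simplified stand-in with purely random split thresholds, not scikit-learn's `ExtraTreesRegressor` and not the model used in the study; the data are synthetic.

```python
import random

def build_tree(xs, ys, rng, min_leaf=2):
    """Grow a 1-D regression tree whose split thresholds are drawn at random."""
    lo, hi = min(xs), max(xs)
    if len(xs) <= min_leaf or lo == hi:
        return sum(ys) / len(ys)              # leaf value: mean of its training targets
    cut = rng.uniform(lo, hi)
    left = [(x, y) for x, y in zip(xs, ys) if x < cut]
    right = [(x, y) for x, y in zip(xs, ys) if x >= cut]
    if not left or not right:
        return sum(ys) / len(ys)
    return (cut,
            build_tree(*zip(*left), rng, min_leaf),
            build_tree(*zip(*right), rng, min_leaf))

def predict(tree, x):
    """Descend until a leaf (a plain float) is reached."""
    while isinstance(tree, tuple):
        cut, left, right = tree
        tree = left if x < cut else right
    return tree

def ensemble_predict(trees, x):
    return sum(predict(t, x) for t in trees) / len(trees)

rng = random.Random(0)
train_x = [i / 10 for i in range(11)]     # inputs between 0.0 and 1.0
train_y = [2.0 * x for x in train_x]      # rising trend, largest target is 2.0
trees = [build_tree(train_x, train_y, rng) for _ in range(50)]

inside = ensemble_predict(trees, 0.5)
far_outside = ensemble_predict(trees, 10.0)
print(inside, far_outside)
```

A linear fit to these data would predict 20 at x = 10. The ensemble instead returns an average of leaf means near the upper edge of the training data, so its prediction can never exceed the largest training target.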

Note that we observe overfitting to the training data and a non-negligible gap between the training and test errors, as shown in Supplementary Figs. 11 and 15. This phenomenon, known as “benign/harmless overfitting,” is a topic of ongoing discussion in the field of ML^44,45,46. In principle, ETR works as a pseudo-piecewise-linear interpolation, and in cases where the number of data points is limited, interpolating noisy training data can provide more informative predictions than attempting to separate the noise from the data in such underspecified cases with small samples, as shown in Supplementary Fig. 4.

Figure 2A, B shows histograms of the component elements for our dataset which is composed of 300 experimental data with unique catalyst compositions including 50 elements. Elements Mo, Ba, and Nb appeared most frequently. The effect of the loading amount of some of the frequently appearing elements including Mo, Ba, Nb, Re, Rb, and Cs is shown in Supplementary Fig. 9. Catalysts having relatively low loading amounts of additive oxides (below 2 wt%) tend to show high CO formation rates.

Statistical analysis using ML

Although ML is often employed as a black box without any prior insight into what the model has actually learned, supervised ML models can be used to identify important chemical moieties influencing the prediction, even without any explicit knowledge of its underlying principles⁴⁷. Extrapolative ML can reveal not only the effective catalyst compositions but also the required elemental features and electronic properties for the precise designing of ideal catalysts. Feature-importance score and SHapley Additive exPlanations (SHAP)^48,49 analyses were used to understand the importance of the descriptors for ML prediction, as shown in Fig. 3A, B, respectively. Elemental properties such as group, electronegativity (EN), and density were identified as important factors. SHAP can be used to visualize the dependence of the model output (e.g. CO formation rate) on the value of each descriptor³¹. For example, relatively low values (blue color in Fig. 3B) for the feature “group” are correlated to a high CO formation rate (SHAP value). The feature-importance score and SHAP analyses were also performed using the exploitative elemental descriptor representation because this method considers the elemental composition directly and facilitates the understanding of the contribution of the elements in the given data (Supplementary Fig. 16). For the catalyst composition, Mo, Tb, Na, and Ba were identified as important descriptors. The SHAP values were analyzed using waterfall plots for the two representative catalysts (Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ and Pt(3)/Mo(10)/TiO₂), as shown in Fig. 3C, D. The waterfall plot analysis reveals the descriptors that are responsible for the increase or decrease from the average value of the dataset (2.28) relative to the predicted value for each catalyst. EN, group, and oxide band gap (BG) values were found to strongly contribute to the high activity of our best catalyst (Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂).
Note that the summary plot shown in Fig. 3B describes overall predictions for all the datapoints used (300 datapoints here) whereas the waterfall plots (Fig. 3C, D) are designed to display explanations for individual predictions for each catalyst^48,49. This difference in methodology is reflected in the differences in ranking of important descriptors in each analysis method. Therefore, the summary plot is useful for obtaining information on the catalyst design guidelines for the RWGS reaction in general, whereas the waterfall plots provide more useful information on the reasons for the high (or low) activity shown by an individual catalyst. The waterfall plots for some additional catalysts are also included in Supplementary Figs. 13, 14, 17 and 18.
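The additive bookkeeping behind a waterfall plot can be illustrated with exact Shapley values for a toy model. This sketch makes several simplifications: it enumerates all feature orderings by brute force instead of using the SHAP library's tree algorithm, it replaces "absent" features with a single baseline point rather than averaging over a background dataset, and the model coefficients and feature values are arbitrary.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)          # start with every feature at its baseline
        prev = model(current)
        for i in order:
            current[i] = x[i]             # switch feature i to the instance value
            new = model(current)
            phi[i] += new - prev
            prev = new
    return [p / len(orders) for p in phi]

def toy_model(features):
    """Hypothetical response surface with one interaction term (not fitted to data)."""
    group, en, band_gap = features
    return 2.0 - 0.1 * group + 0.5 * en + 0.2 * en * band_gap

baseline = [6.0, 2.0, 3.0]   # hypothetical reference catalyst
x = [3.0, 1.0, 4.0]          # hypothetical catalyst to explain

phi = shapley_values(toy_model, x, baseline)
base_value = toy_model(baseline)
print(base_value, phi, base_value + sum(phi), toy_model(x))
```

The base value plus the per-descriptor contributions reproduces the model output for that catalyst. A waterfall plot draws exactly this sum, starting from the dataset average (2.28 in this study) and ending at the individual prediction.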

**Fig. 3: ML-assisted statistical analysis.**

Catalyst characterization

With the best catalyst composition in hand, we then performed structural analysis (Fig. 4, Supplementary Figs. 19–27, Supplementary Tables 6 and 7) and mechanistic studies (Fig. 5, Table 1, and Supplementary Figs. 28–33). This is important because investigations of extraordinary materials can provide new scientific insights. The X-ray diffraction pattern of Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ was essentially the same as that of pristine TiO₂ (P25) and showed peaks corresponding to both anatase and rutile phases (Supplementary Fig. 19). To investigate the morphologies and particle sizes of the introduced Mo and Pt species, high-angle annular dark-field scanning transmission electron microscopy (HAADF-STEM) was performed for TiO₂ (P25), Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂, and Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ (Fig. 4A). The oxide additive species was found to be highly dispersed over the TiO₂ surface. In addition, the Pt nanoparticles in Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ were highly dispersed, with an average Pt particle diameter of 1.8 nm (Supplementary Fig. 22). Comparison with the previously identified Pt(3)/Mo(10)/TiO₂ active catalyst (particle size of 2.6 nm)³⁷ revealed that the average particle size of the supported Pt was smaller in Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂.

**Fig. 4: Structural analysis of the ML-identified RWGS catalyst.**

**Fig. 5: *Operando* spectroscopic studies.**

Table 1 Apparent reaction orders and activation energy (E_a) for the RWGS reaction over Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂, Pt(3)/Rb(1)-Ba(1)-Mo(0.6)/TiO₂, Pt(3)/Mo(0.6)/TiO₂, and Pt(3)/TiO₂ catalysts

X-ray absorption spectroscopy (XAS) was conducted to identify the chemical states of the introduced species in the RWGS catalyst (Fig. 4B and Supplementary Fig. 24). The Pt L₃-edge X-ray absorption near-edge structure (XANES) of the reduced Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ catalyst was identical to that of the Pt foil used as the reference. Extended X-ray absorption fine structure analysis shows the presence of a Pt–Pt bond with a coordination number of 5.6 at 2.75 Å (Supplementary Table 7). The observed distance is slightly shorter than that of the Pt–Pt bond observed in bulk Pt metal (2.76 Å), revealing the formation of nanoparticles⁵⁰ that were also found by STEM. Mo K-edge XANES showed that the shape and edge position of the unreduced Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ catalyst were identical to those of the reference MoO₃. For the reduced Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ sample, the absorption edge shifted toward lower energies, indicating the reduction of the Mo species upon pretreatment with H₂. X-ray photoelectron spectroscopy (XPS) measurements were conducted to identify the oxidation states of Mo (Fig. 4C). Peaks corresponding to Mo⁴⁺ were predominantly observed, in addition to small peaks of Mo⁶⁺ and Mo²⁺. The other additives, including Rb, Ba, and Nb, did not change their oxidation states and existed in the form of Rb₂O, BaO, and Nb₂O₅, respectively, after the reduction pretreatment with H₂ (Supplementary Fig. 26).

In situ CO adsorption IR spectroscopy experiments were conducted to examine the electronic state of the Pt species on a series of supported Pt catalysts to understand the effects of the introduced additives (Fig. 4D). All the spectra showed a peak at 2071–2077 cm⁻¹, corresponding to the CO bound to the on-top sites of the metallic Pt surface. The center of the CO adsorption peak shifted to higher wavenumbers, following the order Pt(3)/TiO₂, Pt(3)/Rb(1)-Ba(1)-Mo(0.6)/TiO₂, Pt(3)/Mo(0.6)/TiO₂ and Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂. Therefore, the introduction of additives favors the formation of more electron-deficient metallic Pt⁰ species, compared to pristine Pt(3)/TiO₂, and is expected to improve the resistance to CO poisoning during the RWGS reaction. The same trend was also observed by XPS (Supplementary Fig. 27).

Mechanistic studies

Kinetic studies were conducted on the optimal catalyst (Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂). The apparent activation energy (E_a), as calculated from the Arrhenius plot, was 45.6 kJ mol⁻¹ (Table 1 and Supplementary Fig. 28). Similarly, the E_a values of Pt(3)/Rb(1)-Ba(1)-Mo(0.6)/TiO₂, Pt(3)/Mo(0.6)/TiO₂, and Pt(3)/TiO₂ were 48.7, 52.8, and 58.4 kJ mol⁻¹, respectively. The apparent reaction orders with respect to H₂, CO₂, and CO were calculated to understand the effect of the introduced additives. The apparent reaction orders for both CO₂ and H₂ in the case of the catalysts with oxide additives decreased as compared with those for pristine Pt(3)/TiO₂, indicating weaker dependence on their concentrations. In addition, the reaction order with respect to CO was the smallest for Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂, indicating a weaker inhibitory effect of CO for the best catalyst. This result is consistent with the results of the in situ IR and XPS experiments. These combined results indicate that the introduction of Nb renders Pt more electron-deficient and induces high tolerance to CO poisoning, leading to a high catalytic activity. The CO₂-TPD analysis of the catalysts without Pt (Supplementary Fig. 29) suggested that the introduced additives could facilitate the adsorption of CO₂ owing to the introduced base metal oxides, particularly Rb and Ba, thereby promoting the reaction efficiently.
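An apparent activation energy is obtained from an Arrhenius plot as the slope of ln(rate) against 1/T, multiplied by −R. The sketch below shows this calculation on synthetic data. The temperatures and pre-exponential factor are hypothetical, and the rates are generated from an assumed E_a of 45.6 kJ mol⁻¹ only to show that the fit recovers it; they are not measurements from this study.

```python
from math import exp, log

R = 8.314462618  # gas constant, J mol^-1 K^-1

def activation_energy_kj(temps_k, rates):
    """Least-squares slope of ln(rate) versus 1/T, converted to kJ/mol."""
    xs = [1.0 / t for t in temps_k]
    ys = [log(r) for r in rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return -slope * R / 1000.0

# Synthetic Arrhenius data (hypothetical temperatures and prefactor).
temps = [473.15, 483.15, 493.15, 503.15]   # K
ea_assumed = 45.6e3                        # J/mol, used only to generate the data
rates = [1.0e5 * exp(-ea_assumed / (R * t)) for t in temps]

ea_fit = activation_energy_kj(temps, rates)
print(f"{ea_fit:.1f} kJ/mol")  # 45.6 kJ/mol
```

With real measurements the points scatter around the line, and the quality of the linear fit indicates whether a single apparent E_a describes the temperature range.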

The RWGS reaction is known to proceed mainly via the (i) redox mechanism and (ii) associative mechanism⁵¹. In the former, oxygen vacancies are formed on the surface of the support oxide by H₂, while CO₂ reoxidizes the partially reduced oxide to fill the formed oxygen vacancies⁵², resulting in the formation of CO. In the latter mechanism, CO is produced through the decomposition of the surface-reactive intermediates such as formates and carbonates⁵¹.

To elucidate the reaction mechanism, operando XANES measurements were conducted under CO₂, H₂, and CO₂ + H₂ flow at 250 °C (Fig. 5). The Mo K-edge XANES spectra of Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ show that the absorption edge shifts to higher energies after the introduction of CO₂, while CO was simultaneously detected by GC. The results clearly demonstrated that CO₂ acted as an oxidant to oxidize the Mo species. Notably, CO was formed even upon the introduction of H₂, suggesting that the reaction also proceeded through the associative mechanism. For the Pt L₃-edge (Supplementary Fig. 30), the white line intensity became slightly stronger under CO₂ flow, suggesting that metallic Pt was also oxidized by CO₂. Note, however, that this change may instead arise from the adsorption of the CO formed, as it is well known that the Pt L₃-edge XANES intensity and shape are altered by the adsorption of CO⁵³. The K-edge XANES spectra of Ti, Ba, Rb, and Nb were also obtained employing a protocol similar to that described above (Supplementary Fig. 30). The edge positions in all these XANES spectra hardly changed following the introduction of CO₂, indicating that no redox reactions of TiO₂, BaO, Rb₂O, and Nb₂O₅ occurred during the RWGS reaction.

Operando IR spectroscopy was also performed to investigate the adsorbed surface species that are likely to be involved in the RWGS reaction (Fig. 5B). Bands in the range 1700–1200 cm⁻¹, which can be assigned to the surface-adsorbed species such as carbonate and formate⁵¹, appeared immediately after the introduction of CO₂. Simultaneous formation of CO in the gas phase was also observed using an IR gas cell at the outlet. Bands at 2100–1950 cm⁻¹, which can be assigned to the adsorbed CO, were also observed. The amount of these surface species over the best catalyst was higher than those over Pt(3)/Mo(0.6)/TiO₂ and Pt(3)/TiO₂, yet lower than that over Pt(3)/Rb(1)-Ba(1)-Mo(0.6)/TiO₂ without Nb (Supplementary Fig. 31). The evolution of the bands in the ν_CH region (2800–2960 cm⁻¹) also supports the formation of formate species under the flow of CO₂ and H₂. These results indicate that the Ba and Rb species act as base components to generate the surface-adsorbed species that lead to the formation of CO. To confirm this, H₂ was introduced to the Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ catalyst with such adsorbed species, as shown in Fig. 5B and Supplementary Fig. 33. Note that for this purpose, a lower temperature (200 °C) was employed to clearly observe the adsorbate peaks. Intensities of the bands between 1700 and 1200 cm⁻¹ decreased upon the introduction of H₂, and simultaneous formation of CO in the gas phase was observed. These operando XAS and IR results indicated that Mo acted as a redox species while Rb and Ba acted as bases to promote the RWGS reaction. Nb was not directly involved in the reaction; it rather modified the electronic structure of Pt, ensuring high CO tolerance. These multiple functions realized by the combination of the oxide additives identified are vital for achieving high catalytic performance.

Catalyst durability

Finally, a durability test was conducted (Fig. 6). For the optimal Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ catalyst, the CO yield after 1 h time-on-stream was 8.0%, with a corresponding CO formation rate of 3.34 mmol min⁻¹ g⁻¹. Note that 100% CO selectivity was retained throughout the durability test. Although the CO yield decreased gradually over time, the CO formation rate after 300 h time-on-stream was still 2.52 mmol min⁻¹ g⁻¹. For comparison, the catalytic stabilities of Pt(3)/Rb(1)-Ba(1)-Mo(0.6)/TiO₂, Pt(3)/Mo(0.6)/TiO₂, Pt(3)/TiO₂, Pt(3)/Mo(10)/TiO₂ (reported previously by our group)³⁷ and a commercial Cu/ZnO/Al₂O₃ catalyst were also evaluated under the same reaction conditions. The CO yields obtained over these reference supported Pt catalysts were all lower than that on Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ throughout the durability test time period. Although the Cu/ZnO/Al₂O₃ catalyst exhibited relatively good stability for the RWGS reaction under our conditions, its activity is much lower than that of the supported Pt catalysts. We also compared the degree of activity loss for each catalyst (r_CO,t/r_CO,initial). By this criterion as well, the optimal Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ is comparable to Cu/ZnO/Al₂O₃. Therefore, the optimal Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂ identified by the ML method is an outstanding state-of-the-art catalyst for the low-temperature (250 °C) RWGS reaction.
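As a worked example of the activity-loss ratio, the two rates quoted above for the optimal catalyst can be compared directly. This sketch assumes that the 1 h value serves as the initial reference; the study may define r_CO,initial differently.

```python
rate_1h = 3.34     # mmol min^-1 g^-1 after 1 h on stream (from the text)
rate_300h = 2.52   # mmol min^-1 g^-1 after 300 h on stream (from the text)

retention = rate_300h / rate_1h          # r_CO,t / r_CO,initial with t = 300 h
loss_percent = (1.0 - retention) * 100.0

print(f"retention = {retention:.3f}, loss = {loss_percent:.1f}%")
# retention = 0.754, loss = 24.6%
```

Under this assumption, the catalyst retains about three quarters of its 1 h activity after 300 h on stream.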

Discussion

In summary, using the extrapolative ML method, we discovered over 100 catalysts that produced higher activity than the previously reported best catalyst (Pt(3)/Mo(10)/TiO₂). The composition of the optimal discovered catalyst was Pt(3)/Rb(1)-Ba(1)-Mo(0.6)-Nb(0.2)/TiO₂. This unique composition could not be predicted by human experts in catalysis; therefore, computational methods, such as ML, would be required to design effective catalysts. Notably, Nb was absent in the original dataset, highlighting the effectiveness of our extrapolative ML model. We also used ML analysis to identify the physical and chemical properties that governed the catalytic activity. Our ML model revealed the effective catalyst compositions as well as the elemental features and electronic properties required for catalytic activity. Experimental mechanistic studies using in situ/operando techniques were also performed to explore the role of each catalyst component and the reaction mechanism. The obtained results indicated that Mo acted as a redox species, whereas Rb and Ba acted as bases to promote the RWGS reaction. By contrast, Nb did not directly participate in the reaction but instead altered the electronic structure of Pt, increasing the CO tolerance. Our study presents a new approach for discovering novel catalysts and materials that show extraordinary performance. Although we focused on investigating the effect of the catalyst composition only on the catalytic performance to limit the search space without changing the experimental conditions, we are aware that the preparation processes can significantly influence the structure of catalysts, which, in turn, can result in variations in the catalytic performance. Further studies are needed to explore the effect of altering the experimental conditions by using ML, even though that will necessitate a considerably large number of experiments. 
In addition, full optimization of catalysts is desired because we only dealt with exploring the additive oxide of the catalysts. Supported metals and supports instead of Pt and TiO₂ should also be explored. For this, we can use the same feature engineering strategy by harnessing the intrinsic properties of supported metals and supports. For instance, we can use “support descriptors” such as specific surface areas, band gaps, and acidity (which can be measured experimentally) for the support materials. In the future, we expect our study to facilitate the development of novel catalysts.

Methods

Chemicals

Chemicals and materials were purchased from commercial suppliers and used without further purification. TiO₂ (P25) having both anatase and rutile phases was obtained from Evonik (formerly Degussa). TiO₂ STR-100N having rutile phase was provided by Sakai Chemical Industry, while TiO₂ ST-01 with anatase phase was obtained from Ishihara Sangyo. The carbon and γ-Al₂O₃ (Puralox) supports were commercially obtained from Kishida Chemical and Sasol, respectively. ZrO₂ (JRC-5) was supplied by the Catalysis Society of Japan. SiO₂ (CariACT Q-10) was purchased from Fuji Silysia Chemical Company Ltd. Nb₂O₅ was prepared by calcination of niobic acid (Nb₂O₅ ∙ nH₂O, HY-340) supplied from CBMM (Companhia Brasileira de Metalurgia e Mineração) at 500 °C for 3 h. CeO₂ (Type-A) support was provided by Daiichi Kigenso Kagaku Kogyo Co., Ltd. The industrial CuZnAl catalyst known as a copper-based low-temperature water-gas shift catalyst (HiFUEL® W220; CuO = 52 wt%, ZnO = 30 wt%, Al₂O₃ = 17 wt%) and the FeCrCuO_x catalyst known as an iron–chrome-based high-temperature water-gas shift catalyst (HiFUEL® W210; Fe₂O₃ = 82.7 wt%, Cr₂O₃ = 7 wt%, CuO = 5 wt%) were purchased from Alfa Aesar.

Preparation of the catalysts

Pt(3)/M₁(X₁)-M₂(X₂)-M₃(X₃)-M₄(X₄)-M₅(X₅)/TiO₂ (3 wt% Pt, TiO₂ = P25, X_i is the loading amount of M_i) was prepared using the sequential impregnation method. Elements M having atomic numbers from 3 (Li) to 83 (Bi), except for Be, B, C, N, O, P, S, As, Se, Tc, Te, Pm, Ta, Hg, Tl, halogens, noble gases, and platinum group metals, were used as catalyst components in this work. For the source and purity of the chemicals, please see Supplementary Table 1. First, TiO₂ loaded with one or more additive components (M₁(X₁)-M₂(X₂)-M₃(X₃)-M₄(X₄)-M₅(X₅)/TiO₂) was prepared by the impregnation method. The required amounts of the TiO₂ support and the corresponding sources of the M elements were charged into a 100 mL glass vessel containing an appropriate amount of deionized water and stirred at 200 rpm for 15 min at room temperature. The mixture was evaporated to dryness at 50 °C, dried at 110 °C for 12 h, and calcined at 500 °C in air for 3 h to give M₁(X₁)-M₂(X₂)-M₃(X₃)-M₄(X₄)-M₅(X₅)/TiO₂. The resulting M₁(X₁)-M₂(X₂)-M₃(X₃)-M₄(X₄)-M₅(X₅)/TiO₂ was then impregnated with an aqueous HNO₃ solution of Pt(NH₃)₂(NO₃)₂ under magnetic stirring. The mixture was evaporated to dryness at 50 °C and further dried in air at 110 °C for 12 h to give PtO₂/M₁(X₁)-M₂(X₂)-M₃(X₃)-M₄(X₄)-M₅(X₅)/TiO₂ (unreduced sample). The catalyst used for the RWGS reaction was prepared by reducing PtO₂/M₁(X₁)-M₂(X₂)-M₃(X₃)-M₄(X₄)-M₅(X₅)/TiO₂ in a quartz tube under a flow of H₂ (40 mL min⁻¹) at 300 °C for 0.5 h to give Pt(3)/M₁(X₁)-M₂(X₂)-M₃(X₃)-M₄(X₄)-M₅(X₅)/TiO₂.

Other supported catalysts were prepared by the same method described above by using M₁(X₁)-M₂(X₂)-M₃(X₃)-M₄(X₄)-M₅(X₅)/TiO₂ or M₁(X₁)-M₂(X₂)-M₃(X₃)-M₄(X₄)-M₅(X₅)/Support and other metal sources including aqueous solutions of NH₄ReO₄, RuCl₃, IrCl₃·nH₂O, AgNO₃ and aqueous HNO₃ solutions of Rh(NO₃)₃ and Pd(NH₃)₂(NO₃)₂.

Catalysts characterization

High-angle annular dark-field scanning transmission electron microscopy (HAADF-STEM) and energy dispersive X-ray spectroscopy (EDX) analysis were performed using an FEI Titan G2 microscope. Samples were prepared by dropping an ethanol solution containing the catalyst on carbon-supported Cu grids. XPS characterization was carried out on a JEOL JPS-9010MC spectrometer using Mg Kα (1253.6 eV) radiation. Binding energies were calibrated based on the C1s peak energy (285.0 eV). The samples were examined after the H₂ reduction pretreatment using a transfer vessel in order to avoid exposure to air. XPS spectra were analyzed by convolution of Gaussian and Lorentzian functions with a Shirley background.

In situ/operando IR spectra were recorded on a JASCO FT/IR-4600 equipped with a mercury-cadmium-telluride detector and a quartz IR cell connected to a conventional flow system (100 mL min⁻¹). The sample was pressed into a 40 mg self-supporting wafer and mounted in the quartz IR cell with CaF₂ windows. Spectra were acquired by accumulating 20 scans at a resolution of 4 cm⁻¹. The reference spectrum of the catalyst wafer in He flow taken at the measurement temperature was subtracted from each spectrum.

Pt L₃-edge, Rb K-edge, Mo K-edge, Ba K-edge, and Ti K-edge XAS measurements were performed in a transmission mode, while Nb K-edge XAS were performed in a fluorescence mode at the BL14B2 of SPring-8 at the Japan Synchrotron Radiation Research Institute (Proposal No. 2021B1840 and 2022A1736). A Si(311) double crystal monochromator was used for the Pt L₃-edge, Rb K-edge, Nb K-edge, Mo K-edge, and Ba K-edge XAS measurements, while a Si(111) double crystal monochromator was used for the Ti K-edge XAS measurements. For operando XAS measurements, a high-sampling-rate TCD GC (490 Micro GC; Agilent Technologies Inc.) was used for the quantitative analysis of CO and CH₄. A mass spectrometer (BELMass; MicrotracBEL Corp.) was also used to monitor the eluent gas. Samples in pellet form (∅7 mm) were introduced into a cell equipped with Kapton film windows and gas lines connecting to the GC. Pretreatment of the samples involved heating under a flow of H₂ (300 mL min⁻¹) at 300 °C for 30 min. Subsequently, 25% CO₂/He (400 mL min⁻¹), 75% H₂/He (400 mL min⁻¹), and CO₂ (100 mL min⁻¹) + H₂ (300 mL min⁻¹) were introduced into the cell with intervals of He purge between the gas introduction steps. Note that boron nitride was used to make a pellet sample when the required amount is <40 mg. Spectra of reference compounds were recorded at room temperature in air. The obtained XAS spectra were analyzed using the Athena and Artemis software ver. 0.9.25 included in the Demeter package⁵⁴.

Catalytic reverse water-gas shift reactions

RWGS reactions were carried out in a fixed bed continuous flow reactor under atmospheric pressure. A straight quartz tube with an inner diameter of 4 mm was used. The catalyst (typically 10 mg) was pretreated under H₂ flow (40 mL min⁻¹) at 300 °C for 30 min prior to each activity test. Catalytic activity was measured at the temperature of 250 °C under the following composition of feed gas: 20 mL min⁻¹ CO₂, 60 mL min⁻¹ H₂, and 5 mL min⁻¹ N₂ added as an internal standard for quantitative analysis. The gas flows were controlled by mass flow controllers. The effluent gas phase was allowed to pass through an ice-bath unit to remove the water vapor and then analyzed online using a gas chromatograph (Agilent 490 Micro GC) equipped with Molsieve 5 Å and PoraPLOT Q columns and TCD detector.
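The feed conditions above fix a few derived quantities that are handy when comparing activity data. The short sketch below only does the arithmetic on the numbers quoted in the text; the variable names are illustrative, and the flow-per-mass figure is derived here rather than quoted from the paper.

```python
# Feed-gas bookkeeping for the activity test (values taken from the text above).
flows_ml_min = {"CO2": 20.0, "H2": 60.0, "N2": 5.0}
catalyst_mass_mg = 10.0  # "typically 10 mg"

total_flow_ml_min = sum(flows_ml_min.values())
h2_to_co2 = flows_ml_min["H2"] / flows_ml_min["CO2"]
# Total flow per gram of catalyst per hour (derived here, not quoted in the text).
flow_per_gram_hour = total_flow_ml_min * 60.0 * 1000.0 / catalyst_mass_mg

print(f"total flow: {total_flow_ml_min:.0f} mL/min")
print(f"H2/CO2 ratio: {h2_to_co2:.1f}")
print(f"flow per catalyst mass: {flow_per_gram_hour:.0f} mL h^-1 g^-1")
```

With 10 mg of catalyst, the 85 mL min⁻¹ total feed corresponds to 510,000 mL h⁻¹ g⁻¹ at an H₂/CO₂ ratio of 3.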

ML methods

As elemental descriptors, we selected the following eight parameters: electronegativity (EN) according to the Allred-Rochow definition, melting point (m.p.), enthalpy of fusion (∆H_fus), density, group in the periodic table, band gap (BG) of the most stable oxide form, oxidation number in the most stable oxide form, and adsorption energy (E_ads) of CO₂ on the metallic surface.

We used ETR³⁸ as the ML model. The widely used scikit-learn implementations (version 0.23.2)⁵⁵ were employed for all ML models. For hyperparameter tuning, we performed an exhaustive grid search over the candidate values listed in Supplementary Table 2, chose the best hyperparameters by 5-fold CV on the training set, and used the resulting model to predict the values for the test set (hyperparameters not explicitly listed in the table were left at the scikit-learn defaults). That is, to avoid data leakage, we strictly followed the standard practice of “nested” CV (also known as double CV) to estimate the prediction accuracies: 5-fold CV served as the internal CV, and Monte Carlo CV (also known as repeated random subsampling CV) with 100 random leave-20%-out trials served as the external CV, which increases the statistical reliability of the test prediction accuracies while keeping the number of training data fixed.
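The nested CV scheme described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' script: the data are synthetic, the hyperparameter grid is a placeholder (the real grid is in Supplementary Table 2), the helper name `nested_cv_mae` is hypothetical, and the mean absolute error is an assumed choice of metric. The demo runs only 5 external splits to stay fast; the text uses 100.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, ShuffleSplit


def nested_cv_mae(X, y, param_grid, n_outer=100, seed=0):
    """External Monte Carlo CV (leave-20%-out) around an internal 5-fold grid search."""
    outer = ShuffleSplit(n_splits=n_outer, test_size=0.2, random_state=seed)
    maes = []
    for train_idx, test_idx in outer.split(X):
        # Hyperparameters are tuned on the training part only, so the
        # held-out 20% never influences model selection (no data leakage).
        search = GridSearchCV(
            ExtraTreesRegressor(random_state=seed),
            param_grid,
            cv=5,
            scoring="neg_mean_absolute_error",
        )
        search.fit(X[train_idx], y[train_idx])
        maes.append(mean_absolute_error(y[test_idx], search.predict(X[test_idx])))
    return np.array(maes)


# Synthetic stand-in data: 60 hypothetical catalysts described by 8 descriptors.
rng = np.random.default_rng(0)
X_demo = rng.random((60, 8))
y_demo = 2.0 * X_demo[:, 0] + X_demo[:, 1] + 0.05 * rng.standard_normal(60)

demo_grid = {"n_estimators": [20, 50], "max_features": [0.5, 1.0]}  # placeholder grid
maes = nested_cv_mae(X_demo, y_demo, demo_grid, n_outer=5)
print(f"test MAE: {maes.mean():.3f} +/- {maes.std():.3f}")
```

`ShuffleSplit` with `test_size=0.2` keeps the training-set size identical in every external trial, which matches the fixed number of training data mentioned in the text.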

We used three types of ML approaches that differ in the input representation of the catalysts: (i) a naive ML model that uses only elemental compositions, (ii) an exploitative ML model that uses both elemental compositions and elemental properties, and (iii) an explorative ML model that uses only elemental properties. For the input representation of elemental compositions, each catalyst was represented as a vector of the compositional fractions of all 50 elements under consideration, i.e., $({c}_{1},\, {c}_{2},\, {c}_{3},\cdots,{c}_{50})$ where ${c}_{i}$ is the compositional fraction of the i-th element. For the input representation of elemental properties, each catalyst was represented as the sum of the elemental descriptor vectors, each scaled by its compositional fraction, i.e., for a catalyst Pt(3)/M₁(X₁)-M₂(X₂)-M₃(X₃)-M₄(X₄)-M₅(X₅)/TiO₂,

$${X}_{1}\,{vec}({M}_{1})+{X}_{2}\,{vec}({M}_{2})+{X}_{3}\,{vec}({M}_{3})+{X}_{4}\,{vec}({M}_{4})+{X}_{5}\,{vec}({M}_{5}), \tag{1}$$

where ${vec}({M}_{i})$ is the elemental descriptor vector for element ${M}_{i}$, which is also called the composition-based feature vector in the literature³³. The former representation generates 50-dimensional features and tends to be very sparse and statistically uninformative when the training dataset is not large but contains many elements. Moreover, it cannot handle elements that are absent or statistically infrequent in the training data. By contrast, the latter representation has the same dimension as the user-specified elemental descriptor, which often produces statistically much more stable results for small-data problems, and it is not explicitly constrained by the elements covered in the training dataset. Technically, in the latter representation each catalyst is a set of elemental descriptor vectors, each scaled by its compositional fraction, which are aggregated into a single feature vector for the catalyst by sum pooling, a permutation-invariant operation.
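The difference between the two input representations can be made concrete with a small sketch. The descriptor table below uses made-up placeholder numbers with three columns (the study uses eight real descriptors), the element list is truncated to three "known" elements instead of 50, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical descriptor table: placeholder numbers, NOT real elemental data.
DESCRIPTORS = {
    "Rb": np.array([1.0, 0.0, 2.0]),
    "Ba": np.array([2.0, 1.0, 0.0]),
    "Mo": np.array([0.0, 5.0, 1.0]),
    "Nb": np.array([5.0, 0.0, 5.0]),
}
# Elements assumed to occur in the training data (Nb is deliberately missing).
KNOWN_ELEMENTS = ["Rb", "Ba", "Mo"]


def composition_vector(loadings, elements=KNOWN_ELEMENTS):
    """One slot per known element; unseen elements have no slot and are lost."""
    return np.array([loadings.get(el, 0.0) for el in elements])


def property_vector(loadings, table=DESCRIPTORS):
    """Eq. (1): loading-weighted sum of elemental descriptor vectors (sum pooling)."""
    return np.sum([x * table[el] for el, x in loadings.items()], axis=0)


catalyst = {"Rb": 1.0, "Ba": 1.0, "Mo": 0.6, "Nb": 0.2}
print(composition_vector(catalyst))  # Nb does not appear anywhere in this vector
print(property_vector(catalyst))     # Nb still shifts every descriptor component

# Sum pooling does not depend on the order in which the elements are listed.
reordered = {"Nb": 0.2, "Mo": 0.6, "Ba": 1.0, "Rb": 1.0}
print(np.allclose(property_vector(catalyst), property_vector(reordered)))
```

The composition vector has no way to encode Nb once the element list is fixed, whereas the property vector changes smoothly with the Nb loading; this is what lets a model trained on the property representation score elements it has never seen.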

Notably, the explorative ML model, which represents catalysts only through their physicochemical properties via descriptors and does not directly specify the individual contributions of distinct elements, enables a more extrapolative and ambitious exploration beyond the training data, even to elements unseen in training. In our previous study, which used these ML approaches to analyze reaction data on the oxidative coupling of methane (OCM)³¹, we also developed a procedure to recover the catalyst composition from the elemental property representation, because the composition is indispensable for catalyst synthesis; there, we employed a “local search” to find new catalyst candidates. In the present study, by contrast, we employed a “grid search” approach that suggests new catalyst candidates by manually specifying the loading amount of each element M in order to perform global optimization. This approach does not need the recovery procedure; instead, we calculate the expected improvement (EI)⁵⁶ score for the given compositions using the following equation.

$$\mathrm{EI}(x)=\mathbb{E}\left\{\max\left(\mu(x)-y^{*},0\right)\right\}=\left(\mu(x)-y^{*}\right)\cdot\Phi\left(\frac{\mu(x)-y^{*}}{\sigma(x)}\right)+\sigma(x)\cdot\phi\left(\frac{\mu(x)-y^{*}}{\sigma(x)}\right) \tag{2}$$

Here, $\mu(x)$ and $\sigma(x)$ are the predicted value and the standard deviation of an ML surrogate for an input $x$, and the expectation $\mathbb{E}$ is taken over a Gaussian predictive distribution, with $\phi$ and $\Phi$ denoting the standard normal PDF and CDF. The EI score can be intuitively regarded as a quantity that indicates how much improvement over the current best ${y}^{*}$ can be expected for an input $x$. The EI is schematically presented in Supplementary Fig. 3.
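Eq. (2) translates directly into a few lines of NumPy/SciPy. The sketch below is illustrative only: the function name and the example numbers are made up, and the handling of σ = 0 (where EI reduces to max(μ − y*, 0)) is an added convention that the text does not discuss.

```python
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, sigma, y_best):
    """Closed-form EI of Eq. (2) for maximisation; arrays are evaluated elementwise."""
    mu, sigma = np.broadcast_arrays(
        np.atleast_1d(np.asarray(mu, dtype=float)),
        np.atleast_1d(np.asarray(sigma, dtype=float)),
    )
    gain = mu - y_best
    # Convention for sigma == 0: the prediction is certain, so EI = max(gain, 0).
    ei = np.maximum(gain, 0.0)
    ok = sigma > 0
    z = gain[ok] / sigma[ok]
    ei[ok] = gain[ok] * norm.cdf(z) + sigma[ok] * norm.pdf(z)
    return ei


# Three hypothetical candidates scored against a current best of 1.0.
mu = [1.0, 1.2, 0.8]
sigma = [0.0, 0.1, 0.5]
print(expected_improvement(mu, sigma, y_best=1.0))
```

The third candidate has a predicted value below the current best but a large uncertainty, and it still receives a clearly positive EI; this is the mechanism by which EI balances exploitation of high predictions against exploration of uncertain regions.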

Clustering was typically performed to group very similar candidates into K clusters. In cases where clustering was not used, we simply selected the catalysts based on the top proposed catalyst compositions. We normally used K = 100 because the elbow and silhouette analyses suggested that 100 was the optimal number of clusters. The elbow method was employed to find the point of inflection (elbow) in the plot of the explained variation as a function of the number of clusters, serving as a criterion for determining the optimal number of clusters. The silhouette analysis was applied to quantify the similarity among the observations within a cluster, thus providing additional support for identifying the optimal number of clusters. A representative analysis result using the 300 data points (See the data directory in the GitHub repository https://github.com/shinya-mine) with explorative ML methods based on ETR (Supplementary Fig. 5) revealed that K = 50–100 is optimal. In addition, no clusters had silhouette scores below the average when K = 100 (with N = 10 perturbations).
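The choice of K and the subsequent pick of candidates can be sketched as below. The text does not name the clustering algorithm, so K-means is an assumption here, as is the rule of keeping the highest-scoring member of each cluster; the candidate features and EI scores are synthetic, and the function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic candidates: four tight groups of 50 points in an 8-dimensional space.
rng = np.random.default_rng(0)
centers = 10.0 * np.eye(4, 8)
candidates = np.vstack([c + 0.1 * rng.standard_normal((50, 8)) for c in centers])
ei_scores = rng.random(len(candidates))  # placeholder EI scores


def scan_cluster_numbers(X, k_values, seed=0):
    """Return {K: (inertia, mean silhouette)} for elbow and silhouette analysis."""
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        results[k] = (km.inertia_, silhouette_score(X, km.labels_))
    return results


def top_candidate_per_cluster(X, scores, k, seed=0):
    """Keep the best-scoring member of each cluster, ranked by score."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    picks = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size:
            picks.append(int(members[np.argmax(scores[members])]))
    return sorted(picks, key=lambda i: -scores[i])


scan = scan_cluster_numbers(candidates, range(2, 7))
for k, (inertia, sil) in scan.items():
    print(f"K={k}: inertia={inertia:.1f}, silhouette={sil:.3f}")

best_k = max(scan, key=lambda k: scan[k][1])
picks = top_candidate_per_cluster(candidates, ei_scores, best_k)
print("best K:", best_k, "selected candidates:", picks)
```

Plotting the inertia against K gives the elbow curve, while the silhouette column gives the complementary criterion. Keeping one representative per cluster prevents the shortlist from being filled with near-duplicate compositions.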

Procedure of ML-assisted RWGS catalysts discovery

The initial dataset of 45 data points was constructed from catalysts reported in our previous experimental study and some new catalysts synthesized for the present study; it is given in the data directory of our GitHub repository and labeled “Iteration” = 0 (https://github.com/shinya-mine). We suggested the next catalyst candidates using the explorative ML model based on ETR and the initial dataset (45 data points), picked some of the suggested catalysts according to the EI ranking, synthesized them using the sequential impregnation method, performed the RWGS reaction, and updated the dataset to close the loop (Supplementary Fig. 1). Subsequently, we suggested the next catalyst candidates using the explorative ML model based on ETR and the updated dataset (50 data points) and performed the experiments according to the ML prediction to further update the dataset. We continued this procedure for 44 loops, testing 300 catalysts in total. Since we typically performed the clustering with K = 100, as mentioned above, our ML pipeline gave a list of 100 top-ranking candidates at each iteration, and we chose the catalysts for the actual experiments from this list. Because it is practically difficult to test all 100 candidates, only some of the suggested catalysts were tested experimentally. The selection from the top 100 candidates was performed manually by considering the diversity of the catalyst compositions. ETR was used throughout this study. In the initial stage, only the explorative ML model was used, because we wanted to explore many elements and because its prediction accuracy was the highest among the three ML models at that stage; the exploitative ML model was also used after 30 iterations.
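The closed loop described above can be sketched end to end on a toy problem. Everything experimental is replaced by stand-ins: `run_experiment` is a hypothetical function in place of synthesis and the RWGS test, the candidate pool is random, and the five best EI scores are taken automatically, whereas the study selected manually from a clustered top-100 list. Taking the surrogate uncertainty as the spread of the individual trees' predictions is also an assumption; the text does not state how σ(x) was obtained.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import ExtraTreesRegressor


def run_experiment(x):
    """Hypothetical stand-in for catalyst synthesis plus the RWGS activity test."""
    return float(-np.sum((x - 0.7) ** 2))


def tree_mean_std(model, X):
    # Assumption: uncertainty is estimated from the spread across the trees.
    per_tree = np.stack([tree.predict(X) for tree in model.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)


def expected_improvement(mu, sigma, y_best):
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
pool = rng.random((2000, 3))            # hypothetical candidate feature vectors
tested = np.zeros(len(pool), dtype=bool)

start = rng.choice(len(pool), size=45, replace=False)  # initial 45 data points
tested[start] = True
X = pool[start]
y = np.array([run_experiment(x) for x in X])
initial_best = y.max()

for iteration in range(1, 6):
    model = ExtraTreesRegressor(n_estimators=100, random_state=iteration).fit(X, y)
    mu, sigma = tree_mean_std(model, pool)
    ei = expected_improvement(mu, sigma, y.max())
    ei[tested] = -np.inf                # never re-suggest a tested candidate
    chosen = np.argsort(ei)[::-1][:5]   # five new "experiments" per loop
    tested[chosen] = True
    X = np.vstack([X, pool[chosen]])
    y = np.concatenate([y, [run_experiment(x) for x in pool[chosen]]])
    print(f"iteration {iteration}: {len(y)} data points, best = {y.max():.4f}")
```

As in the text, the first loop grows the dataset from 45 to 50 points, and each later loop retrains the surrogate on everything measured so far before proposing the next batch.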

Data availability

The source data that support the results of this study can be found in the manuscript and the Supplementary Information. All experimental data used for machine learning are freely available in Excel format at https://github.com/shinya-mine⁵⁷.

Code availability

All machine learning code used in this study was written in Python 3 under the Anaconda distribution (https://www.anaconda.com) and can be found online at https://github.com/shinya-mine⁵⁷. The VASP code package used in this work is accessible once a user license has been granted by the VASP company (https://www.vasp.at).

References

1. Yarulina, I., Chowdhury, A. D., Meirer, F., Weckhuysen, B. M. & Gascon, J. Recent trends and fundamental insights in the methanol-to-hydrocarbons process. Nat. Catal. 1, 398–411 (2018).
2. Nielsen, D. U., Hu, X. M., Daasbjerg, K. & Skrydstrup, T. Chemically and electrochemically catalysed conversion of CO₂ to CO with follow-up utilization to value-added chemicals. Nat. Catal. 1, 244–254 (2018).
3. Wang, Y., Kalscheur, J., Su, Y. Q., Hensen, E. J. M. & Vlachos, D. G. Real-time dynamics and structures of supported subnanometer catalysts via multiscale simulations. Nat. Commun. 12, 5430 (2021).
4. Pablo-García, S. et al. Generalizing performance equations in heterogeneous catalysis from hybrid data and statistical learning. ACS Catal. 12, 1581–1594 (2022).
5. Ulissi, Z. W., Medford, A. J., Bligaard, T. & Nørskov, J. K. To address surface reaction network complexity using scaling relations machine learning and DFT calculations. Nat. Commun. 8, 14621 (2017).
6. Grajciar, L. et al. Towards operando computational modeling in heterogeneous catalysis. Chem. Soc. Rev. 47, 8307–8348 (2018).
7. McCullough, K., Williams, T., Mingle, K., Jamshidi, P. & Lauterbach, J. High-throughput experimentation meets artificial intelligence: A new pathway to catalyst discovery. Phys. Chem. Chem. Phys. 22, 11174–11196 (2020).
8. Resasco, J. et al. Enhancing the connection between computation and experiments in electrocatalysis. Nat. Catal. 5, 374–381 (2022).
9. Ras, E. J. & Rothenberg, G. Heterogeneous catalyst discovery using 21st century tools: a tutorial. RSC Adv. 4, 5963–5974 (2014).
10. Kitchin, J. R. Machine learning in catalysis. Nat. Catal. 1, 230–232 (2018).
11. Takahashi, K. et al. The rise of catalyst informatics: towards catalyst genomics. ChemCatChem 11, 1146–1152 (2019).
12. Chanussot, L. et al. Open Catalyst 2020 (OC20) Dataset and community challenges. ACS Catal. 11, 6059–6072 (2021).
13. Erdem Günay, M. & Yıldırım, R. Recent advances in knowledge discovery for heterogeneous catalysis using machine learning. Catal. Rev. 63, 120–164 (2021).
14. Fung, V., Hu, G., Ganesh, P. & Sumpter, B. G. Machine learned features from density of states for accurate adsorption energy prediction. Nat. Commun. 12, 88 (2021).
15. Schmack, R. et al. A meta-analysis of catalytic literature data reveals property-performance correlations for the OCM reaction. Nat. Commun. 10, 441 (2019).
16. Esterhuizen, J. A., Goldsmith, B. R. & Linic, S. Uncovering electronic and geometric descriptors of chemical activity for metal alloys and oxides using unsupervised machine learning. Chem. Catal. 1, 923–940 (2021).
17. Wang, S. H., Pillai, H. S., Wang, S., Achenie, L. E. K. & Xin, H. Infusing theory into deep learning for interpretable reactivity prediction. Nat. Commun. 12, 5288 (2021).
18. Wulf, C. et al. A unified research data infrastructure for catalysis research—challenges and concepts. ChemCatChem 13, 3223–3236 (2021).
19. Mazheika, A. et al. Artificial-intelligence-driven discovery of catalyst genes with application to CO₂ activation on semiconductor oxides. Nat. Commun. 13, 419 (2022).
20. Pedersen, J. K. et al. Bayesian optimization of high-entropy alloy compositions for electrocatalytic oxygen reduction. Angew. Chem. Int. Ed. 60, 24144–24152 (2021).
21. Keith, J. A. et al. Combining machine learning and computational chemistry for predictive insights into chemical systems. Chem. Rev. 121, 9816–9872 (2021).
22. Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559, 547–555 (2018).
23. Shields, B. J. et al. Bayesian reaction optimization as a tool for chemical synthesis. Nature 590, 89–96 (2021).
24. Rinehart, N. I., Zahrt, A. F., Henle, J. J. & Denmark, S. E. Dreams, false starts, dead ends, and redemption: a chronicle of the evolution of a chemoinformatic workflow for the optimization of enantioselective catalysts. Acc. Chem. Res. 54, 2041–2054 (2021).
25. Rosen, A. S. et al. Machine learning the quantum-chemical properties of metal–organic frameworks for accelerated materials discovery. Matter 4, 1578–1597 (2021).
26. Choudhary, K. et al. Recent advances and applications of deep learning methods in materials science. npj Comput. Mater. 8, 59 (2022).
27. Oviedo, F., Ferres, J. L., Buonassisi, T. & Butler, K. T. Interpretable and explainable machine learning for materials science and chemistry. Acc. Mater. Res. 3, 597–607 (2022).
28. Toyao, T. et al. Machine learning for catalysis informatics: recent applications and prospects. ACS Catal. 10, 2260–2297 (2020).
29. Murdock, R. J., Kauwe, S. K., Wang, A. Y. T. & Sparks, T. D. Is domain knowledge necessary for machine learning materials properties? Integr. Mater. Manuf. Innov. 9, 221–227 (2020).
30. Kauwe, S. K., Graser, J., Murdock, R. & Sparks, T. D. Can machine learning find extraordinary materials? Comput. Mater. Sci. 174, 109498 (2020).
31. Mine, S. et al. Analysis of updated literature data up to 2019 on the oxidative coupling of methane using an extrapolative machine-learning method to identify novel catalysts. ChemCatChem 13, 3636–3655 (2021).
32. Mine, S. et al. Machine learning analysis of literature data on the water gas shift reaction toward extrapolative prediction of novel catalysts. Chem. Lett. 51, 269–273 (2022).
33. Wang, A. Y. T., Kauwe, S. K., Murdock, R. J. & Sparks, T. D. Compositionally restricted attention-based network for materials property predictions. npj Comput. Mater. 7, 77 (2021).
34. Falkowski, A. R., Kauwe, S. K. & Sparks, T. D. Optimizing fractional compositions to achieve extraordinary properties. Integr. Mater. Manuf. Innov. 10, 689–695 (2021).
35. Porosoff, M. D., Yan, B. & Chen, J. G. Catalytic reduction of CO₂ by H₂ for synthesis of CO, methanol and hydrocarbons: Challenges and opportunities. Energy Environ. Sci. 9, 62–73 (2016).
36. Zhang, W., Ma, D., Pérez-Ramírez, J. & Chen, Z. Recent progress in materials exploration for thermocatalytic, photocatalytic, and integrated photothermocatalytic CO₂‐to‐fuel conversion. Adv. Energy Sustain. Res. 3, 2100169 (2022).
37. Mine, S. et al. Reverse water-gas shift reaction over Pt/MoO_x/TiO₂: reverse Mars-van Krevelen mechanism via redox of supported MoO_x. Catal. Sci. Technol. 11, 4172–4180 (2021).
38. Geurts, P., Ernst, D. & Wehenkel, L. Extremely randomized trees. Mach. Learn. 63, 3–42 (2006).
39. Henle, J. J. et al. Development of a computer-guided workflow for catalyst optimization. descriptor validation, subset selection, and training set analysis. J. Am. Chem. Soc. 142, 11578–11592 (2020).
40. Miyake, Y. & Saeki, A. Machine learning-assisted development of organic solar cell materials: issues, analyses, and outlooks. J. Phys. Chem. Lett. 12, 12391–12401 (2021).
41. Huo, H. et al. Machine-learning rationalization and prediction of solid-state synthesis conditions. Chem. Mater. 34, 7323–7336 (2022).
42. Suvarna, M., Preikschas, P. & Pérez-Ramírez, J. Identifying descriptors for promoted rhodium-based catalysts for higher alcohol synthesis via machine learning. ACS Catal. 12, 15373–15385 (2022).
43. Juneau, M. et al. Assessing the viability of K-Mo₂C for reverse water-gas shift scale-up: Molecular to laboratory to pilot scale. Energy Environ. Sci. 13, 2524–2539 (2020).
44. Belkin, M., Hsu, D. & Mitra, P. P. Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate. Adv. Neural Inf. Proc. Syst. https://doi.org/10.1145/3422818 (2018).
45. Belkin, M., Hsu, D., Ma, S. & Mandal, S. Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proc. Natl Acad. Sci. USA 116, 15849–15854 (2019).
46. Bartlett, P. L., Long, P. M., Lugosi, G. & Tsigler, A. Benign overfitting in linear regression. Proc. Natl Acad. Sci. USA 117, 30063–30070 (2020).
47. Esterhuizen, J. A., Goldsmith, B. R. & Linic, S. Interpretable machine learning for knowledge generation in heterogeneous catalysis. Nat. Catal. 5, 175–184 (2022).
48. Lundberg, S. M. & Lee, S. I. in Advances Neural Information Processing Systems 4765–4774 (ACM, 2017).
49. Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749–760 (2018).
50. Kang, J. H., Menard, L. D., Nuzzo, R. G. & Frenkel, A. I. Unusual non-bulk properties in nanoscale materials: Thermal metal-metal bond contraction of γ-alumina-supported Pt catalysts. J. Am. Chem. Soc. 128, 12068–12069 (2006).
51. Bobadilla, L. F., Santos, J. L., Ivanova, S., Odriozola, J. A. & Urakawa, A. Unravelling the Role of Oxygen Vacancies in the Mechanism of the Reverse Water-Gas Shift Reaction by Operando DRIFTS and Ultraviolet-Visible Spectroscopy. ACS Catal. 8, 7455–7467 (2018).
52. Mironenko, A. V. & Vlachos, D. G. Conjugation-driven ‘reverse Mars-van Krevelen’-type radical mechanism for low-temperature C-O bond activation. J. Am. Chem. Soc. 138, 8104–8113 (2016).
53. Safonova, O. V. et al. Identification of CO adsorption sites in supported Pt catalysts using high-energy-resolution fluorescence detection X-ray spectroscopy. J. Phys. Chem. B 110, 16162–16164 (2006).
54. Ravel, B. & Newville, M. ATHENA, ARTEMIS, HEPHAESTUS: Data analysis for X-ray absorption spectroscopy using IFEFFIT. J. Synchrotron Radiat. 12, 537–541 (2005).
55. Pedregosa, F. & Varoquaux, G. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
56. Jones, D. R., Schonlau, M. & Welch, W. J. Efficient global optimization of expensive black-box functions. J. Glob. Optim. 13, 455–492 (1998).
57. Wang, G. et al. Accelerated discovery of multi-elemental reverse water-gas shift catalysts using extrapolative machine learning approach. GitHub https://doi.org/10.5281/zenodo.8181610 (2023).

Acknowledgements

This study was financially supported by the KAKENHI grants (21K18185 (T.T. & I.T.) and 22K14538 (S.M.)) from the Japan Society for the Promotion of Science (JSPS), JST-CREST Program JPMJCR17J3, JST-FOREST Program JPMJFR211U, and the Joint Usage/Research Center for Catalysis. G.W. acknowledges the JSPS postdoctoral fellowship (P20345). A portion of the calculations was performed on supercomputers at RIIT (Kyushu University) and ACCMS (Kyoto University).

Author information

These authors contributed equally: Gang Wang, Shinya Mine, Duotian Chen.

Authors and Affiliations

Institute for Catalysis, Hokkaido University, N-21, W-10, Sapporo, 001-0021, Japan
Gang Wang, Shinya Mine, Duotian Chen, Yuan Jing, Kah Wei Ting, Taichi Yamaguchi, Motoshi Takao, Ken-ichi Shimizu & Takashi Toyao
School of Advanced Engineering, Kogakuin University, 2665-1, Nakano-cho, Hachioji, 192-0015, Japan
Zen Maeno
RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
Ichigaku Takigawa
Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, N-21, W-10, Sapporo, 001-0021, Japan
Ichigaku Takigawa
Institute for Liberal Arts and Sciences, Kyoto University, 69-302, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8315, Japan
Ichigaku Takigawa
Central Technical Research Laboratory, ENEOS Corporation, 8, Chidori-cho, Naka-ku, Yokohama, 231-0815, Japan
Koichi Matsushita

Contributions

G.W., D.C., S.M., and T.Y. synthesized the catalysts and performed the catalytic reactions. S.M., M.T., and I.T. composed the ML codes and conducted ML predictions and analysis. S.M., G.W., D.C., Y.J., and K.W.T. characterized the catalysts. S.M., G.W., Y.J., and Z.M. conducted the operando spectroscopic experiments. K.M. provided insights into the experimental work. I.T., K.S., and T.T. directed the project and provided guidance for the experimental and computational work. The manuscript was written by S.M., G.W., D.C., I.T., and T.T. with input from all authors.

Corresponding authors

Correspondence to Ichigaku Takigawa, Ken-ichi Shimizu or Takashi Toyao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wang, G., Mine, S., Chen, D. et al. Accelerated discovery of multi-elemental reverse water-gas shift catalysts using extrapolative machine learning approach. Nat Commun 14, 5861 (2023). https://doi.org/10.1038/s41467-023-41341-3

Received: 13 March 2023
Accepted: 28 August 2023
Published: 21 September 2023
DOI: https://doi.org/10.1038/s41467-023-41341-3
Springer Nature Limited
