Introduction

Hybrid organic-inorganic halide perovskite materials with prototypical formula ABX3 (where X is a halogen) enjoy exceptional optoelectronic properties and have found extensive applications in solar cells, photodetectors, light-emitting diodes, scintillators, photo-catalysts and photo-batteries1,2,3. The halide perovskite materials are solution-processable and the fabrication routes are cost-effective, in sharp contrast to traditional inorganic semiconductors such as silicon and GaAs. However, they are still far from practical industrial deployment because of several well-known issues, including lead contamination and material instability.

Unlike silicon or graphene, the halide perovskites are extremely unstable in the ambient conditions4,5,6. For example, the halide perovskite materials degrade quickly in the presence of atmospheric molecules such as oxygen and water vapors, debilitating the resulting device performance. A similar situation happens when the halide perovskite materials are in contact with acidic and basic chemical species. The interactions between perovskite and water tend to have adverse impacts on both the materials and devices, leading to diminished stability in operational conditions. Numerous studies have examined the stability of materials and devices under elevated temperatures and humidity levels7,8,9,10. Due to the inherent ionic nature, the interactions between perovskites and water lead to significant stability challenges, which represent a fundamental barrier to the practical utilization of perovskites. The deterioration of halide perovskites when exposed to water is associated with hydration, phase transition, degradation, and dissolution11,12. In certain instances, however, small amounts of water have proven beneficial, aiding in regulating the crystallization process to achieve superior-quality perovskite films and enhanced device performance13,14. The challenge is exacerbated in the environment of aqueous solution: if a cutting-edge CH3NH3PbI3 halide perovskite film is placed in contact with water, the corresponding optoelectronic conversion process deactivates instantaneously and the water-induced bleaching phenomenon occurs15.

Various materials engineering approaches have been devised to address the ‘aquaphobia’ issue of halide perovskite materials and improve their aqueous stability. Dimensional tailoring methods16,17,18,19,20 have been found to be practical, which is attributed to the intrinsically stable structure of the low-dimensional systems and can be further improved via hydrophobic surface groups. Inorganic substitution is another versatile method to improve the stability of halide perovskite materials, yet it suffers from inferior optoelectronic properties. In contrast, researchers have employed various compatible organic molecules21,22,23 and surface species to stabilize the hybrid perovskite systems, offering balanced optoelectronic properties and overall structural stability. Photoelectrochemical properties of halide perovskite materials refer to their behavior and performance in photoelectrochemical processes, where light is used to drive chemical reactions at the material surface. The photoelectrochemical properties of perovskite materials are related to their photovoltaic and light-emitting performance. Understanding and optimizing these photoelectrochemical properties are crucial for the development of halide perovskite materials for efficient and sustainable energy conversion and storage technologies, including solar cells, water-splitting systems, photocatalysts, and environmental sensors. Given the highly complex and multi-dimensional virtual design space of molecule- and surface-engineered halide perovskites, more comprehensive and systematic investigations should be carried out to obtain optimal molecule/perovskite composite systems delivering decent optoelectronic properties and aqueous stability in hostile conditions such as aqueous solution.

Machine learning approaches are versatile tools to accelerate the material design process in an inverse manner24,25. However, many issues remain with this research paradigm, including black-box models that offer insufficient scientific interpretation. Recently, scientific machine learning with emphasis on providing scientific interpretation has received significant attention in the physical science and materials communities26,27, given the accessibility of a series of machine learning algorithms and molecular descriptors that can potentially decouple the molecular and structural factors. To this end, alternative machine learning methods are still required to further improve the model accuracy and interpretability, including genetic algorithms28,29 that are able to construct mathematical equations to intuitively describe the target output from materials inputs.

Density functional theory (DFT) calculations20,30,31,32,33 are versatile methods to reveal salient structural inputs and optoelectronic properties of materials via considering the electron density in Schrödinger Equations. The detailed calculations include band structures, density of states, frontier orbitals etc. deduced from eigenvalues and wavefunctions with the assistance of density functionals in a Jacob’s ladder, facilitating the interpretation of experimental results of molecules and materials as well as the establishment of structure-property relationships in atomic and electronic levels.

In this study, we systematically investigate aqueous photoelectrochemical stability of CH3NH3PbI3 films modified by multiple molecules functioning at diverse surfaces. This results in the discovery of an effective multi-molecule perovskite material system ‘calcein + PbBr2 + DMSO + CH3NH3PbI3’ (PbBr2 as additive, DMSO as solvent and calcein as post-treatment molecule) affording outstanding aqueous photocurrent stability. A large aqueous photocurrent of 10⁻⁵ A/cm² (10³ times the output of CH3NH3PbI3) and an improved aqueous stability of 92.5% are achieved. The experimental aqueous photoelectrochemical properties of the molecularly modified perovskite materials are subsequently examined via genetic algorithms and extra-trees-based Shapley Additive Explanations (SHAP) analysis to provide machine interpretation and decouple molecular contributions, highlighting the importance of synergy effects of these compatible molecules and their hydrophilicity/lipophilicity for the target outputs. The post-hoc DFT calculation suggests the presence of hydrogen bonds and anion···π surface interactions for stabilizing the interfacial structures. The overall workflow to evaluate the halide perovskite stability via photoelectrochemical, machine learning, and DFT models is depicted in Fig. 1. The data distribution, experimental variables, and fabrication steps are displayed in Fig. 2.

Fig. 1: Overall workflow of this study.

Schematic of systematic multi-molecular design process to optimize aqueous optoelectronic properties of halide perovskite materials, combining photoelectrochemical experiments, feature analysis, machine learning (extra-tree and genetic algorithm) analysis, and DFT calculation.

Fig. 2: Fabrication details, classification method and molecular variables.

a Fabrication steps to obtain 96 molecule-modified perovskite films in photoelectrochemical experiments, including introduction of precursor additives, solvents and post-treatment dye sensitizers. b Data visualization of photocurrents from the photoelectrochemical experiments. The photocurrent data are classified into two groups (‘stable’ and ‘unstable’) to represent the aqueous optoelectronic stability of the thin film materials. The label ‘1’ represents the ‘stable’ hybrid materials if 0.9 < residue index < 1.1, while the others (residue index ≥ 1.1 or ≤ 0.9) correspond to ‘unstable’ systems and are labeled as ‘0’. A total of 384 (96 materials × 4 measurements) photocurrent data points are collected. c Experimental variables (solvent ratio, additive and post-treatment molecule) to fabricate the surface molecule-modified CH3NH3PbI3 nanocomposite system.

Results and Discussion

Aqueous photocurrents of molecules/CH3NH3PbI3

The systematic aqueous photoelectrochemical measurement highlights the interplay among solvents, additives, and post-treatment molecules for the resulting optoelectronic properties, which is in distinct contrast with the bare halide perovskite film that produces negligible photocurrent in the aqueous solution. For example, the solvent and additive molecules mainly modify the halide perovskite surface near the electron transporting layer (TiO2) and the perovskite/perovskite grain boundary regions, while the post-treatment molecules predominantly control the perovskite surface at the solid/liquid (aqueous solution) boundary. Experimental data are collected for 96 diverse perovskite materials with variable precursor solvent ratios, additives and post-processing dye molecules (Fig. 3a–t), in an effort to obtain an optimal molecular combination to stabilize the aqueous optoelectronic properties of perovskites. Four measurements at different water immersion times (70 s, 110 s, 150 s, and 190 s) are carried out to account for the time-dependent water degradation, since water molecules can swiftly infiltrate into a MAPbI3 film within seconds even at a low relative humidity of 10%12. The photocurrents of the molecule-modified halide perovskite film in aqueous solution are obtained with a light on/off interval of 20 s and a total measurement period of 200 s. The water immersion time is associated with the water stability of the halide perovskite film with the molecular modification. Interestingly, the calcein dye offers improved optoelectronic properties of CH3NH3PbI3 when it is employed as the post-treatment species, which is attributed to the optimal light absorption properties and minimized hydrophilic groups in the calcein dye molecule (vide infra). Additionally, the solvent ratio of 1:1 for the precursor solution is optimal for the target output; this is ascribed to the modest intermolecular interactions and surface compact layer formation when both DMF and DMSO are present. Furthermore, the contribution of the additive in the precursor solution is non-negligible, with NH4Cl leading to an inferior aqueous photocurrent (< 20 μA), attributed to the lack of metallic species that results in poorer Lewis acid-base interactions at the molecule/perovskite surface. The comprehensive aqueous photoelectrochemical study identifies a champion system based on the combination of “DMSO+PbBr2+calcein”, with an aqueous photocurrent reaching 34 μA and a retention rate of 92.50% within 200 s (Fig. 3o), signifying balanced optoelectronic property and aqueous stability. The superior aqueous performance of this hybrid system is attributed to intricate contributions such as appropriate intermolecular bonding, hydrophilicity, charge state, and molecular topology revealed in the machine learning models and DFT calculations (vide infra). Apart from “DMSO+PbBr2+calcein”, alternative high-ranking hybrid systems are available, all of which incorporate the calcein dye as the post-treatment species, namely “DMSO + CsBr + calcein”, “DMF + DMSO + calcein” and “DMSO + LiCl + calcein”. To sum up, the systematic aqueous photoelectrochemical measurement highlights the efficacy of multiple surface molecules to activate the aqueous photocurrents of the perovskite material, and calls for a global optimization strategy that comprehensively addresses diverse surfaces and interfaces in the perovskite photoelectrodes.

Fig. 3: Aqueous photocurrents of various molecule-modified perovskite films with diverse solvent ratios (DMF/DMSO), additives and post-treatment dyes, with four high-ranking systems highlighted (all based on calcein).

From left to right: additive = none, LiCl, NH4Cl, CsBr, PbBr2. From top to bottom: solvent ratio = 1:1, 1:4, 0:1, 1:0. Black: bare systems. Red: solvent red. Green: solvent blue. Blue: oil black. Yellow: calcein. a–e Molecule-modified perovskite systems with fixed solvent ratio (1:1) and variable additives and post-treatment dyes. f–j Molecule-modified perovskite systems with fixed solvent ratio (1:4) and variable additives and dyes. k–o Molecule-modified perovskite systems with fixed solvent ratio (0:1) and variable additives and dyes. p–t Molecule-modified perovskite systems with fixed solvent ratio (1:0) and variable additives and dyes.

Machine learning

An accurate machine learning model describing the photoelectrochemistry of the multi-molecule-modified perovskites is constructed using the extra-trees algorithm based on the molecular and experimental features surviving the recursive feature elimination (RFE) process (Fig. 4a). The extra-trees model achieves a large area-under-curve (AUC) value of 0.86 for the test dataset in the receiver operating characteristic (ROC) plot; the corresponding confusion matrix further demonstrates the accuracy of the stability classification model, with 100 samples correctly classified (82 true positive and 18 true negative samples). Furthermore, the train set reaches a high AUC value of 0.94, with the confusion matrix indicating that 253 samples are correctly classified (Fig. 4b, c). In conclusion, the extra-trees model presents a high accuracy in assessing the aqueous performance of the molecule-modified lead halide perovskites and establishes a solid foundation for the subsequent SHAP-based feature analysis.
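The two metrics quoted above can be illustrated with a minimal sketch. It computes the ROC AUC as the fraction of (stable, unstable) pairs that a classifier ranks correctly and tallies a confusion matrix at a fixed threshold. The function names, the 0.5 threshold and the six toy samples are hypothetical and are not taken from the dataset of this study.

```python
from typing import List, Tuple


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_counts(
    labels: List[int], scores: List[float], threshold: float = 0.5
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); scores >= threshold are predicted 'stable'."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Hypothetical toy data: 1 = 'stable', 0 = 'unstable'.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

auc = roc_auc(toy_labels, toy_scores)
tp, fp, tn, fn = confusion_counts(toy_labels, toy_scores)
accuracy = (tp + tn) / len(toy_labels)
```

The pairwise definition is quadratic in the number of samples, which is acceptable for a dataset of a few hundred entries; library implementations typically use a rank-based equivalent.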

Fig. 4: Feature correlations and machine learning model accuracy.

a Heat map of remaining features after RFE. b ROC curve of the test set based on the extra-tree model. c ROC curve of the train set using the extra-tree model.

Genetic model

In order to provide alternative machine interpretability and complement the black-box machine learning model, a genetic model clearly revealing the structure of the machine learning model via mathematical equations relating the descriptors and the photoelectrochemical stability is constructed. This gives rise to the following molecule-photocurrent stability relationship:

$${\bf{Stability}}=\,{{\boldsymbol{f}}}_{{\bf{1}}}+{{\boldsymbol{f}}}_{{\bf{2}}}-{{\boldsymbol{f}}}_{{\bf{3}}}$$
(1)
$${f}_{1}=\frac{-\sqrt{\sqrt{\log \left(\frac{{D}_{{\rm{R}}}}{\sqrt{{D}_{{\rm{R}}}-{A}_{{\rm{LA}}}}}-T-{A}_{{\rm{EVS}}}+{D}_{{\rm{R}}}\right)-T}-2T}}{{D}_{{\rm{R}}}+\frac{{D}_{{\rm{R}}}}{\sqrt{\tan T}}-T-{A}_{{\rm{EVS}}}}+\frac{{D}_{{\rm{R}}}-{A}_{{\rm{EVS}}}}{{D}_{{\rm{R}}}\sqrt{\log \left(2{D}_{{\rm{R}}}\right)-T}}$$
(2)
$${f}_{2}=\cos {S}_{{\rm{V}}}-{S}_{{\rm{V}}}$$
(3)
$${f}_{3}=\sqrt{\tan T}+\tan T$$
(4)

where the detailed explanations of the variables are provided in the Supplementary Table 2. The genetic model demonstrates decent accuracies (i.e., 83% for the test set and 86% for the train set) to describe the photoelectrochemical outputs. The following chemical and materials insights can be provided by the genetic model:

  • The mathematical equation describing the aqueous photocurrents of the hybrid systems consists of multiple terms (joint contribution from both molecular and experimental details) connected by eight mathematical operators (+, −, ×, ÷, √, tan, cos, and log).

  • f1 corresponds to the surface area in a molecule with a specific electrotopological state of the halide additive in the precursor solution (i.e., AEVS), and the feature AEVS has a negative impact on the residue index (the larger the value of AEVS, the poorer the photocurrent stability). We attribute this to the undesirable perovskite crystal formation process in the presence of additives with larger surface area (represented by a higher AEVS value) in the precursor solution, with a tendency to form less compact layers that are detrimental to the aqueous stability. In addition, the presence of ALA in the denominator with two subtraction operations suggests a negative correlation with the target stability, which is attributed to the larger surface area that leads to more significant interaction with the neighboring solvent molecules (ALA describes the molecular solubility by measuring the surface area of the compound molecule exposed to a solvent). To sum up, the first term (f1) describing the perovskite photoelectrochemical stability is a sophisticated interplay among the three types of surface molecules (solvent, precursor and post-treatment dye), especially the intricacies of the electrotopological state and the surface area exposed to solvents for the target property.

  • f2 is another critical factor revealed in the genetic model (the second term) focusing on the SV feature, with negative influence on the perovskite stability. The SV parameter in f2 is associated with the hydrophilicity of the post-treatment molecule, and a higher value implies a higher solubility of dye molecules in water, which may ultimately lead to weaker surface layer protection and thus poorer stability. This can be confirmed by the optimal combination “DMSO+PbBr2+calcein”, which has lower values of AEVS and SV. In addition, f2 is negatively correlated with the stability output because of the negative first derivative (slope) of the function cos SV − SV (0 + 2kπ < SV < π + 2kπ where k = 5). As a result, the aqueous stability of the molecularly modified perovskite material can be decoupled into f2, which highlights the negative correlation between the hydrophilicity of the post-treatment molecule and the material stability, in agreement with chemical intuition.

  • f3 is associated with the experimental parameter T and is negatively correlated with the stability output as demonstrated in the last term (\(-\sqrt{\tan T}-\tan T\)); this agrees with experimental instinct where T corresponds to the water immersion time and the prolonged water immersion leads to more dramatic degradation of the photoelectrode materials. As a result, the third mathematical term in the genetic model represents the experimental observation that longer water immersion severely degrades the halide perovskite materials.
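As a reading aid, Eqs. (1)–(4) can be transcribed into a short function. This is a sketch under stated assumptions: `log` is taken as the natural logarithm and T is passed to the trigonometric functions in radians, neither of which is specified above, and the input values in the example are hypothetical numbers chosen only so that every square root and logarithm is defined. The actual feature scaling is given in Supplementary Table 2 and is not reproduced here.

```python
import math


def f1(d_r: float, a_la: float, a_evs: float, t: float) -> float:
    """First term, Eq. (2). Assumes natural log and T in radians."""
    inner_log = math.log(d_r / math.sqrt(d_r - a_la) - t - a_evs + d_r)
    numerator = -math.sqrt(math.sqrt(inner_log - t) - 2.0 * t)
    denominator = d_r + d_r / math.sqrt(math.tan(t)) - t - a_evs
    second = (d_r - a_evs) / (d_r * math.sqrt(math.log(2.0 * d_r) - t))
    return numerator / denominator + second


def f2(s_v: float) -> float:
    """Second term, Eq. (3): hydrophilicity of the post-treatment molecule."""
    return math.cos(s_v) - s_v


def f3(t: float) -> float:
    """Third term, Eq. (4): water immersion time."""
    return math.sqrt(math.tan(t)) + math.tan(t)


def stability(d_r: float, a_la: float, a_evs: float, s_v: float, t: float) -> float:
    """Eq. (1): Stability = f1 + f2 - f3."""
    return f1(d_r, a_la, a_evs, t) + f2(s_v) - f3(t)


# Hypothetical inputs, not measured descriptor values.
low_sv = stability(d_r=5.0, a_la=1.0, a_evs=1.0, s_v=0.5, t=0.1)
high_sv = stability(d_r=5.0, a_la=1.0, a_evs=1.0, s_v=1.0, t=0.1)
```

The derivative of f2 is −sin SV − 1, which is never positive, so a larger SV can only lower the predicted stability; likewise f3 grows with T wherever tan T is positive and increasing, which matches the trends discussed in the bullet points above.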

Feature analysis

The SHAP method is utilized to understand the molecular contribution and structural importance based on the extra-trees machine learning model (Supplementary Fig. 3). The top three most important features using the SHAP method are APV, DR, and ASV (Fig. 5a), and the complete feature ranking for aqueous photocurrent is: APV > DR > ASV > AM > AK3 > AEVS > TPSA > AHA > ALA > VE8 > SV > AK1 > FDM1 > FDM2 > MAPC > MEI > T > MPC. It is interesting to observe that most molecular descriptors exhibit much higher ranking than the important experimental factor T (T is negatively correlated with the target output), indicating the strong influence of these structural, topological and property descriptors of the multi-molecules in diverse surfaces and interfaces on aqueous photocurrent. The water immersion time T survives the RFE feature selection step. Although it is only of minor importance compared with the other surviving features, it is much more important than most of the features considered before the RFE step. As a result, T should not be eliminated in the post-hoc machine learning analysis. In the summary plot of SHAP, the lower values (blue points) of the features APV, ASV, AEVS, TPSA, AHA, ALA, VE8, SV, and AK1 distribute on the positive side of the SHAP values (the color of the dots represents the magnitude of the feature values, with a darker red color representing higher values), specifying that judicious selection of surface molecules with lower values of these features benefits the water stability of the lead halide perovskite film. The highest-ranking feature APV is related to the atomic charge distribution and functional group of additives, signifying the strength of the connection of the molecules to an actual charged atom; as a result, the additive molecules with higher APV values are associated with more charged regions that benefit the connections to the charged species in the precursor solution and thus lead to improved quality of the perovskite film. ASV is another high-ranking molecular feature, which is associated with the hydrophilicity of molecular additives; as a result, lower values of ASV are desirable to mitigate the interaction with the air-water interface and subsequent material degradation. In order to design aqueous-stable halide perovskite materials, it is recommended to choose halide additives with lower ASV and APV values, as suggested by the SHAP analysis. On the other hand, the larger values (red dots) of the features DR and AM distribute on the positive side of the SHAP values, indicating that larger values of these features are preferred for designing water-stable perovskites. DR corresponds to the ratio between DMF and DMSO, suggesting that the concentrations of both solvents should be considered in the materials design process. AM refers to the degree of lipophilicity of the compound, and a larger value is favored to alleviate water solubility. This agrees with the experimental intuition to improve optoelectronic stability via minimizing water contact and infiltration. The SHAP feature analysis highlights the hydrophilicity and lipophilicity of the organic molecules for the perovskite aqueous optoelectronic stability. Hydrophilicity refers to the tendency of a substance to interact favorably with water, while lipophilicity refers to the tendency to interact favorably with lipids or non-polar solvents. The van der Waals surface area has been demonstrated to be strongly related to lipophilicity, and the negative SlogP_VSA2 is related to high hydrophobicity to alter the perovskite dimensionality34,35,36. Apart from that, the higher-ranking features in the importance histogram according to SHAP (Fig. 5b) are not contributed solely by any particular molecule at a single surface/interface; rather, the aqueous photocurrent generation of the perovskite film is contributed by solvents, precursor additives, and post-treatment molecules in a synergistic manner, with the hydrophilicity and lipophilicity of the organic surface modifiers playing a critical role.
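The quantity that SHAP estimates for each feature is its Shapley value. For a model with only a few inputs this value can be computed exactly by averaging the marginal contribution of each feature over all orderings, as sketched below. The toy model, its coefficients and the baseline are hypothetical; only the feature names AM, SV and T are borrowed from the ranking above, and the extra-trees model of this study is not reproduced.

```python
from itertools import permutations
from typing import Callable, Dict


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values of sample x relative to a baseline sample.

    A feature that is 'absent' from a coalition takes its baseline value.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]  # switch this feature on
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(f: Dict[str, float]) -> float:
    """Hypothetical score: lipophilicity helps, hydrophilicity and time hurt."""
    return 0.3 * f["AM"] - 0.2 * f["SV"] - 0.1 * f["T"] * f["SV"]


sample = {"AM": 1.0, "SV": 1.0, "T": 1.0}
reference = {"AM": 0.0, "SV": 0.0, "T": 0.0}
phi = shapley_values(toy_model, sample, reference)
```

The contributions add up to the difference between the model output for the sample and for the baseline, and the interaction term T·SV is shared equally between T and SV. This additivity is what allows a SHAP summary plot to attribute a prediction to individual molecular and experimental features.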

Fig. 5: Feature analysis via SHAP.

a Summary plot of the feature based on the extra trees model using SHAP. b Feature importance ranking histogram based on the extra-trees model using SHAP.

Post-hoc DFT calculation

DFT is employed to uncover the detailed intermolecular interactions of the champion system (DMSO+calcein+PbBr2) on the MAPbI3 surface (Fig. 6a, b), presuming an adsorption model with molecular adsorbates. In addition, a H2O molecule is introduced to the molecule-modified perovskite supercell surface system (Fig. 6c, d) to better understand the aqueous influence on the halide perovskite surface structure and optoelectronic properties. The calculation identifies the co-existence of anion···π, halogen bond and hydrogen bond molecular interactions to stabilize the overall molecule-perovskite interfacial structure (Fig. 6e). The calcein molecule displays a hydrogen bond of 4.65 Å with the perovskite surface lead species, and an anion···π interaction exists between the DMSO molecule and the perovskite surface iodine species at a distance of 4.27 Å; additionally, a hydrogen bond of 3.90 Å forms between the calcein adsorbate (hydroxyl hydrogen) and the perovskite surface (iodine). Moreover, DMSO forms a hydrogen bond of 3.27 Å with the perovskite surface; PbBr2 demonstrates a halogen bond of 3.64 Å with the perovskite surface iodine, and a hydrogen bond of 4.42 Å with the calcein molecule. It is noteworthy that the H2O molecule does not disrupt the structural integrity of the interfacial perovskite surface system; this is attributed to the presence of complex intermolecular bonding among the organic molecules in addition to the adsorbate···adsorbent interactions, which form a compact self-assembled multilayer37 to prevent direct damage to the perovskite system by water molecules.

Fig. 6: Post-hoc DFT calculation (atomic and electronic structures).

a, b Unit cell structure of MAPbI3 for supercell surface construction from different viewpoints. c, d Top and side views of the proposed atomic structure of the molecule-perovskite system with multiple surface molecular adsorbates (DMSO, calcein, PbBr2 and H2O) optimized by DFT. A 20 Å vacuum layer is inserted to avoid unnecessary interlayer interactions. e Featured molecular interactions among DMSO, calcein, PbBr2 and H2O, stabilized via anion···π, halogen bond and hydrogen bond interactions. f PDOS spectra of the molecule-modified perovskite system. g Simulated UV-vis absorption spectra of the molecule-modified perovskite system and the bare system. h, i Potential profile and work function of the molecule-modified perovskite system and the bare perovskite system.

The projected density of states (PDOS) spectra are determined to reveal the joint contributions of post-treatment calcein molecule, additive PbBr2, and precursor solvent DMSO in the electronic properties of the perovskite system (Fig. 6f). The adsorbed molecules contribute strongly to both conduction band and valence band of the perovskite system via their p orbitals. In particular, the PbBr2 adsorbate contributes more significantly to the conduction band edges while the calcein molecule contributes extensively to both conduction and valence band edges in the PDOS spectra, signifying possible chemical bond formation and intimate physical contacts via effective hybridization between the p orbitals of the post-treatment chromophore and neighboring materials.

The simulated UV-vis absorption spectra (Fig. 6g) demonstrate the improved light-harvesting performance of the molecule-modified perovskite system (Supplementary Fig. 4) compared with the bare system in terms of visible light (400–800 nm) conversion into electrons, which aligns with the experimental findings38,39. This suggests the possibility of simultaneous improvement in the light absorption properties and insulation against water damage when these adsorbate molecules are present on the halide perovskite substrate. Furthermore, the molecule-modified hybrid system has a smaller work function (5.44 eV) than the bare system (5.98 eV), indicating that the champion system can emit photoelectrons at a smaller excitation energy in addition to decent conductivity (Fig. 6h, i). Moreover, the incorporation of these adsorbed molecules results in a reduction of the band gap in the perovskite system: the band gap of the molecule-modified system is reduced to 1.439 eV (cf. the band gap of 1.795 eV for the bare system) (Table 1), favouring the transition of valence electrons from occupied energy levels to unoccupied ones. Notably, the incorporation of the molecules does not introduce unnecessary in-gap states corresponding to detrimental deep-level defects, in sharp contrast to the undesirable molecular adsorbate systems in the literature40,41. The potential plot and density of states spectra of the perovskite systems without water adsorption are also obtained via the DFT calculation, suggesting larger work function of both bare and molecule-modified perovskite systems in the presence of water. In addition, the water adsorption leads to a larger band gap, which is detrimental to the optical properties (Supplementary Fig. 5). To sum up, the presence of calcein, DMSO and PbBr2 on the CH3NH3PbI3 surface leads to a stable interfacial structure via multiple intermolecular bonds with improved optoelectronic properties and absence of deep-level defects.
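To relate the two band gaps quoted above to the 400–800 nm window, the textbook photon-energy relation λ = hc/E can be used to estimate the wavelength at which absorption sets in. This is a simple conversion added for orientation, not part of the DFT workflow of this study; the function name is hypothetical and excitonic or sub-gap effects are ignored.

```python
# Product of the Planck constant and the speed of light, in eV·nm.
HC_EV_NM = 1239.841984


def absorption_edge_nm(band_gap_ev: float) -> float:
    """Wavelength of a photon whose energy equals the band gap."""
    if band_gap_ev <= 0:
        raise ValueError("band gap must be positive")
    return HC_EV_NM / band_gap_ev


# Band gaps reported in the text for the two systems.
edge_modified = absorption_edge_nm(1.439)  # molecule-modified system
edge_bare = absorption_edge_nm(1.795)      # bare system
```

Under this estimate the narrower gap of the molecule-modified system moves the absorption onset from roughly 690 nm to roughly 860 nm, so the whole visible window lies above the gap.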

Table 1 Bandgap, SLME efficiency and S-Q efficiency of molecule-modified system and bare system

Shockley-Queisser (S-Q) and spectroscopic limited maximum efficiency (SLME) values are calculated to evaluate the photovoltaic potential of the molecule-modified perovskite system. The theoretical S-Q efficiency of the bare perovskite system is only 27.2% (Fig. 7); when the incident light enters the perovskite system, 53.6% of the energy is not absorbed, 10.9% is lost due to thermalization, 8.2% is lost due to extraction and only 27.9% of the incident light energy remains available for conversion. Additionally, the bare system only has an SLME efficiency of 26.84%, which is slightly smaller than the S-Q counterpart because the latter method neglects the absorption process and only considers the band gap contribution. In contrast, the theoretical S-Q efficiency of the molecule-modified system is 32.9%, which is about 1.2 times that of the bare system; the corresponding deconvolution demonstrates that 35.1% of the incoming light energy is not absorbed, 19.9% is lost due to thermalization, 12.1% is lost due to extraction and 32.9% is available for subsequent energy conversion. Meanwhile, the SLME efficiency of the molecule-modified system is 32.27%, an improvement of about 20% over that of the bare system. Besides, the presence of a water molecule leads to inferior SLME and S-Q efficiencies, which agrees with chemical intuition (Supplementary Table 2 and Supplementary Fig. 6). To sum up, both S-Q and SLME efficiency estimation methods confirm the superiority of the molecule-modified perovskite systems to offer enhanced solar energy conversion performance.
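The ratios quoted in this paragraph follow directly from the reported percentages, as the short arithmetic check below illustrates. All numbers are copied from the text; the variable names are hypothetical.

```python
# Energy budget of the molecule-modified system, in percent of incident light.
modified_budget = {
    "not_absorbed": 35.1,
    "thermalization": 19.9,
    "extraction": 12.1,
    "converted": 32.9,
}
budget_total = sum(modified_budget.values())

# Efficiencies reported for the bare and molecule-modified systems, in percent.
sq_bare, sq_modified = 27.2, 32.9
slme_bare, slme_modified = 26.84, 32.27

sq_ratio = sq_modified / sq_bare
slme_gain_percent = (slme_modified / slme_bare - 1.0) * 100.0
```

The four channels of the molecule-modified system account for the full incident energy, the S-Q efficiency ratio is close to 1.2, and the relative SLME gain is close to 20%.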

Fig. 7: S-Q and SLME efficiencies of the bare and molecule-modified system containing ’calcein+PbBr2+DMSO’.

a Theoretical S-Q efficiency (27.2%) of the bare perovskite surface system. b Classification of the incoming solar energy loss in the bare perovskite system into three categories: energies that are not absorbed, thermalization loss, and extraction loss. c Relationship between theoretical SLME efficiency and film thickness of the bare perovskite system. d Theoretical S-Q efficiency (32.9%) of the champion molecule-modified perovskite system. e Classification of the incoming solar energy loss in the molecule-modified perovskite system into unabsorbed loss, thermalization loss and extraction loss. f Relationship between theoretical SLME efficiency and film thickness of the champion molecule-modified perovskite system.

A comprehensive photoelectrochemical investigation of 96 different molecule-modified CH3NH3PbI3 halide perovskite materials is performed, which helps evaluate the molecular influence (solvent molecule ratios, halide additives and post-treatment molecules) on the halide perovskite aqueous optoelectronic stability. A champion system based on ‘calcein + PbBr2 + DMSO’ is identified; it delivers a large aqueous photocurrent (3 × 10⁻⁵ A/cm²) and an improved aqueous stability (retention index) of 92.5% after 200 s water immersion. An accurate extra-trees machine learning model is constructed, with SHAP feature analysis highlighting the hydrophilicity and lipophilicity of the organic molecules for the perovskite aqueous optoelectronic stability. An accurate genetic model is provided to offer alternative machine interpretability and address the ‘black-box’ issue. A more approachable machine learning model with the mathematical expression Stability = f1 + f2 − f3 is designed to describe the perovskite stability. The resulting expression decouples the molecular contributions into hydrophilicity, electrotopology and surface areas of the three types of molecules. The post-hoc DFT calculation suggests the possibility of multiple surface intermolecular bonds, such as hydrogen bonds and anion···π surface interactions in the self-assembled layer, to stabilize the interfacial structures in the champion system. The calculation suggests the absence of deep-level defects, improved light harvesting properties and higher S-Q and SLME efficiencies in the ‘calcein + PbBr2 + DMSO’ system. The present study confirms the efficacy of the proposed ‘global optimization’ strategy to address the aqueous instability issue of halide perovskite materials, and calls for more multi-mode modeling studies to comprehensively evaluate the molecule-modified materials.

Methods

Photoelectrochemical experiments

The TiO2 film (18NR-T, Greatcell Solar) is transferred onto TiO2 via doctor-blade method, which is sintered at 450 °C for two hours. Three types of molecules are considered in the surface molecular design process, including solvents, additives and post-treatment species. A precursor solution containing CH3NH3I (MAI) and PbI2 is prepared based on DMSO and DMF with different ratio, and halide additives such as NH4Cl, LiCl, PbBr2 and CsBr with a concentration of 0.02 mol/L are introduced into the solution. Subsequently, the halide perovskite precursor solution is introduced on TiO2 film and heated at 90 °C, and a post-treatment solution consisting of 0.002 mol/L organic molecules (represented by solvent red 24, solvent blue 97, calcein and solvent black 5 chromophores) (Fig. 2a) is drop-cast onto the surface after the halide perovskite film darkens. Finally, the halide perovskite film is continuously heated at 90 °C and the resulting halide film is cooled down slowly to the room temperature. This leads to the presence of molecular residues that functionalize diverse surfaces. The molecule-modified halide perovskite thin film material is subsequently illuminated by a light source (Ailike) with a 2000 lm intensity, which is positioned 5 cm away from the photoelectrode. The photocurrents of the molecule-modified halide perovskite film in aqueous solution are obtained with a light on/off interval of 20 s and a total of 200 s measurement period. A three-electrode potentiostat (CHI660E, Chenhua) is employed to obtain aqueous photocurrents to reflect their aqueous optoelectronic properties, consisting of the molecule-engineered perovskite film as the working electrode, Ag/AgCl as the reference electrode, Pt as the counter electrode, and a 0.1 M Na2SO4 aqueous solution as the electrolyte. 
Specifically, the experimental variables consist of solvent concentration ratios (DMF:DMSO = 1:1, 1:4, 0:4 and 4:0), halide additives (LiCl, NH4Cl, CsBr, PbBr2 and none) and post-treatment dye molecules (solvent red 24, solvent blue 97, calcein, solvent black 5 and none) (Fig. 2). The detailed fabrication steps follow those described in the literature42.

Genetic and machine learning models

Output label

Binary classification is carried out to label each material as ‘stable’ or ‘unstable’ in water. In total, 384 experimental data entries are obtained from 96 different molecularly modified materials, each measured at 4 different water immersion times. The residue index describing the aqueous stability is calculated according to the following equation:

$${\rm{Residue\; index}}=\frac{{J}_{{\rm{initial}}}}{{J}_{{\rm{end}}}}\times 100 \%$$
(5)

where Jinitial is the starting aqueous photocurrent and Jend is the final aqueous photocurrent. A material is labeled ‘stable’ (label ‘1’) if 0.9 < residue index < 1.1, and ‘unstable’ (label ‘0’) otherwise, i.e. if the residue index is ≥ 1.1 or ≤ 0.9 (Fig. 2b). As a result, the photocurrent data are classified into two groups (‘stable’ and ‘unstable’) that represent the aqueous optoelectronic stability of the thin film materials. The threshold values (0.9 and 1.1) are chosen because: (1) sizable gap regions exist near 0.9 and 1.1, which help separate the two classes; (2) if the residue index is either too large or too small, the material is more likely to undergo undesirable structural disintegration and dissolution in water43.
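The labeling rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, film names and photocurrent values are hypothetical, and the index is kept as a fraction (without the × 100%) so that it can be compared directly with the 0.9 and 1.1 thresholds.

```python
def residue_index(j_initial, j_end):
    """Ratio J_initial / J_end as written in Eq. (5), kept as a fraction."""
    # A zero final photocurrent would raise ZeroDivisionError; not handled here.
    return j_initial / j_end


def stability_label(index, lower=0.9, upper=1.1):
    """Return 1 ('stable') if lower < index < upper, else 0 ('unstable')."""
    return 1 if lower < index < upper else 0


# Hypothetical (J_initial, J_end) pairs in A/cm2, for illustration only.
measurements = {
    "film_A": (3.0e-5, 2.9e-5),  # nearly unchanged photocurrent
    "film_B": (3.0e-5, 1.5e-5),  # photocurrent halves
    "film_C": (1.0e-5, 2.0e-5),  # photocurrent doubles
}

labels = {
    name: stability_label(residue_index(j_initial, j_end))
    for name, (j_initial, j_end) in measurements.items()
}
print(labels)
```

Because both inequalities are strict, an index of exactly 0.9 or 1.1 falls into the ‘unstable’ class, in line with the definition above.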

Feature selection

Seventy raw descriptors from RDKit (Supplementary Table 1), related to the connectivity, constitutional and topological details of the organic molecules, are selected as the initial features. These raw descriptors are mutually dependent and repetitive (Supplementary Fig. 1); to avoid redundancy and overfitting, recursive feature elimination (RFE) is employed to screen for proper descriptors. RFE recursively selects the most important subset of features from a given feature set, gradually eliminating the low-ranking features until either the desired number of features (Supplementary Table 2 and Supplementary Fig. 2) or a threshold of feature importance is reached.
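The elimination loop can be illustrated with a small, dependency-free Python sketch. Several things here are assumptions rather than details from this work: the estimator (a plain logistic regression fitted by gradient descent, ranked by absolute weight) is swapped in because the estimator used for RFE is not stated in this section, and the descriptor names and data are toy values.

```python
import math


def fit_logistic(X, y, lr=0.5, steps=300):
    """Fit a logistic model by batch gradient descent; return the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def rfe(X, y, names, n_keep):
    """Refit on the surviving features, drop the one with the smallest |weight|, repeat."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w = fit_logistic(X_sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [names[j] for j in active]


# Toy data (hypothetical): desc_a separates the classes, desc_b agrees with the
# label for 6 of 8 samples, desc_c carries no class information.
names = ["desc_a", "desc_b", "desc_c"]
X = [
    [1, 1, 1], [1, 1, -1], [1, 1, 0], [1, -1, 0],
    [-1, -1, 1], [-1, -1, -1], [-1, -1, 0], [-1, 1, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

print(rfe(X, y, names, n_keep=2))
print(rfe(X, y, names, n_keep=1))
```

The model is refitted after every removal, which is what distinguishes RFE from a one-shot ranking: the importance of a descriptor can change once a correlated descriptor has been dropped.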

Machine learning

Two machine learning methods are employed to decouple the molecular contributions to the aqueous stability of the materials. (1) The SHAP method, applied to an Extremely Randomized Trees (extra-trees) classifier, is used to help interpret the photoelectrochemical data. The extra-trees classifier is an ensemble learning technique that aggregates the results of multiple decorrelated decision trees in a forest to output classification results; SHAP is based on cooperative game theory and calculates the contribution of each feature to the prediction, offering scientific insights from the present extra-trees model. (2) A genetic algorithm, inspired by the principles of biological evolution, is introduced to unveil the interconnections among features and investigate their impact on the aqueous stability of the halide perovskite material. Symbolic classification based on genetic programming simulates genetic evolutionary processes to search for optimal mathematical expressions built from terminals (molecular features) and operators (+, −, ×, ÷, √, sin, cos, exp, and log). The hyperparameters of the extra-trees and genetic models are provided in the supporting information (Supplementary Table 3). The datasets, machine learning models and codes are openly accessible at https://github.com/huangyiru123/global_optimization_molecules.
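The game-theoretic idea behind SHAP can be made concrete with a brute-force Shapley calculation. The sketch below is an illustration under stated assumptions, not the procedure used in this work: the SHAP library relies on faster tree-specific algorithms, whereas this code enumerates every feature coalition; the toy model borrows only the functional form f1 + f2f3 of the symbolic expression reported in this work; and the feature values and the zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def model(f1, f2, f3):
    """Toy score with the same functional form as the symbolic expression."""
    return f1 + f2 * f3


def shapley_values(func, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features in the coalition take their actual value, the rest the baseline.
        args = [x[i] if i in subset else baseline[i] for i in range(n)]
        return func(*args)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


x = (1.0, 2.0, 3.0)         # hypothetical feature values
baseline = (0.0, 0.0, 0.0)  # hypothetical reference input
phi = shapley_values(model, x, baseline)
print(phi)
```

The additive term f1 receives its full contribution, while the product f2f3 is split equally between its two factors. The attributions sum to the difference between the model output at x and at the baseline, which is the property that lets SHAP decompose a single prediction feature by feature.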

Density functional theory

Density functional theory (DFT) calculations are carried out in CASTEP44. The Perdew-Burke-Ernzerhof (PBE) functional is used with a cutoff energy of 430 eV. A CH3NH3PbI3 (MAPbI3) surface is cleaved from an orthorhombic crystal along the representative (001) direction and terminated with the non-polar PbI2 surface layer. A 2 × 2 × 1 supercell is constructed to accommodate the dye molecule, halide additive, solvent molecule, and H2O adsorbates. To reduce unnecessary interactions between slabs, a 20 Å vacuum layer is added. The convergence criteria of energy, force, and displacement are set to 1 × 10−5 eV, 0.03 eV/Å, and 0.001 Å for the geometry optimization. Additionally, to avoid the erroneous estimation of band gaps by PBE, the HSE06 functional is used to calculate the band structure of the halide perovskite surface system with a k-point set of 2 × 2 × 1. The Tkachenko-Scheffler (TS) scheme45 is included to account for van der Waals interactions.
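For readers who run CASTEP from text input files, the geometry-optimization settings above could be collected roughly as follows. This is a hedged sketch, not the authors' input: the keyword names follow the CASTEP .param conventions as we understand them and should be checked against the CASTEP documentation for the version in use, the energy tolerance is read as 1 × 10−5 eV, and the helper name `render_param` is hypothetical.

```python
# Assumed mapping of the stated settings onto CASTEP .param keywords.
settings = {
    "task": "GeometryOptimization",
    "xc_functional": "PBE",
    "cut_off_energy": "430 eV",
    "sedc_apply": "true",            # switch on the dispersion correction
    "sedc_scheme": "TS",             # Tkachenko-Scheffler scheme
    "geom_energy_tol": "1e-5 eV",    # assumption: read as 1 x 10^-5 eV
    "geom_force_tol": "0.03 eV/ang",
    "geom_disp_tol": "0.001 ang",
}


def render_param(params):
    """Render a dict as 'keyword : value' lines, one per setting."""
    return "\n".join(f"{key} : {value}" for key, value in params.items())


param_text = render_param(settings)
print(param_text)
```

The slab geometry, the 20 Å vacuum layer and the 2 × 2 × 1 k-point set belong to the structure definition rather than to these run parameters, and the HSE06 band-structure step would need its own settings.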