Introduction

Multi-principal element alloys (MPEAs) are made by combining multiple elements, where every element contributes a significant atom fraction to the alloy1. High entropy alloys (HEAs) represent a materials class within the broader family of MPEAs with outstanding mechanical, thermal, and electrochemical properties2,3,4,5,6,7,8,9,10,11,12. HEAs are unique among MPEAs because they contain multiple (at least five) principal alloying elements in nearly equi-atomic concentrations and yet have a global crystal structure with well-defined Bragg reflections indicative of long-range order. HEAs are typically solid solutions of face-centered cubic (FCC), body-centered cubic (BCC), or hexagonal close-packed (HCP) phases. Recently, the community has started to explore high-entropy versions of intermetallic and ceramic compounds13,14. To date, numerous elements in the periodic table have been explored to tune the properties of HEAs. However, not all compositions have resulted in the desired microstructure for application in extreme environments. In general, the physical and mechanical properties of HEAs vary depending on which phases form and their relative fractions in the microstructure15,16,17. In some applications, mixed phases are preferred18, whereas in others a single phase is desired19. These observations have led many groups to develop effective and efficient phase prediction models for enabling discoveries of previously unexplored HEAs for targeted applications.

Traditional high-throughput approaches based on first-principles calculations are poorly suited to searching for MPEAs because they require large supercells and must cover a complex crystal structure space involving multiple prototypes. Although computational thermodynamics-based methods have played an important role20,21, their limitations are also documented in the published literature22. More recently, various groups have demonstrated the potential of data-driven machine learning (ML) methods to guide the design of MPEAs and HEAs towards promising regions in the search space22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37.

One of the most explored ML applications in MPEA research is the phase classification problem, where the objective is to train ML models for predicting whether a given chemical composition will form: (1) single-phase FCC, BCC, or HCP solid solutions, (2) FCC+BCC dual phase with varying phase fractions, (3) single-phase intermetallics, (4) mixed phases (FCC+intermetallics, BCC+intermetallics, FCC+BCC+intermetallics, two different intermetallics etc.), or (5) amorphous phase. ML models with fairly high accuracy (over 75%) have been trained using small and large data set sizes and different choices of outputs. Various elemental and thermodynamic properties have been considered as input features for the phase classification problem23,26,29,30,31,32,33. A number of published studies also report descriptor importance based on cross-entropy, Gini index, or permutation methods to gain some insight into the descriptor contribution to the overall predictive power of the model. There are also shortcomings in the current approaches. As an example, none of the published papers explain the predictions of the black-box models at the granularity of each observation. A principled approach for gleaning insights into the formation of each phase in the training set is lacking. Moreover, it is not straightforward to compare the predictive performance of the published ML studies across data sets generated by different research groups, because the trained ML models are not published along with the research papers.

In this work, we advance the application of ML methods in the MPEA phase classification problem in two significant ways. First, we apply two complementary instance-level (or local) post hoc model interpretability approaches, namely breakdown (BD) plots and Ceteris Paribus (CP) profiles, to glean insights into each observation. The BD method is based on the variable attribution principle, which decomposes the prediction of each individual observation into particular variable contributions38. In contrast, the more traditional global variable importance method provides a high-level or generic understanding of the inner workings of a black-box model and captures the relative importance of a given variable in impacting the overall model performance on the entire data set (that includes all phases). The CP profile method, on the other hand, evaluates the prediction response of a trained ML model to changes in a particular variable under the assumption that the values of all other variables do not change. We then develop an algorithm that combines the variable attribution data from the BD method with k-means clustering method to infer insights about similar instances. These results provide insight into explaining the relative variable contributions in the prediction of each phase or class label as inferred by the ML models. In addition, the CP profile plot captures the average partial relationship between the predicted response and the input variables. In this paper, we demonstrate the power of local model interpretability methods as a key post hoc model analysis tool for materials informatics research. We apply them to explain the predictions from an ensemble of support vector machine (eSVM) models trained on a high-dimensional, multi-class MPEA phase classification problem data set. SVMs belong to a class of black-box models that lack transparency39,40. More details about the eSVM approach are given in the “Methods” section. 
Although the idea of a global vs local model interpretability has been discussed before in the literature41, its impact is not fully realized in the materials informatics literature. Second, we build an interactive web application (https://adaptivedesign.shinyapps.io/AIRHEAD/) that allows the user to query our trained models directly and predict previously unexplored MPEA or HEA compositions with the desired phase. This effort is aimed at allowing interested researchers to examine carefully the model predictions and facilitate the decision-making process. Moreover, this will also allow the MPEA community to objectively compare future models and document the progress.

Results

Dataset

Our initial data set for ML was constructed from several previous reports that meticulously compiled experimental data from the published literature22,32,42,43,44,45,46,47,48,49,50,51,52. The merged data set contained 3,715 compositions ranging from binary to multi-component alloys. The frequency of occurrence of elements in our data set is illustrated in Supplementary Fig. 1 and Supplementary Table 1, which indicate that d-block elements occur more frequently than p-block elements. Therefore, we anticipate our trained ML models to have a relatively large uncertainty when describing chemical compositions containing p-block elements compared to those containing d-block elements. Each composition was also augmented with the phase information as reported in the literature. The phases were then simplified into seven classes: BCC, FCC, BCC+FCC, HCP, Amorphous (AM), Intermetallics (IM), and Mixed-phases (MP). The simplification mainly pertains to the IM and MP labels. The IM label indicates that the microstructure contains at least one intermetallic phase, whereas the MP label indicates the presence of complex mixtures of multiple phase combinations in the microstructure. For instance, we used the IM label to represent ordered phases of the B2, C14, and L1₂ structure types47, while the MP label was used to represent 2BCC+B2, B2+σ, BCC+B2, BCC+FCC+B2, BCC+IM, FCC+B2, FCC+C14, FCC+IM, FCC+σ, 2BCC, and 2FCC phases in the microstructure. A final data set with 1,817 observations was obtained by removing duplicate entries, entries with missing values, and alloys whose reported phase data were inconsistent between sources. Each of the 1,817 observations was represented by a total of 125 variables53,54. We did not track the processing history, which can have an impact on the thermodynamics and kinetics of phase formation in the MPEAs.

The number of variables was then reduced based on the linear Pearson correlation coefficient (PCC)55. This is a common data pre-processing step in the materials informatics and cheminformatics literature56,57,58,59,60. The workflow is shown in Fig. 1a. We considered two different absolute PCC threshold values (0.4 and 0.6) to down-select the least linearly correlated input variables. Our choice of a PCC criterion of 0.6 was motivated by the work of Pei et al.32. In addition, we imposed a more stringent PCC criterion of 0.4 for further simplification. The PCC analysis identified sets of 12 and 20 variables for the 0.4 and 0.6 criteria, respectively. We note that testing two separate thresholds makes our approach more rigorous than some of the previous ML work on MPEAs in the literature55, where only one PCC criterion was used. Unfortunately, there is no standard way to choose the thresholds, which makes the PCC analysis a challenging and exploratory pre-processing step. In principle, one can resort to automated methods such as sure-independence screening61, but there is no convincing evidence in the literature that one method is better than the other on all data sets. In this work, we have made a concerted effort to retain features that have been well-studied in the literature (domain knowledge) so that we can show how our work advances the insights compared to the previous efforts. In our opinion, a brute-force, automated approach that is agnostic of the domain knowledge is not helpful in advancing the understanding of the MPEA phase formation problem. The list of down-selected variables is given in Table 1. We can broadly subdivide the down-selected variables into three categories: (1) those that are chemistry-agnostic (e.g., Mixing Entropy), (2) those that depend on element pairs (e.g., DeltaHf), and (3) those that depend on chemistry (everything else in Table 1).
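The correlation-based down-selection described above can be illustrated with a minimal Python sketch. It is not the authors' R workflow: the greedy visiting order, the function names, and the toy columns `f1`, `f2`, and `f3` are assumptions made only for illustration. A variable is kept when its absolute PCC with every previously kept variable does not exceed the threshold, so the stricter threshold retains fewer variables.

```python
import math

def pearson(x, y):
    """Linear Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear correlation
    return sxy / math.sqrt(sxx * syy)

def select_uncorrelated(columns, threshold, priority=None):
    """Greedy filter: keep a variable only if |PCC| with all kept ones <= threshold.

    `priority` is a hypothetical ordering (for example, well-studied
    descriptors first); by default the insertion order of `columns` is used.
    """
    order = priority if priority is not None else list(columns)
    kept = []
    for name in order:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy descriptor columns (hypothetical values, not from the data set).
data = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 3.9, 6.2, 8.0, 9.9],  # almost a multiple of f1
    "f3": [3.0, 2.0, 1.0, 5.0, 4.0],  # PCC with f1 is 0.5
}
kept_06 = select_uncorrelated(data, 0.6)
kept_04 = select_uncorrelated(data, 0.4)
print(kept_06, kept_04)
```

In this toy case the 0.6 threshold keeps `f1` and `f3`, while the 0.4 threshold keeps only `f1`, mirroring how the stricter criterion yields the smaller variable set.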

Fig. 1: Overarching workflow.

Flow chart for (a) feature selection using Pearson correlation coefficients (PCC) and (b) the machine learning and local model interpretability approach. In this work, we used an ensemble of support vector machines (eSVM) for multi-class classification learning, breakdown plots and Ceteris Paribus (CP) profiles for local model interpretability, and k-means clustering.

Table 1 List of the descriptors identified from 125 descriptors by PCC > ∣0.4∣ or ∣0.6∣.

After the correlation analysis, we ended up with two pre-processed data sets (one with a 12 variable set and the other with a 20 variable set) for ML model building. The pre-processed data set was randomly split into two subsets with 75 and 25% data for training and testing, respectively. We used the eSVM algorithm for training the ML models. The optimal hyperparameters were determined using a grid search. The out-of-bag error rate was used to evaluate the performance. We systematically varied the number of bootstrap samples and found the 50 and 100 bootstrap eSVM models to show the best predictive performance on the test data for the 12 and 20 variable sets, respectively. Supplementary Tables 2 and 3 compare the relative performance of eSVM models on the test set in terms of accuracy, precision, recall, and F1-score. Both 12 and 20 feature sets of eSVM showed similar performance. Finally, we chose the simpler 12 feature set eSVM models for further analysis. The next step is the post hoc analysis of the trained eSVM models. We start with the global variable importance analysis, which is also the most common method within the ML MPEA community.
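As a reminder of how the test-set metrics named above are defined in a multi-class setting, the following Python sketch computes accuracy together with one-vs-rest precision, recall, and F1-score for each class. The labels in the example are invented for illustration and are not taken from the Supplementary Tables.

```python
def classification_metrics(y_true, y_pred):
    """Return overall accuracy and per-class (precision, recall, F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = (precision, recall, f1)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, per_class

# Hypothetical true and predicted phase labels for six test compositions.
y_true = ["BCC", "BCC", "FCC", "AM", "FCC", "BCC"]
y_pred = ["BCC", "FCC", "FCC", "AM", "FCC", "BCC"]
accuracy, per_class = classification_metrics(y_true, y_pred)
print(accuracy, per_class)
```

Reporting the per-class values alongside accuracy matters for imbalanced data sets, where a high overall accuracy can hide poor recall for a sparsely populated phase.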

Global variable importance

The objective of global variable importance analysis is to evaluate the relative importance of each variable in impacting the overall predictive performance of the trained ML models. In this work, we used the well-known permutation method with the cross-entropy loss function to assess the global variable importance62. In Fig. 2, we show the averaged global variable importance analysis from the 12 feature set eSVM model. All features appear to contribute to the prediction performance of the eSVM model. The error bar is the standard deviation from the 50 bootstrap samples. We note that Fig. 2 should be interpreted with caution because of the large standard deviation associated with the feature importance values. While it is not common in the materials informatics literature to add error bars to the global feature importance analysis63,64,65, our work highlights the importance of adding them for improving the interpretability of the analysis. Mixing entropy, number of filled d or s valence electrons, covalent radius, and atomic weight are identified as relatively more important for the prediction performance. This result agrees well with various ML papers in the literature23,26,29,30,31,33. While helpful, the global variable importance approach does not shed light on the following question: what variables contribute to the prediction of each phase (or class label) and how are these variables related to the predicted phase? Answering it requires local variable importance methods, which we discuss next.
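The permutation idea can be sketched in a few lines of Python. This is a simplified illustration rather than the DALEX implementation used in this work: the toy logistic "models" stand in for the bootstrap SVMs, each column is shuffled only once per model, and all names and data are hypothetical. The importance of a variable is the increase in cross-entropy loss after its column is shuffled, and the mean and standard deviation are taken across the models of the ensemble.

```python
import math
import random

def cross_entropy(predict_proba, X, y):
    """Mean negative log-probability assigned to the true class."""
    eps = 1e-12
    return -sum(math.log(max(predict_proba(row)[label], eps))
                for row, label in zip(X, y)) / len(X)

def permutation_importance(predict_proba, X, y, rng):
    """Increase in cross-entropy after shuffling each column once."""
    base = cross_entropy(predict_proba, X, y)
    increases = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        increases.append(cross_entropy(predict_proba, X_perm, y) - base)
    return increases

def ensemble_importance(models, X, y, seed=0):
    """Mean and standard deviation of the importance across ensemble members."""
    rng = random.Random(seed)
    per_model = [permutation_importance(m, X, y, rng) for m in models]
    summary = []
    for values in zip(*per_model):
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
        summary.append((mean, sd))
    return summary

def make_toy_model(slope):
    """Hypothetical two-class model that only looks at the first variable."""
    def predict_proba(row):
        p = 1.0 / (1.0 + math.exp(-slope * row[0]))
        return {"BCC": p, "FCC": 1.0 - p}
    return predict_proba

X = [[-2.0 + 0.1 * i, float(i % 3)] for i in range(41) if i != 20]
y = ["BCC" if row[0] > 0 else "FCC" for row in X]
models = [make_toy_model(s) for s in (2.0, 3.0, 4.0)]
importance = ensemble_importance(models, X, y)
print(importance)
```

Because the toy models ignore the second variable, shuffling it leaves the loss unchanged, whereas shuffling the first variable raises the loss by an amount that differs from model to model, which is what the error bars capture.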

Fig. 2: Global variable importance.

The global variable importance for the 12 variable set eSVM model. Cross entropy loss was used as an indicator of variable importance. Error bars represent the standard deviation from 50 SVM models in the ensemble.

Local variable importance

We focused on two complementary local model interpretability methods: (1) Breakdown plots and (2) Ceteris Paribus profiles. In the breakdown (BD) approach, we decompose the model prediction for a single observation into contributions that can be attributed to different input variables62,66. The BD analysis can start from either a null set of indexes or a full set of relaxed features, which are referred to as the step-up and step-down approaches, respectively. In the step-down approach (as considered in our work), the contribution of each input variable is calculated by sequentially removing a single variable from the set, followed by variable relaxation in a way that the distance to the prediction is minimized. For example, in Fig. 3, a BD plot is shown for the NbTaTiV composition. The eSVM model predicted the composition to form in the BCC phase with a probability score of 100%. Thus, we obtain only one BD plot for this composition, representing the BCC phase prediction. The BD plots resemble a bar graph. Each variable can contribute either positively (positive weight) or negatively (negative weight) to the overall prediction. The intercept of the BD plot indicates the baseline, which is the average prediction of the ensemble SVM model. The size of each bar shows the contribution of that feature to the difference between the final prediction and the baseline. As an example, the average MixingEntropy for the entire training set (1,364 compositions) is 9.025. However, for the specific NbTaTiV composition the MixingEntropy value is 11.526. According to the BD plot, the MixingEntropy value of 11.526 for NbTaTiV reduces the baseline value by a small amount (negative contribution). In NbTaTiV, the mean_MeltingT, mean_NValence, and mean_NsValence variables carry the largest weight and are recognized as important for predicting the composition as forming in the BCC phase. In a similar manner, we calculated the BD plots for all compositions in the training data. Readers can access the BD plots through our Web App (https://adaptivedesign.shinyapps.io/AIRHEAD/).
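To make the variable-attribution idea concrete, the sketch below implements a simplified break-down decomposition in Python. It is an assumption-laden illustration, not the procedure used in this work: it follows a fixed-order step-up variant (variables are fixed to the instance values one at a time, starting from the data-set average) instead of the step-down approach with variable relaxation, and the linear toy model, variable names, and numbers are hypothetical.

```python
def break_down(predict, background, instance, order):
    """Fixed-order step-up attribution for one observation.

    Starting from the mean prediction over `background` (the baseline), each
    variable in `order` is set to the instance value in every background row;
    the resulting change in mean prediction is attributed to that variable.
    """
    rows = [list(r) for r in background]

    def mean_prediction(rs):
        return sum(predict(r) for r in rs) / len(rs)

    baseline = mean_prediction(rows)
    contributions = {}
    previous = baseline
    for j in order:
        for r in rows:
            r[j] = instance[j]
        current = mean_prediction(rows)
        contributions[j] = current - previous
        previous = current
    return baseline, contributions

def toy_predict(row):
    """Hypothetical score for one class, linear in two variables."""
    return 0.5 + 0.1 * row[0] - 0.05 * row[1]

background = [[0.0, 0.0], [2.0, 4.0]]  # stand-in for the training data
instance = [3.0, 1.0]                  # observation to be explained
baseline, contributions = break_down(toy_predict, background, instance, [0, 1])
print(baseline, contributions)
```

By construction, the baseline plus the sum of the contributions recovers the model prediction for the instance, which is the property that lets a BD plot be read as a bar-by-bar path from the average prediction to the individual one. For models with interactions, the contributions depend on the chosen variable order.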

Fig. 3: Breakdown plot for NbTaTiV composition.

The BD plot for the NbTaTiV composition, which is predicted to form in the BCC phase by the eSVM model. Each bar represents the averaged contribution of that variable towards the overall prediction. Both the min_NpUnfilled and frac_pValence descriptors take a value of exactly 0 for NbTaTiV. However, it is important to note that the mean values of both min_NpUnfilled and frac_pValence in the entire training data set are not equal to 0. Thus, the plot indicates that when min_NpUnfilled = frac_pValence = 0 for NbTaTiV, the lack of p-electrons in the valence configuration makes a small but non-negligible contribution to the overall prediction.

The Ceteris Paribus (CP) profiles convey complementary insights about the relationship between a variable and the response by showing how the prediction would be affected if we changed the value of one variable while keeping all other variables unchanged62. The method is based on the Ceteris Paribus principle; “Ceteris Paribus” is a Latin phrase meaning “other things held constant” or “all else unchanged”. CP profiles are an intuitive method to gain insights into how the black-box model works by investigating the influence of input variables separately, changing one at a time62. In essence, a CP profile shows the dependence of the conditional expectation of the dependent (or output) variable on the values of a particular input variable. In Fig. 4, we show a representative CP profile plot for the same NbTaTiV composition that was discussed in the previous BD section. Unlike the BD plot, the CP profile also reveals how the model prediction depends functionally on each variable. In Fig. 4, the x-axes are the input variables and the y-axes are the prediction probabilities from the eSVM models. There are seven curves in each panel and each curve represents a particular phase. For example, the red curve traces the prediction for the BCC phase. The CP profile plot highlights the presence of non-linear relationships between each of the features and the response. CP profiles for other compositions can be accessed through our Web App (https://adaptivedesign.shinyapps.io/AIRHEAD/).
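A CP profile is straightforward to compute once a trained model is available, as the following Python sketch suggests. The two-class toy model, the variable layout, and the grid are hypothetical and serve only to show the mechanics: one variable is swept over a grid while every other variable stays at the value it has for the chosen observation.

```python
import math

def cp_profile(predict_proba, instance, j, grid):
    """Predicted class probabilities as variable j is varied over `grid`,
    with all other variables held at the values of `instance`."""
    profile = []
    for value in grid:
        row = list(instance)
        row[j] = value
        profile.append((value, predict_proba(row)))
    return profile

def toy_predict_proba(row):
    """Hypothetical two-class model with a non-linear response."""
    score = 3.0 * row[0] - 2.0 * row[1] ** 2
    p_bcc = 1.0 / (1.0 + math.exp(-score))
    return {"BCC": p_bcc, "AM": 1.0 - p_bcc}

instance = [0.4, 0.5]                     # normalized variable values
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
profile = cp_profile(toy_predict_proba, instance, 1, grid)
for value, probabilities in profile:
    print(value, round(probabilities["BCC"], 3), round(probabilities["AM"], 3))
```

Plotting one curve per class against the grid gives a panel of the kind shown in the CP figures; the point where the grid value equals the observed value of the variable corresponds to the actual prediction for that observation.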

Fig. 4: Ceteris Paribus profile for NbTaTiV.

The CP profile for NbTaTiV composition with respect to the 12 input variables. The black dots indicate the true feature values. The gray region indicates the upper and lower boundaries based on standard deviation. Line colors denote phase information: blue, MP; violet, AM; cyan, FCC; orange, BCC+FCC; lightblue, HCP; red, BCC; green, IM.

While the global variable importance analysis functions at the entire data set level, the BD and CP analyses function at the granularity of each instance or composition. These two methods represent the two extremes in the spectrum of post hoc model interpretability analysis. In addition, there is a need for model interpretability at the intermediate level that will yield insights specific to each phase in our data set (based on the collective similarity or clustering of similar observations). To address this question, we combined the BD plots with the k-means clustering analysis and CP profile data. The pseudocode is summarized in Algorithm 1, which describes the implementation sequence of the BD method, k-means clustering, and CP analysis.

Algorithm 1

Local interpretable ML algorithm using the BD and CP methods along with k-means clustering.

The algorithm starts with the BD analysis for each composition. For a given composition, the BD values are calculated from each trained SVM model in the ensemble and averaged across all 50 models. The results are stored as a data frame. We then perform clustering analysis using the k-means algorithm, assigning a cluster label to each data point. We also construct CP profiles for each composition in the data set and group them according to the cluster labels. We then calculate the average CP profile for each cluster. The final outcome is two plots for each cluster: (1) averaged BD plots and (2) averaged CP profiles. Visualization of the two plots yields a phase-specific interpretation of the eSVM model. For k-means clustering, we plotted the total within-cluster sum of squares as a function of the number of clusters to infer the optimal k value (Supplementary Fig. 2a). The common recommendation in the literature is to select a k that provides the most useful or interpretable solution67. Although we could not find a clear elbow point, we selected k = 10 clusters after exploring several k-values. In Supplementary Fig. 2b, we project the high-dimensional data into two dimensions using principal component analysis. The 10 clusters were then analyzed using histograms as shown in Fig. 5 and Supplementary Fig. 3, where we plot the frequency of occurrence of the number of components in the alloy composition for each cluster. Figure 5 shows that clusters 1, 5, 7, and 10 capture patterns that are representative of the binary systems. Given our interest in the design of HEAs, which normally consist of more than four components, we do not discuss the results from clusters 1, 5, 7, and 10. All other clusters provide important clues for uncovering phase-specific variable importance that pertains to the MPEAs and HEAs. Instead of explaining each cluster in detail (which is beyond the scope of this paper), we focused only on specific clusters where the ML predictions agreed closely with the experimental labels in the data set.
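The sequence of steps in Algorithm 1 can be outlined with the Python sketch below. It is a rough reconstruction from the description in the text, not the published pseudocode: the plain Lloyd-type k-means, the data layout (`bd_per_model[m][i]` as the BD attribution vector of composition i from model m, and one CP curve per composition on a shared grid), and the toy numbers are all assumptions.

```python
import random

def mean_vectors(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd iteration; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = mean_vectors(members)
    return labels

def interpret_by_cluster(bd_per_model, cp_per_composition, k):
    """Average BD over the ensemble, cluster, then average BD and CP per cluster."""
    n = len(bd_per_model[0])
    bd_avg = [mean_vectors([model[i] for model in bd_per_model]) for i in range(n)]
    labels = kmeans(bd_avg, k)
    summary = {}
    for c in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == c]
        summary[c] = {
            "members": idx,
            "mean_bd": mean_vectors([bd_avg[i] for i in idx]),
            "mean_cp": mean_vectors([cp_per_composition[i] for i in idx]),
        }
    return summary

# Toy input: two ensemble members, four compositions, two variables.
model_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
model_b = [[1.0, 0.0], [0.7, 0.3], [0.0, 1.0], [0.3, 0.7]]
cp_curves = [[0.2, 0.8], [0.4, 0.6], [0.9, 0.1], [0.7, 0.3]]
summary = interpret_by_cluster([model_a, model_b], cp_curves, k=2)
print(summary)
```

In this toy case the first two compositions end up in one cluster and the last two in the other, and each cluster carries its own averaged BD vector and averaged CP curve, which is the pair of plots the algorithm produces per cluster.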

Fig. 5: Clustering analysis.

The distributions of the number of components (denoted as NComp) for the 10 clusters from k-means clustering analysis. Each cluster is also identified by phase selections via the BD-based prediction as shown in the titles of each plot.

In Supplementary Table 4, we compare the ML prediction accuracy for each of the 10 clusters. Figure 5 indicates that clusters 8 and 9 are representative of the MPEAs. Although cluster 4 is also representative of MPEAs (six-component alloys), it contained fewer data points than clusters 8 and 9. Therefore, we focused on clusters 8 and 9 for model interpretation. The prediction accuracy data from eSVM reveal that clusters 8 and 9 are representative of the BCC and AM phases, respectively. The averaged variable attribution analyses from the BD method for clusters 8 and 9 are shown in Fig. 6a, b, respectively. The mean_NsValence and maxdiff_AtomicWeight variables are identified as important variables for both BCC and AM phases. Since the maxdiff_AtomicWeight variable can be related to the atomic size mismatch, this result is in good agreement with previous studies68,69. Figure 6a indicates that mean_MeltingT, maxdiff_NUnfilled, and mean_DeltaHf are key variables for the formation of the BCC phase. From Fig. 6b, it can be inferred that maxdiff_Electronegativity, mean_NValence, and MixingEntropy are important for forming the AM phase. The relationship between mean_DeltaHf and the BCC phase also agrees well with previously published results70.

Fig. 6: Averaged breakdown analysis of clusters 8 and 9.

The averaged contribution from each variable for (a) cluster 8 (BCC phase) and (b) cluster 9 (AM phase). The first row contains the sum of the overall mean prediction value, along with the standard deviation. Red dots and yellow lines stand for median values and error bars, respectively.

The averaged BD plots from other clusters are also displayed in Supplementary Fig. 4, and the interpretations are summarized in Supplementary Table 5. The analysis reveals similarities between BCC and IM phases, and between FCC and AM phases. The MP phase does not appear to have distinct characteristics. This may be due to the fact that the alloys of MP phase have a wider range of data distribution arising from relatively more abundant data and many different types of mixed phases compared to those with other phases that are more unique.

We next visualize the averaged CP profiles for clusters 8 and 9, which provide a more detailed account of the relationship between the input variables and the phases. The CP profiles for BCC and AM phases are shown in Fig. 7a, b, respectively. Not all input variables have unique functional relationships. For example, in Fig. 7a (representative of the BCC phase), similar functional relationships are observed between: (1) frac_pValence, maxdiff_NUnfilled, and min_NpUnfilled, (2) mean_CovalentRadius and mean_DeltaHf, (3) dev_NdValence, maxdiff_Electronegativity, and mean_NValence, and (4) mean_NsValence and mean_MeltingT. The maxdiff_AtomicWeight and MixingEntropy are the only two variables that do not share a similar relationship with any other variable. From Fig. 7a (with all else being equal), we infer that a change in the MixingEntropy will not affect the overall composition-phase relationship of the elements in the BCC phases. This gives us the flexibility to tune the composition without having to worry about competition with other phases. However, we cannot infer the same for the compositions in Fig. 7b (which correspond to the AM phase). At lower values of MixingEntropy, the blue curve (MP phase) has a higher predicted probability than the violet curve (AM phase). The phase formation region of the AM phase is predicted to be relatively narrow. This example shows the promise of CP profile plots to uncover important insights that govern the composition-phase relationships in the MPEA family.

Fig. 7: Averaged Ceteris Paribus profiles for clusters 8 and 9.

The averaged CP profiles for (a) cluster 8 (BCC phase) and (b) cluster 9 (AM phase) with respect to the 12 input variables. The black dots indicate the true feature values (normalized) for all the data points within that cluster. Line colors denote phase information: blue, MP; violet, AM; cyan, FCC; orange, BCC+FCC; lightblue, HCP; red, BCC; green, IM.

We also made an attempt to connect the averaged BD plots (Fig. 6a) with the averaged CP profiles (Fig. 7a) for the BCC phase. We found that high mean_MeltingT, high mean_NsValence, and mean_DeltaHf values between 0.3 and 0.5 favor BCC phase formation. From the standpoint of maxdiff_AtomicWeight and maxdiff_NUnfilled variables, MPEAs tend to form in BCC phase when the constituent elements have moderately different atomic weights and similar number of the unfilled valence orbitals. In the case of AM phase (Fig. 7b), while high mean_NsValence values are preferred, low mean_NValence values favor AM phase formation. Low MixingEntropy should be avoided, because it appears to favor the formation of mixed phase (blue curve in Fig. 7b). There is a window of values for maxdiff_AtomicWeight and maxdiff_Electronegativity that favor AM phase formation. In Fig. 7b, extreme values of maxdiff_AtomicWeight and maxdiff_Electronegativity appear to favor mixed phase.

So far, we have been comparing the averaged CP profiles within a cluster. We also observe some interesting patterns between the two clusters. For example, maxdiff_AtomicWeight, mean_CovalentRadius, mean_NValence, mean_NsValence, frac_pValence, mean_DeltaHf, and min_NpUnfilled have similar functional forms. In contrast, dev_NdValence, maxdiff_Electronegativity, MixingEntropy, maxdiff_NUnfilled, and mean_MeltingT show distinct functional dependencies. The implications of these results are not entirely clear, but they showcase the potential of local model interpretability methods for in-depth examination of black-box models.

In Fig. 8, we show the distribution of constituent elements in clusters 8 and 9. The elements on the left side of the d-block in the periodic table, along with Al, are found in the BCC cluster (cluster 8). In contrast, the compositions representing the AM phase (cluster 9) show a scattered distribution of elements from the d-block. The presence of Be in the AM cluster likely points to a connection between the AM phase and a large difference in atomic weight. From the pie charts, we can see that Ti and Zr are major elements in both the BCC and AM clusters. When it comes to unique elemental constituents, Nb, Ta, Mo, and V are commonly found in the BCC phase, whereas Cu, Ni, and Al are commonly found in the AM phase. Other clusters were analyzed in the same manner and the results are shown in Supplementary Fig. 5. For FCC, the constituent elements are distributed in the first and second rows of the d-block of the periodic table. The MP phase is similarly related to the first row of the d-block, but several of the p-block elements also participate in the formation of the MP phase.

Fig. 8: Periodic table representation of elements in clusters 8 and 9.

The constituent elements present in clusters 8 (BCC phase) and 9 (AM phase) are (a) depicted in the periodic table and (b) analyzed by pie charts, where each number shows their frequency of occurrence. The purple (dashed), red (solid), and blue (dotted) circles indicate the elements appearing in both BCC and AM phases, only BCC phase, and only AM phase, respectively.

Discussion

There is an increasing interest in the application of model interpretability tools to problems in materials science71,72,73,74,75. The expectation that an ML model should explain the underlying patterns of materials phenomena, in addition to making predictions, has been steadily increasing. There are also papers from other disciplines, such as bioinformatics, that share similar goals76. We have developed a post hoc ML model interpretability framework for the MPEA phase classification problem. The methodology provides an in-depth analysis of complex black-box models and extracts interpretable patterns from an ensemble of trained models. In the materials informatics literature, the results from global variable importance are widely used to interpret which variables are strongly related to the ML performance. We argue that phase-specific (or class label specific) variable importance analysis based on local model interpretability offers a distinctive way to gain much deeper insights into the global variable importance results. To illustrate this point, we also compared the global and local variable importance plots to glean additional insights (main results are distilled in Supplementary Table 6). Note that the top three variables from the global variable importance analysis (interpreted solely on the basis of the mean values), namely MixingEntropy, dev_NdValence, and mean_CovalentRadius, are not associated with either the single-phase BCC or FCC compositions that have attracted interest for tailoring the mechanical properties of the HEAs77. The fact that these variables are connected to the MP phase indicates that the presence of a large fraction of the MP phase in the dataset significantly affects (or biases) the global variable importance analysis. One can also find that the important variables for BCC and FCC from the BD plots are not ranked highly by the global variable importance. Therefore, pursuing MPEA design based solely on global variable importance analysis could mislead researchers, especially in a multi-class classification learning setting. Augmenting global variable importance analysis with local feature importance has many desirable characteristics for rationally tailoring MPEA compositions with desired properties.

Methods

Data preprocessing

The dataset collected from the literature consists of 1,817 compositions after deleting the duplicate data and missing values. Descriptors are generated by the Magpie program53, a package that computes concentration-weighted values for materials using the elemental or pairwise properties of the components. The mixing enthalpy is calculated based on the Miedema semi-empirical theory54 for binary alloys, and the Pauling electronegativity, whose data are also available in the Magpie program, is used for the electronegativity descriptors. All the formulae used in the Magpie program are summarized in Supplementary Note 1. To find the independent descriptors among the 125 descriptors, the feature values are normalized by min-max scaling and then analyzed using pair-wise Pearson correlation within the RStudio environment78.

Machine learning

We employed the eSVM models for multi-class classification learning tasks79. The eSVM algorithm comprises multiple SVM models generated by the bootstrap sampling method80. We used the nonlinear Gaussian radial basis function kernel, as implemented in the e1071 package81. One can generate a large number of training sets using the bootstrap sampling, where samples are randomly drawn with replacement. Every resampling produces two types of samples: (1) in-bag and (2) out-of-bag (OOB), which are used for training and testing the ML models, respectively. The optimization of eSVM hyperparameters is done by the OOB evaluation using grid search.
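The in-bag and out-of-bag split produced by bootstrap sampling can be illustrated with a short Python sketch. The sample size of 1,000 and the random seed are arbitrary choices for illustration, and no SVM is trained here; the sketch only shows how each resampling partitions the row indices.

```python
import random

def bootstrap_split(n, rng):
    """Draw n row indices with replacement (in-bag); rows never drawn are OOB."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag

rng = random.Random(0)
n_rows, n_models = 1000, 50
splits = [bootstrap_split(n_rows, rng) for _ in range(n_models)]
mean_oob_fraction = sum(len(oob) for _, oob in splits) / (n_models * n_rows)
print(round(mean_oob_fraction, 3))
```

On average a little over one third of the rows (about 1 - 1/e) fall out of bag in each resampling, so every member of the ensemble has a sizeable held-out set on which candidate hyperparameters can be scored.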

Breakdown and Ceteris Paribus methods

To interpret the trained eSVM model, the BD and CP profile methods as implemented in the DALEX package62 were applied to compute the contributions of features and individual profiles to the ML prediction, respectively. The k-means clustering algorithm from the factoextra package82 was used to divide the dataset containing the BD values into clusters in an unsupervised fashion. Local feature importance is analyzed based on the averaged BD data by identifying the correlation between each cluster and the phase selections as predicted by the BD method. The global variable importance of the eSVM is obtained by averaging the global variable importance outputs of the individual SVM models across all the bootstrap samples.

Web application

Applications developed with the Shiny package83 in the R programming language allow users to interact with models defined on the server side (server.R). The front end of the application, contained in the user-interface script (ui.R), takes a user-supplied string composed of element symbols, each followed by the amount of that element (e.g., Al1.0V1.0Nb1.0T1.0), representing the composition of the HEA. The trained eSVM model in the backend generates the phase probability for the given composition. Additionally, users can obtain the set of 12 descriptors (Table 1), generated using an R script based on the Magpie package. For each composition, the user can add the phase probability and descriptor information to a dynamic history, which can be exported as a comma-separated value file at the end of the session. For each of the 1,367 points in the training set, users can see the associated BD plot and CP profiles. The web app can be accessed at https://adaptivedesign.shinyapps.io/AIRHEAD/.
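As an illustration of how a composition string of this form can be turned into a descriptor, the Python sketch below parses element symbols and amounts into atomic fractions and evaluates the ideal configurational mixing entropy, -R Σ c_i ln c_i. The parser and function names are hypothetical and do not reproduce the web application's R code; the entropy expression and the units (J mol⁻¹ K⁻¹) are assumptions, chosen because they reproduce the MixingEntropy value of 11.526 quoted for NbTaTiV in the Results.

```python
import math
import re

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1

def parse_composition(formula):
    """Convert e.g. 'Nb1.0Ta1.0Ti1.0V1.0' into atomic fractions.

    An element symbol without an amount is assumed to have amount 1.
    """
    amounts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(amounts.values())
    return {symbol: value / total for symbol, value in amounts.items()}

def mixing_entropy(fractions):
    """Ideal configurational entropy of mixing, -R * sum(c_i * ln c_i)."""
    return -GAS_CONSTANT * sum(c * math.log(c) for c in fractions.values() if c > 0)

fractions = parse_composition("Nb1.0Ta1.0Ti1.0V1.0")
entropy = mixing_entropy(fractions)
print(fractions, round(entropy, 3))
```

For an equi-atomic alloy with N components the expression reduces to R ln N, which is why this descriptor depends only on the number and fractions of the components and not on their chemical identity.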