1 Introduction

High entropy alloys (HEAs) are classically defined as multi-principal element alloys (MPEAs) formed by mixing equal or near-equal amounts of five or more elements, and they are ascribed four core characteristics: the high-entropy effect, the sluggish diffusion effect, the severe lattice distortion effect and the cocktail effect [1,2,3,4]. Alloys consisting of three or four principal (base) elements in equal or near-equal atomic proportions are instead defined as medium entropy alloys (MEAs) [5]. To cover a wide range of alloys with three or more base elements (each present in a significantly large proportion, but not necessarily in equiatomic composition), the term multi-principal element alloy (MPEA) is used in this study [2, 6]. HEAs or MPEAs with equiatomic compositions of the constituent elements are located at the center of the multicomponent phase space [1, 7, 8], as illustrated in Fig. 1(a). MPEAs whose principal elements are not in equiatomic proportion are offset from the geometric centroid of the triangle in the figure. The term MPEA therefore includes both HEAs and MEAs, in which the base elements are present in relatively large proportions. Recently, the broader term compositionally complex alloys (CCAs) has been coined to represent MPEAs [9].

The common motivation for increasing the number of principal or base elements in MPEAs is to maximize the configurational entropy of the mixture, thereby suppressing intermetallic (IM) phases and stabilizing the disordered solid solution (SS) phase [2, 10]. HEAs are also used to develop hierarchical nanostructures that remain stable and compatible while simultaneously containing soft and hard phases or, more generally, phases with strongly contrasting physicochemical properties [11]. In other words, HEAs or MPEAs offer a wide range of properties, or combinations of physicochemical properties, that are otherwise impossible to attain with alloys based on a single major element [12]. This multitude of characteristics makes MPEAs promising candidates for future materials.

The major challenge in the design and discovery of MPEAs is determining the stability of their phase(s). Phase diagrams of MPEAs are not yet well defined, so their design relies on empirical phase rules. Additional elements affect the structure of the primary element quite differently in MPEAs than in conventional multicomponent alloys. Because MPEAs have several principal elements, traditional alloy-design methods and rules can no longer be applied for phase prediction [13]. A single empirical criterion is not sufficient for the generic design of MPEAs [14], and combining many such criteria introduces a multitude of variables into the phase prediction. Data-driven approaches are modern methods that can address such compositionally complex alloys [15]. In this scenario, machine learning (ML) techniques are well suited to predicting phase stability in a generic multi-principal element alloy, high entropy alloy or compositionally complex alloy. In addition, researchers have shown ongoing interest in applying MPEAs in broad interdisciplinary areas, and people without sufficient knowledge of the thermodynamics and materials science of MPEAs may well be working on their engineering applications. It is therefore important to provide libraries or toolkits that facilitate the phase design of MPEAs without requiring advanced scientific knowledge of the field. This study aims to accelerate the materials design of MPEAs by addressing these issues.

An artificial neural network (ANN) is developed in this study to predict the phase(s) of a given MPEA. A Python-based Multi-principal Element Alloys Laboratory (pyMPEALab) software interface is then built on top of this ANN model, so that a user without prior knowledge of the detailed thermodynamic properties of an MPEA can predict its phase solely by entering the alloy composition. Almost all of the numerical fingerprints of MPEAs used at the input layer of the neural network model are composition dependent, which justifies a user interface that accepts only composition information. Finally, it is demonstrated how such phase predictions can be used to understand the change in mechanical behavior of MPEAs as the dopant content is varied.

2 Numerical Representation of Multi-principal Element Alloys

To enable machine learning of multi-principal element alloys (MPEAs), an MPEA must be described numerically, and this numerical representation can be accomplished with the fingerprinting method [16]. Figure 1a illustrates how the CoCuFeNi MPEA can be reduced to eight numerical fingerprints or descriptors: composition, atomic size difference (\(\delta\)), electronegativity difference (\(\Delta \chi\)), enthalpy of mixing (\(\Delta H_{mix}\)), entropy of mixing (\(\Delta S_{mix}\)), the dimensionless \(\Omega\) parameter, valence electron concentration (VEC) and phase type (solid solution, intermetallic, or amorphous). The mathematical procedures used to quantify these descriptors are elaborated in Appendix 1. Because many of these parameters appear in the Hume-Rothery rules, the featurization is designed so that the model produces thermodynamically consistent outcomes. However, MPEAs can be exceptions to the Hume-Rothery rules, and in this context data-driven approaches are an appropriate way forward for studying MPEA phases. Since this study is concerned with phase prediction, the phase of the alloy is taken as the output of the machine learning model. The parameter \(\delta\) reflects the size effect of the MPEA components, whereas \(\Delta \chi\) is associated with the chemical compatibility between the elements. The mixing enthalpy is also a measure of the chemical compatibility, or tendency of the principal elements to combine [17, 18], and is an indispensable feature for describing an MPEA [19].
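Appendix 1, which gives the exact formulas, is not part of this excerpt. The Python sketch below assumes the definitions most widely used in the MPEA literature: \(\delta = \sqrt{\sum_i c_i (1 - r_i/\bar{r})^2}\) with \(\bar{r} = \sum_i c_i r_i\), \(\Delta \chi = \sqrt{\sum_i c_i (\chi_i - \bar{\chi})^2}\), and \(\Delta H_{mix} = \sum_{i<j} 4\,\Delta H_{ij}^{mix} c_i c_j\). The atomic radii, Pauling electronegativities and binary enthalpies are approximate illustrative values, not the pyMPEALab database, and all function names are hypothetical. \(\delta\) is returned as a fraction, consistent with the range reported later in Table 1 (0.0027 to 0.21).

```python
import math
from itertools import combinations

# Approximate, illustrative elemental data (NOT the pyMPEALab database):
# atomic radius in angstrom and Pauling electronegativity.
RADIUS = {"Co": 1.251, "Cu": 1.278, "Fe": 1.241, "Ni": 1.246}
CHI = {"Co": 1.88, "Cu": 1.90, "Fe": 1.83, "Ni": 1.91}
# Approximate binary mixing enthalpies Delta H_ij (kJ/mol), keyed by sorted pair.
H_BINARY = {("Co", "Cu"): 6, ("Co", "Fe"): -1, ("Co", "Ni"): 0,
            ("Cu", "Fe"): 13, ("Cu", "Ni"): 4, ("Fe", "Ni"): -2}


def atomic_size_difference(c):
    """delta = sqrt(sum c_i (1 - r_i / r_bar)^2), returned as a fraction."""
    r_bar = sum(ci * RADIUS[el] for el, ci in c.items())
    return math.sqrt(sum(ci * (1 - RADIUS[el] / r_bar) ** 2 for el, ci in c.items()))


def electronegativity_difference(c):
    """Delta chi = sqrt(sum c_i (chi_i - chi_bar)^2)."""
    chi_bar = sum(ci * CHI[el] for el, ci in c.items())
    return math.sqrt(sum(ci * (CHI[el] - chi_bar) ** 2 for el, ci in c.items()))


def mixing_enthalpy(c):
    """Delta H_mix = sum over pairs i<j of 4 * Delta H_ij * c_i * c_j, in kJ/mol."""
    return sum(4 * H_BINARY[tuple(sorted((a, b)))] * c[a] * c[b]
               for a, b in combinations(c, 2))


cocufeni = {"Co": 0.25, "Cu": 0.25, "Fe": 0.25, "Ni": 0.25}
delta = atomic_size_difference(cocufeni)
d_chi = electronegativity_difference(cocufeni)
d_h = mixing_enthalpy(cocufeni)
print(f"delta = {delta:.4f}, dchi = {d_chi:.4f}, dHmix = {d_h:.2f} kJ/mol")
```

With these illustrative inputs, equiatomic CoCuFeNi has a small size mismatch (\(\delta\) of roughly 1.1%) and a slightly positive \(\Delta H_{mix}\) of 5 kJ/mol.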
The mixing entropy (\(\Delta S_{mix}\)) of an N-element alloy is maximized when the alloy is equiatomic, which is why HEAs (a subset of MPEAs) generally have a larger entropy of mixing than traditional alloys. This distinction between traditional alloys and HEAs qualifies \(\Delta S_{mix}\) as a numerical descriptor of MPEAs. It is difficult to compute the free energy of mixing (\(\Delta G_{mix}\)) for the different phases of a high entropy alloy or MPEA. The dimensionless parameter \(\Omega = \frac{T_m\,\Delta S_{mix}}{\left| \Delta H_{mix} \right| }\) combines the same two quantities that make up \(\Delta G_{mix} = \Delta H_{mix} - T\,\Delta S_{mix}\), so it is appropriate to use it as a numerical fingerprint in place of \(\Delta G_{mix}\) [20]. Although \(\Omega\) is derived from \(\Delta H_{mix}\) and \(\Delta S_{mix}\), it also carries information about the melting temperature (\(\hbox {T}_m\)) of the alloy components, which makes it eligible as a separate numerical fingerprint. While \(\Delta \chi\) and \(\Delta H_{mix}\) mainly indicate whether a solid solution forms rather than compounds, it is also desirable to distinguish between particular solid solution phases (namely, FCC and BCC). BCC phases are characterized by smaller VEC values, whereas FCC phases show larger VEC values [21].
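The remaining descriptors can be sketched the same way. This Python example assumes the ideal configurational entropy \(\Delta S_{mix} = -R\sum_i c_i \ln c_i\), a composition-weighted melting point \(T_m = \sum_i c_i T_{m,i}\), and \(\mathrm{VEC} = \sum_i c_i \mathrm{VEC}_i\); these are the usual literature conventions, not necessarily the exact ones in Appendix 1. The melting points are approximate illustrative values, \(\Delta H_{mix}\) is passed in as an assumed input (+5 kJ/mol for CoCuFeNi), and the function names are hypothetical. It also checks that \(\Delta S_{mix}\) peaks at the equiatomic composition, where it equals \(R \ln N\).

```python
import math

R = 8.314  # gas constant, J/(mol K)
# Approximate, illustrative elemental data (NOT the pyMPEALab database).
T_MELT = {"Co": 1768.0, "Cu": 1358.0, "Fe": 1811.0, "Ni": 1728.0}  # K
VEC = {"Co": 9, "Cu": 11, "Fe": 8, "Ni": 10}


def mixing_entropy(c):
    """Ideal configurational entropy -R * sum c_i ln c_i, in J/(mol K)."""
    return -R * sum(ci * math.log(ci) for ci in c.values() if ci > 0)


def omega(c, dh_mix_kj):
    """Omega = T_m * dS_mix / |dH_mix| with a composition-weighted T_m."""
    t_m = sum(ci * T_MELT[el] for el, ci in c.items())
    dh_j = abs(dh_mix_kj) * 1000.0  # kJ/mol -> J/mol so that Omega is dimensionless
    return math.inf if dh_j == 0 else t_m * mixing_entropy(c) / dh_j


def valence_electron_concentration(c):
    """VEC = sum c_i * VEC_i."""
    return sum(ci * VEC[el] for el, ci in c.items())


equi = {"Co": 0.25, "Cu": 0.25, "Fe": 0.25, "Ni": 0.25}
skewed = {"Co": 0.40, "Cu": 0.20, "Fe": 0.20, "Ni": 0.20}
# For a fixed number of elements, dS_mix is largest at the equiatomic composition.
print(mixing_entropy(equi), R * math.log(4), mixing_entropy(skewed))
print("Omega:", omega(equi, dh_mix_kj=5.0), "VEC:", valence_electron_concentration(equi))
```

The guard for \(\Delta H_{mix} = 0\) reflects that \(\Omega\) diverges when the mixing enthalpy vanishes.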

Fig. 1

HEAs (a subset of MPEAs) are located at the center of the multicomponent phase space. A multi-principal element alloy (MPEA) is represented by the eight descriptors or fingerprints elaborated in a. These fingerprints serve as the model features of the neural network for phase prediction. The composition descriptor (x) contains the atomic fractions of the 34 elements considered in this study. For the CoCuFeNi MPEA, the atomic fraction of each of Co, Cu, Fe and Ni is 0.25, whereas that of each of the remaining 30 elements is 0. To show the data scatter of specific solid solution phases (BCC, FCC and BCC + FCC), five descriptors are used in b to construct a pairplot.

Through a rigorous search of the MPEA literature [6, 13, 14, 20,21,22,23,24,25,26,27,28,29,30], a dataset containing the abovementioned eight numerical representations (fingerprints) of a total of 1229 multi-principal element alloys was prepared. During dataset generation, cells with missing values were filled in using a data imputation technique. To gain insight into the dataset, it is helpful to visualize the data. Figure 1b shows a pairplot of four numerical descriptors (\(\Delta H_{mix}\), \(\Delta S_{mix}\), \(\Omega\) and VEC), with the fifth descriptor (solid solution phase type) serving as the hue variable. In the figure, the BCC phase data (blue) can be visually separated from the FCC phase data (red) when VEC is plotted against any of the three variables \(\Delta H_{mix}\), \(\Delta S_{mix}\) and \(\Omega\). However, when any two of \(\Delta H_{mix}\), \(\Delta S_{mix}\) and \(\Omega\) are plotted against each other, the BCC phase is not easily distinguished from the FCC phase. This supports the earlier argument that VEC is important for identifying a specific phase (FCC, BCC or FCC + BCC) within the solid solution phase. When a feature (\(\Delta H_{mix}\), \(\Omega\) or VEC) is plotted against \(\Delta S_{mix}\), most of the data is concentrated at larger values of \(\Delta S_{mix}\) (i.e., at 10 J/(mol K) or above). This implies that single solid solution phases in MPEAs (e.g., BCC (blue data points) or FCC (red data points)) form more favorably at high configurational entropy. However, a small number of BCC or FCC data points are also observed in the range 2.5–7.0 J/(mol K).
This observation indicates that a large \(\Delta S_{mix}\) is not a prerequisite for a single-phase solid solution, and, conversely, a large \(\Delta S_{mix}\) does not always guarantee the formation of a single-phase solid solution in MPEAs [27]. Because entropy does not always dominate the system's Gibbs free energy, these outliers (single-phase solid solutions at low values of \(\Delta S_{mix}\)) are also thermodynamically consistent. With such outliers present, \(\Delta S_{mix}\) clearly cannot be used as the sole variable to predict the stability of a single-phase solid solution. This provides the rationale for using multiple numerical fingerprints to describe MPEAs.
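The text does not name the imputation technique used for the missing cells. As one simple possibility, and only as an assumption, the numpy sketch below fills each missing value with its column median; scikit-learn's `SimpleImputer(strategy="median")` performs the same operation. The toy rows are invented for illustration, and the function name is hypothetical.

```python
import numpy as np


def impute_median(X):
    """Replace NaNs in each column with that column's median (assumed strategy)."""
    X = np.asarray(X, dtype=float).copy()   # leave the caller's array untouched
    medians = np.nanmedian(X, axis=0)       # per-column median, ignoring NaNs
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X


# Toy rows (invented): [dH_mix (kJ/mol), dS_mix (J/(mol K)), VEC], two missing cells.
X = np.array([[-5.0, 13.4, 7.8],
              [np.nan, 11.5, 8.5],
              [-12.0, np.nan, 6.2],
              [2.0, 14.9, 8.9]])
X_filled = impute_median(X)
print(X_filled)
```

In a training pipeline, the medians would normally be computed on the training split only and then reused for the validation and test data, so that no information leaks from the held-out alloys.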

Fig. 2

The IM, IM + SS, and SS or AM phases are represented in (a) as pair plots of four features: \(\Delta H_{mix}\), number of components, \(\Delta \chi\) and \(\delta\). The PCA-based data exploration technique shown in (b) reduces the dimensionality of the dataset with 40 input features by representing them with two orthogonal principal components (principal component 1 and principal component 2).

Figure 2a visualizes the data with the aim of using the descriptors to outline the regions in which only intermetallic compounds (IMCs) form in MPEAs. The red spots, representing MPEAs containing only IMCs, are predominantly concentrated at the center of the graphs when the mixing enthalpy is plotted against either the electronegativity difference or the atomic-size difference. This corresponds to \(-40.0\ \mathrm{kJ/mol} \le \Delta H_{mix} \le -5.0\ \mathrm{kJ/mol}\), confirming that intermetallic compounds predominantly form when the mixing enthalpy is strongly negative. The green spots at positive values of mixing enthalpy are predominantly solid solution phases. The blue regions, characterized by negative \(\Delta H_{mix}\) values (a feature of the IM phase) but a lower electronegativity difference (a feature of the SS phase), correspond to MPEAs that are mixtures of IM and SS phases. In the plot of \(\delta\) against \(\Delta H_{mix}\), the sparsely scattered green points with large \(\delta\) values (\(\delta \ge 15\%\)) are not SS phases but amorphous phases (e.g., bulk metallic glasses). As shown in Fig. 2b, principal component analysis (PCA) is used to visualize the scatter of the three generic phases: SS (red), IM (green) and AM (blue). After PCA reduces the 40 features to only two principal components (principal component 1 and principal component 2) [31], the three phases are clearly distinguishable in the figure.
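The projection in Fig. 2b can be reproduced in spirit with a few lines of numpy. This is a minimal sketch of PCA through the singular value decomposition of the centred feature matrix, equivalent in result to scikit-learn's `PCA(n_components=2)`; the text does not say which implementation was used. The synthetic matrix of three shifted groups merely stands in for the scaled 40-feature dataset and is not real alloy data.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto the first two principal components via SVD."""
    Xc = X - X.mean(axis=0)                       # centre each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:2]                           # two orthonormal directions
    explained = S**2 / np.sum(S**2)               # variance ratio per component
    return Xc @ components.T, components, explained[:2]


rng = np.random.default_rng(0)
# Synthetic stand-in for the scaled 40-feature matrix: three shifted groups
# playing the role of SS, IM and AM alloys (illustration only, not real data).
centres = rng.normal(scale=3.0, size=(3, 40))
X = np.vstack([c + rng.normal(size=(50, 40)) for c in centres])
scores, components, ratio = pca_2d(X)
print(scores.shape, ratio)
```

The two columns of `scores` are the coordinates plotted as principal component 1 and principal component 2; the features should be scaled first, otherwise wide-range descriptors such as \(\Omega\) would dominate the components.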

Fig. 3

The neural network in a has 40 input features: 6 features corresponding to \(\delta\), \(\Delta H_{mix}\), \(\Delta S_{mix}\), \(\Delta \chi\), \(\Omega\) and VEC, and 34 features corresponding to the compositions of the 34 elements. The model in a is formulated as a multi-label classification problem that outputs the occurrence of a phase or phases. While the model uses generic labels for the IM and AM phases, it recognizes the SS phase at the output in two different ways. The phases identified as BCC and FCC in the training dataset are retained at the output with the same specific labels, whereas HCP and unspecified solid solution phases are assigned the general label of solid solution (SS). Panel b shows the model accuracy against epochs. The workflow of the pyMPEALab GUI is schematically illustrated in c.

Table 1 The originally prepared dataset of 1229 observations is characterized by non-uniform ranges of values of the numerical fingerprints used as input features

Much of the core dataset was prepared with reference to room temperature (\(\hbox {T}_{room}\)). However, owing to variations in the experimental techniques used to determine phases, the temperature (T) corresponding to the phase feature of an MPEA lies in the range \(T_{room} \le T \le 0.9T_{m}\). Thus, the upper limit of the temperature for the phase data is below the melting point (\(\hbox {T}_m\)) of the MPEA. Unlike temperature-dependent mechanical properties, the phase is taken to be unchanged over a range of temperature, and phase information reported at elevated temperatures was validated against results reported at lower temperatures.

3 Data-driven Model with GUI Interface for Phase Prediction

The seven numerical descriptors (atomic size difference, electronegativity difference, VEC, entropy of mixing, enthalpy of mixing, \(\Omega\) and composition) constitute the 40 input features, or variables, of the neural network model shown schematically in Fig. 3a. Each numerical fingerprint except composition is represented by a single feature, whereas composition is represented by 34 features corresponding to the total number of elements that can be considered in the design of the MPEA. The elements designated as features in the present study are Ag, Al, Au, B, Be, Ca, Ce, Co, Cr, Cu, Fe, Gd, Hf, La, Li, Mg, Mn, Mo, Nb, Nd, Ni, Pd, Sc, Si, Sn, Sr, Ta, Ti, V, W, Y, Yb, Zn and Zr. Because the numerical descriptors represent different physical quantities, the ranges of the feature values are highly non-uniform. As presented in Table 1, \(\Omega\) ranges between 0.32 (minimum) and 1423.28 (maximum) and has the largest spread between extreme values. The atomic-size difference (minimum = 0.0027; maximum = 0.21), on the other hand, has the smallest range. If the input features were supplied to the neural network without scaling, the \(\Omega\) parameter would be likely to dominate the result of the machine learning model, whereas changes in \(\delta\) would have little influence on the phase prediction. Similarly, \(\Delta H_{mix}\), \(\Delta S_{mix}\) and VEC would have a notable impact on the result, whereas \(\Delta \chi\) and the atomic fractions of the alloying elements would have the least influence. To make the machine learning model insensitive to the different ranges of the input features in the dataset [32], scaling and normalization are employed in this work.
For the 34 elemental compositions of the MPEAs, standard scaling was applied, setting each feature's mean to 0 and standard deviation to 1. For the 6 alloy properties, however, a few outliers in the dataset pulled the model toward extreme values during the computation of weights and biases, so robust scaling was implemented instead: the normalization uses the median and the quartile deviation of each feature, which reduces the effect of extreme values of the alloy properties. After this transformation, all features of the dataset lie on comparable numerical scales. The standardization and normalization procedures thus bring, for example, the atomic-size difference and \(\Omega\) to a comparable level of influence on the output of the neural network.
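The two scaling schemes can be sketched in numpy as follows. This is an illustration under stated assumptions: the column order (34 composition columns first, then the 6 alloy properties) is hypothetical, the data are synthetic, and the robust scaler divides by the interquartile range, as scikit-learn's `RobustScaler` does by default. Dividing by the quartile deviation (half the interquartile range) instead only changes the result by a constant factor of 2.

```python
import numpy as np


def fit_scalers(X_train, n_comp=34):
    """Fit z-score stats for composition columns and median/IQR stats for the rest."""
    comp, props = X_train[:, :n_comp], X_train[:, n_comp:]
    mu, sd = comp.mean(axis=0), comp.std(axis=0)
    sd[sd == 0] = 1.0                       # guard: element absent from every alloy
    med = np.median(props, axis=0)
    q75, q25 = np.percentile(props, [75, 25], axis=0)
    iqr = q75 - q25
    iqr[iqr == 0] = 1.0
    return mu, sd, med, iqr


def transform(X, stats, n_comp=34):
    mu, sd, med, iqr = stats
    comp = (X[:, :n_comp] - mu) / sd        # standard scaling: mean 0, std 1
    props = (X[:, n_comp:] - med) / iqr     # robust scaling: outliers barely shift med/IQR
    return np.hstack([comp, props])


rng = np.random.default_rng(1)
comp = rng.dirichlet(np.ones(34), size=200)  # synthetic atomic fractions (rows sum to 1)
props = rng.normal(size=(200, 6))
props[0, 4] = 1423.28                        # one extreme, Omega-like outlier
X = np.hstack([comp, props])
stats = fit_scalers(X)
Z = transform(X, stats)
print(Z.shape, Z[:, 34:].max())
```

Neither transform bounds the features to a fixed interval; the outlier remains large after robust scaling, but it no longer distorts the scaling of the other 199 rows. In practice the statistics would be fitted on the training split and reused unchanged for validation, test and GUI inputs.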

Fig. 4

Al-doped Cantor alloys exist as FCC, duplex FCC + BCC, or BCC phases depending upon the atomic ratio (x) of Al in \(\hbox {Al}_x\)CrCoFeMnNi alloys. For x in the range 0.1–0.3, pyMPEALab predicts the alloy to have an FCC structure, whereas for x \(\ge\) 0.74, a single BCC phase is predicted as the stable structure. The data-driven model predicts duplex FCC and BCC structures for x in the range 0.3–0.73. The single-phase boundaries obtained in this work are compared with two reference works (He et al., 2014 and Sun et al., 2017).

The neural network of Fig. 3a consists of an input layer, 3 hidden layers (n = 3) and an output layer. The input layer has 40 neurons (blue circles), one for each input feature. The three hidden layers (HLs) are identical in size (green circles), each having 10 neurons. The output layer has 7 neurons (yellow circles), and each output neuron corresponds to the numerical encoding of a phase name.
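The 40-10-10-10-7 architecture can be illustrated with a plain numpy forward pass. This sketch uses Leaky ReLU in the hidden layers and a sigmoid at the output, the activations reported for the selected model; in Keras the same stack would be built from `Dense` layers. The Leaky ReLU slope of 0.01 and the He-style random initialization are assumptions, since neither is stated in the text, and no training is shown.

```python
import numpy as np

LAYER_SIZES = [40, 10, 10, 10, 7]   # input, three hidden layers, output (Fig. 3a)


def leaky_relu(z, alpha=0.01):      # alpha is an assumed slope, not stated in the text
    return np.where(z > 0, z, alpha * z)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(sizes, seed=0):
    """Random weights for illustration only (He-style scaling); training is not shown."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(X, params):
    h = X
    for W, b in params[:-1]:
        h = leaky_relu(h @ W + b)   # hidden layers
    W, b = params[-1]
    return sigmoid(h @ W + b)       # one independent probability per phase label


params = init_params(LAYER_SIZES)
probs = forward(np.zeros((3, 40)) + 0.1, params)
n_weights = sum(W.size + b.size for W, b in params)
print(probs.shape, n_weights)
```

The network is small: it has 707 trainable parameters (410 + 110 + 110 + 77), which is modest relative to the 1229 alloys in the dataset.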

The train-validation-test proportion was maintained at 8:1:1: 80% of the dataset was allocated to training, and the remaining 20% was divided equally between the validation and test datasets. A small batch size of 5 was selected because increasing the batch size reduced the model accuracy, while reducing it further increased the computation time without any significant gain in accuracy.
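For the 1229 alloys, the 8:1:1 split and the batch size of 5 work out as in the sketch below. It assumes a simple random (unstratified) shuffle with an arbitrary seed; the text does not say whether the split was stratified by phase, and the function name is hypothetical.

```python
import numpy as np


def split_811(n_samples, seed=42):
    """Shuffle indices and split them 80/10/10 into train, validation and test."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(0.8 * n_samples))
    n_val = (n_samples - n_train) // 2
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train, val, test = split_811(1229)
batches_per_epoch = int(np.ceil(len(train) / 5))   # batch size 5
print(len(train), len(val), len(test), batches_per_epoch)
```

With a batch size of 5, each epoch performs about 197 weight updates, which helps explain why smaller batches increase computation time.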

Because two or more phases can coexist in a multi-principal element alloy (MPEA), phase prediction is unsuitable for formulation as a multi-class classification problem. The possibility of multiple coexisting phases requires the problem to be defined as a multi-label classification problem, in which an alloy can belong to more than one class or output feature. Outputs that are not mutually exclusive are best represented by using the sigmoid function as the activation function of the output layer. The sigmoid function squashes each pre-activation output into the range (0, 1) but treats the outputs independently, so that there can be high probabilities in all, some or none of the classes, and the probabilities of the output classes need not sum to 1. For a multi-label classification problem with sigmoid outputs, the binary cross entropy (BCE) function is selected as the cost function of the neural network. Forty different models, each with a different combination of hidden-layer activation function, learning rate and optimizer, were compiled and run using TensorFlow [33]. To avoid over-fitting during training, a built-in Keras callback [34] was used to stop training early if the validation loss started to increase, as checked at every epoch. Among these 40 models, the one using the stochastic gradient descent (SGD) optimizer with a learning rate of \(2.0 \times 10^{-3}\) and Leaky ReLU activation functions in the 3 hidden layers showed the best accuracy for both training and validation data; its accuracy curves are presented in Fig. 3b. At the end of 292 epochs, the model attains accuracies of 91.97% and 93.05% for the training and validation data, respectively.
This model is therefore selected for the prediction task.
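The multi-label setup can be made concrete with a small numpy example: phases are encoded as a multi-hot vector, independent sigmoids produce the probabilities, and BCE scores them. Only the five labels named in the text (BCC, FCC, SS, IM, AM) are used; the remaining two of the seven output neurons are not named here and are omitted. The pre-activation values and the 0.5 decision threshold are hypothetical choices for illustration.

```python
import numpy as np

# Five of the output labels named in the text; the other two output neurons are omitted.
LABELS = ["BCC", "FCC", "SS", "IM", "AM"]
ALIASES = {"HCP": "SS"}   # HCP (like unspecified solid solutions) maps to generic SS


def encode(phases):
    """Multi-hot vector: several phases of one alloy can be 1 at the same time."""
    y = np.zeros(len(LABELS))
    for p in phases:
        y[LABELS.index(ALIASES.get(p, p))] = 1.0
    return y


def binary_cross_entropy(y_true, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)    # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


y = encode(["FCC", "BCC"])                        # a duplex FCC + BCC alloy
logits = np.array([2.0, 1.5, -3.0, -2.5, -4.0])   # hypothetical pre-activations
p = 1.0 / (1.0 + np.exp(-logits))                 # independent sigmoids
predicted = [lab for lab, pi in zip(LABELS, p) if pi >= 0.5]   # assumed threshold
print(predicted, p.sum(), binary_cross_entropy(y, p))
```

Here the probabilities sum to about 1.84, which a softmax output could never produce; this is exactly why a sigmoid output layer suits alloys with coexisting phases.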

Most earlier works on machine learning based phase prediction of HEAs/MPEAs [19, 35, 36] do not include the composition vector of the alloy as a fingerprint in the model, nor do they associate the composition vector with the other fingerprints. In such models, the user has to know in advance the numerical values of all the fingerprints of an MPEA to be supplied as input, which limits the ease of use and flexibility of the prediction task. The prediction model of the current study is housed in a graphical user interface (GUI), the Python-based Multi-principal Element Alloys Laboratory (pyMPEALab) toolkit, which maps the composition vector to the remaining 6 numerical fingerprints. Thus, all of the input features of the neural network model in this work are computed automatically by the toolkit once the composition of the MPEA is entered. Databases of the 34 elements containing the atomic radius (\(\hbox {r}_i\)), electronegativity (\(\chi _i\)), \(\hbox {VEC}_i\) and melting point (\(\hbox {T}_{m,i}\)) of the individual elements, and the mixing enthalpy of binary alloys of constituent elements i and j (\(\Delta H_{ij}^{mix}\)), are constructed, and the GUI accesses these databases when computing the other input features from the composition vector. The working principle of the GUI toolkit is schematically illustrated in Fig. 3c. Because the pyMPEALab toolkit generates the complete set of input features of the prediction model from the atomic fractions of the alloy components alone, the results of this study can be disseminated to a broader community. Users with no prior information on the physical, chemical or thermodynamic properties of an MPEA can predict the phase of the alloy just by typing in the atomic fractions of its components.
It is even possible to enter the composition as relative atomic ratios, i.e., numbers proportional to the mole fractions of the components. As shown in the figure, the atomic fractions of Cu, Ni, Co, Cr, Al and Fe in the \(\hbox {Cu}_{0.5}\)NiCoCr\(\hbox {Al}_{0.5}\)\(\hbox {Fe}_{3.5}\) alloy are 0.067, 0.133, 0.133, 0.133, 0.067 and 0.467. However, the values 0.5, 1, 1, 1, 0.5 and 3.5 can be entered in the composition boxes for Cu, Ni, Co, Cr, Al and Fe, respectively, and the pyMPEALab toolkit automatically converts them to the corresponding atomic fractions. It then generates the values of the other input features and predicts the phase (FCC) of the alloy.
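The ratio-to-fraction conversion described above amounts to dividing each entry by the sum of all entries and placing the result in the 34-element composition vector. The sketch below reproduces the \(\hbox {Cu}_{0.5}\)NiCoCr\(\hbox {Al}_{0.5}\)\(\hbox {Fe}_{3.5}\) example; it is not the pyMPEALab source code, and the function name is hypothetical.

```python
import numpy as np

# The 34 elements used as composition features in this study (Sect. 3).
ELEMENTS = ["Ag", "Al", "Au", "B", "Be", "Ca", "Ce", "Co", "Cr", "Cu", "Fe", "Gd",
            "Hf", "La", "Li", "Mg", "Mn", "Mo", "Nb", "Nd", "Ni", "Pd", "Sc", "Si",
            "Sn", "Sr", "Ta", "Ti", "V", "W", "Y", "Yb", "Zn", "Zr"]


def composition_vector(ratios):
    """Turn relative atomic ratios into a 34-entry atomic-fraction vector."""
    unknown = set(ratios) - set(ELEMENTS)
    if unknown:
        raise ValueError(f"elements not in the 34-element list: {sorted(unknown)}")
    total = sum(ratios.values())
    x = np.zeros(len(ELEMENTS))
    for el, r in ratios.items():
        x[ELEMENTS.index(el)] = r / total   # normalize so the fractions sum to 1
    return x


# Cu0.5NiCoCrAl0.5Fe3.5 entered as ratios, as in the GUI example.
x = composition_vector({"Cu": 0.5, "Ni": 1, "Co": 1, "Cr": 1, "Al": 0.5, "Fe": 3.5})
for el in ["Cu", "Ni", "Co", "Cr", "Al", "Fe"]:
    print(el, round(x[ELEMENTS.index(el)], 3))
```

The ratios sum to 7.5, so Fe, for example, receives 3.5/7.5 ≈ 0.467. The resulting vector is also what the other descriptors would be computed from.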

Because pyMPEALab is built on top of TensorFlow and Keras and uses Python packages such as tkinter, pandas, numpy, scikit-learn and pickle, its code is readable and flexible enough to accommodate additional capabilities.

4 Application-oriented Materials Design Assisted by pyMPEALab Toolkit

In silico design of engineering materials using data-driven approaches is considered one of the major pillars for realizing the goals of Industry 4.0. The neural network model described in Sect. 3 is the backbone of the pyMPEALab toolkit. Because the GUI maps alloy compositions to alloy properties internally, phase prediction with the toolkit does not require the user to know the physicochemical properties of the alloys. This simplicity, achieved without compromising accuracy, is a major impetus for accelerating the design of high entropy alloys. Developing a robust phase diagram for a given MPEA system has always been a challenge for materials scientists and engineers, and further improvements and adaptations of the pyMPEALab software may help reduce some of these challenges. This topic is the scope of future study.

Fig. 5

Predicted phases for the \(\hbox {CoCrNiNb}_x\) alloy at (i) x = 0 and (ii) x = 0.5 are illustrated with a portion of the pyMPEALab GUI dashboard in a. The unit of length for the geometry of the tensile test specimen shown in b is m. The cross-section of the specimen is 0.004 m \(\times\) 0.002 m at mid-length, whereas it is 0.01 m \(\times\) 0.002 m at the ends. Panel c shows the finite element results for principal strain in (i) CoCrNi and (ii) \(\hbox {CoCrNiNb}_{0.5}\) alloys.

Correct prediction of the phase at different compositions of an MPEA is very useful for designing materials for engineering applications. In this section, we employ the pyMPEALab toolkit to predict phases at different compositions of two MPEAs, (I) Al-doped Cantor alloys and (II) Nb-doped CoCrNi alloys, and then use these predictions to assist and accelerate their design for engineering applications.

4.1 Application I: Facilitating the Phase Design of Al-doped Cantor Alloy

The single-phase FCC alloy CrCoFeMnNi (with the five elements in equiatomic proportion) is known as the Cantor alloy [7, 37]. Recently, there has been ongoing interest in the effects of Al doping on the mechanical properties of the Cantor alloy. Al atoms are about 12% larger than those of the Cantor alloy elements, and adding Al to the Cantor alloy in small proportions (atomic fraction in the range 0.0025–0.005) leads to notable lattice distortion in the resulting material [38]. The large distortion volume of Al atoms impedes dislocation motion and thus provides solid solution strengthening in the FCC structure.

However, when Al is doped at a larger atomic fraction, the crystal structure of the resulting alloy may change. The \(\hbox {Al}_x\)CrCoFeMnNi alloy may exist as a single FCC phase, a dual FCC + BCC phase, or a single BCC phase depending upon the atomic ratio (x). Since the mechanical behavior of the alloy correlates with its phase structure, it is very important to outline the phase boundaries of this alloy. The pyMPEALab GUI is used in this study to predict the phase structure of \(\hbox {Al}_x\)CrCoFeMnNi at different values of x (the relative atomic ratio of Al), and the results are presented in Fig. 4. The red square points are the predictions of the pyMPEALab software. The pure Cantor alloy, CrCoFeMnNi (x = 0), is predicted to be a single-phase FCC structure. For x in the range 0–0.3, the data-driven model predicts the \(\hbox {Al}_x\)CrCoFeMnNi alloy to have a single-phase FCC crystal structure. When x is varied within the range 0.31–0.73, the Al-doped Cantor alloy is predicted to exist as duplex FCC and BCC phase structures. When x is increased to 0.74 or beyond, the predicted crystal structure is a single BCC phase. The predicted phase boundary between the single FCC phase and the dual FCC + BCC phase in \(\hbox {Al}_x\)CrCoFeMnNi is therefore x = 0.3, and the boundary between the dual phase and the single BCC crystal structure is x = 0.74. The corresponding boundaries between the single FCC phase and the dual phase reported by He et al. [39] and Sun et al. [40] are x = 0.435 and x = 0.488, respectively. The reported phase boundaries between the single BCC phase and the dual FCC + BCC phase are x = 0.952 [39] and x = 1.658 [40]. The phase design criterion of pyMPEALab is stricter than those of the references [39, 40] for the boundary between the single FCC phase and the duplex FCC + BCC phase.
For the BCC/(FCC + BCC) phase boundary, however, Sun et al. [40] present the strictest criterion, reporting that a single BCC phase is possible only at x = 1.658 or larger. The predictions of the neural network model are also in reasonable agreement with regard to VEC (one of the input features). At a VEC equal to or less than 7.36, the pyMPEALab interface predicts a single BCC phase, whereas the \(\hbox {Al}_x\)CrCoFeMnNi alloy is predicted to be an FCC phase for an input VEC equal to or greater than 7.72. The reference VEC values for the FCC/duplex and BCC/duplex phase boundaries reported by He et al. [39] are 7.60 and 7.20, respectively. In Sun et al. [40], the BCC/duplex phase boundary occurs at a VEC of 6.75, whereas the FCC/duplex phase boundary occurs at 7.56. In general, the FCC crystal structure is associated with a larger VEC than the BCC structure, and the predicted trend agrees with this, supporting the thermodynamic consistency of our neural network model with respect to phase prediction.
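The VEC values quoted above follow directly from the x boundaries. Assuming the elemental VEC counts commonly used for these elements (Al 3, Cr 6, Co 9, Fe 8, Mn 7, Ni 10), the alloy has \(\mathrm{VEC}(x) = (3x + 40)/(x + 5)\), and the sketch below evaluates it at each reported boundary. The function names are hypothetical.

```python
# Valence electron concentrations commonly used for these elements.
VEC_EL = {"Al": 3, "Cr": 6, "Co": 9, "Fe": 8, "Mn": 7, "Ni": 10}
BASE = sum(v for el, v in VEC_EL.items() if el != "Al")   # CrCoFeMnNi: 40 electrons per 5 atoms


def vec_alx_cantor(x):
    """VEC of Al_x CrCoFeMnNi = (3x + 40) / (x + 5)."""
    return (VEC_EL["Al"] * x + BASE) / (x + 5)


def x_for_vec(v):
    """Inverse: the Al ratio x that gives a target VEC v (meaningful for 3 < v <= 8)."""
    return (BASE - 5 * v) / (v - VEC_EL["Al"])


boundaries = {"pyMPEALab": (0.30, 0.74), "He et al. [39]": (0.435, 0.952),
              "Sun et al. [40]": (0.488, 1.658)}
for source, (x_fcc, x_bcc) in boundaries.items():
    print(f"{source}: FCC/duplex VEC = {vec_alx_cantor(x_fcc):.2f}, "
          f"duplex/BCC VEC = {vec_alx_cantor(x_bcc):.2f}")
```

All six values (7.72, 7.36, 7.60, 7.20, 7.56 and 6.75) match those stated in the text. Because VEC decreases monotonically with x, a smaller x boundary always corresponds to a larger VEC threshold.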

In the single FCC solid solution region, the \(\hbox {Al}_x\)CrCoFeMnNi alloy has low strength and high ductility; it behaves like a composite (high strength, low ductility) in the two-phase FCC + BCC region, and it is extremely brittle in the single BCC region [39]. Because the mechanical behavior changes so drastically with the phase structure, and each structure is stable only over a given range of the atomic fraction of the Al dopant, it is essential to have a data-driven toolkit that can instantly predict the phase for any given value of x. The merit of using pyMPEALab for designing alloys such as Al-doped Cantor alloys is that it can express the crystal phase structure(s) of the alloy as a piecewise-constant function of the composition variable. Although the predicted data points are shown only for discrete values in Fig. 4, pyMPEALab can be used to construct a piecewise function y(x) through these points, such that y(x) represents a single FCC phase for 0 \(\le\) x \(\le\) 0.3, a duplex FCC + BCC phase for 0.3 < x < 0.74, and a single BCC phase for x \(\ge\) 0.74. With this capability, the phase of the \(\hbox {Al}_x\)CrCoFeMnNi alloy can be predicted for all practical values of x.
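The piecewise phase map y(x) can be written as a small lookup function. This sketch simply encodes the pyMPEALab boundaries reported above (x = 0.3 and x = 0.74) and does not query the neural network; the function name and the keyword defaults are hypothetical, and other boundaries (e.g., those of He et al. [39]) can be passed in for comparison.

```python
def predicted_phase(x, fcc_limit=0.30, bcc_limit=0.74):
    """Piecewise-constant phase map y(x) for Al_x CrCoFeMnNi."""
    if x < 0:
        raise ValueError("Al ratio x must be non-negative")
    if x <= fcc_limit:
        return "FCC"            # 0 <= x <= 0.3
    if x < bcc_limit:
        return "FCC + BCC"      # 0.3 < x < 0.74
    return "BCC"                # x >= 0.74


print([predicted_phase(x) for x in (0.0, 0.3, 0.5, 0.74, 1.0)])
# Same alloy with the He et al. [39] boundaries instead:
print(predicted_phase(0.4, fcc_limit=0.435, bcc_limit=0.952))
```

Swapping the limits makes the practical effect of the boundary disagreement explicit: at x = 0.4, pyMPEALab predicts a duplex structure, whereas the He et al. boundaries give a single FCC phase.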

Although the software shows the correct trend in phase prediction for Al-doped Cantor alloys, it needs improvement in the precise determination of phase boundaries: the phase boundaries predicted by the current model deviate from the experimental ones. Future work on the design of Al-doped Cantor alloys will focus on the effect of temperature on the shift of phase boundaries. The current pyMPEALab model is based on input features at a single temperature (room temperature) and ignores temperature effects.

4.2 Application II: On the Elastic Behavior of Nb-doped CoCrNi Alloy

The ternary equiatomic CoCrNi medium entropy alloy (MEA) is stronger and tougher than the quinary equiatomic CrCoFeMnNi HEA [41]. Ongoing studies aim to enhance the mechanical properties of this MEA by adding dopants that have a lattice, atomic-radius and elastic-modulus mismatch with Co, Cr and Ni [42,43,44,45]. Of particular interest is how the change in elastic behavior of D\(_x\)CoCrNi alloys, where D is the dopant element and x its molar ratio, is associated with the relevant crystal structure. We use the pyMPEALab software to predict the phase of Nb-doped CoCrNi (i.e. \(\hbox {CoCrNiNb}_x\)) and then compare the elastic response (strain) of this doped alloy with that of CoCrNi. The ternary equiatomic CoCrNi MPEA has a single FCC crystal structure [41], and our machine learning model predicts this correctly, as illustrated in Fig. 5(a). The same figure shows that when the atomic fraction of Nb in \(\hbox {CoCrNiNb}_x\) is about 0.143 (0.5/3.5, corresponding to x = 0.5), the software predicts an intermetallic (IM) phase. This agrees with the experimental finding of Lu et al. [46] that \(\hbox {CoCrNiNb}_{0.5}\) possesses a Laves phase structure. In the absence of phase diagrams that apply universally to MPEAs, such predictions from our model can serve as a reliable tool for the thermodynamic phase design of MEAs and HEAs.
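The atomic fraction entered for Nb follows from normalizing the molar ratios in the formula \(\hbox {CoCrNiNb}_x\) (Co : Cr : Ni : Nb = 1 : 1 : 1 : x), so that c_Nb = x/(3 + x). A minimal sketch of this conversion is shown below; the helper name is illustrative and not taken from pyMPEALab.

```python
def atomic_fractions(molar_ratios: dict) -> dict:
    """Convert molar ratios (e.g. CoCrNiNb_x -> Co:1, Cr:1, Ni:1, Nb:x) to atomic fractions."""
    total = sum(molar_ratios.values())
    return {element: n / total for element, n in molar_ratios.items()}


for x in (0.0, 0.5):
    c = atomic_fractions({"Co": 1.0, "Cr": 1.0, "Ni": 1.0, "Nb": x})
    print(f"x = {x}: c_Nb = {c['Nb']:.4f}")
```

The same normalization applies to any doped composition, which is why the molar ratio x and the atomic fraction of the dopant should not be confused when reading Fig. 5(a).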

To understand the elastic behavior of CoCrNi (FCC phase) and \(\hbox {CoCrNiNb}_{0.5}\) (IM phase) under a constant external load, finite element analysis is an appropriate method. The geometry and mesh of the tensile test specimen shown in Fig. 5(b) are identical for both materials (CoCrNi and \(\hbox {CoCrNiNb}_{0.5}\)), as are the loading and boundary conditions: a constant load of 200 N is applied at the bottom boundary of the model, and the top boundary is fixed (zero displacement). The partial differential equation describing the deformation of the specimen under linear elasticity is solved for the two materials using the finite element method [47]. Poisson's ratio is set to 0.3 for both materials. The Young's modulus (E) of the CoCrNi alloy is taken as 228.64 GPa [48]. For \(\hbox {CoCrNiNb}_{0.5}\), E is computed from the relationship between yield strength (\(\sigma _{Y}\)), hardness (HV) and E given in [49], using the experimental \(\sigma _{Y}\) and HV reported by Lu et al. [46]; the computed value is E = 627.43 GPa. The results of the numerical simulation are shown in Fig. 5c. The principal strains along the X direction (the axial length of the specimen) are much smaller for the \(\hbox {CoCrNiNb}_{0.5}\) material properties than for CoCrNi. Because the geometry, loading and Poisson's ratio are the same, the linear-elastic strain field scales as 1/E, so the strains of the Nb-doped alloy are smaller by a factor of about 2.7 (627.43/228.64). This higher stiffness is consistent with the intermetallic character of the Nb-doped alloy, which is brittle with very limited ductility; the phase prediction by pyMPEALab thus helps in understanding these phenomena.
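As a back-of-envelope check on the trend in Fig. 5c (not a substitute for the finite element model), the specimen can be idealized as a uniform bar under uniaxial tension, for which Hooke's law gives an axial strain F/(AE) and a lateral strain of -ν times that value. The cross-sectional area below is a hypothetical value because the specimen dimensions are not stated in the text; the strain ratio between the two alloys does not depend on it and equals the ratio of their moduli.

```python
# Load and elastic constants quoted in the text
F = 200.0  # applied axial load, N
E = {"CoCrNi": 228.64e9, "CoCrNiNb0.5": 627.43e9}  # Young's modulus, Pa
NU = 0.3  # Poisson's ratio used for both materials

# Hypothetical gauge cross-section: the specimen dimensions are not given in the text
AREA = 10e-6  # m^2 (10 mm^2)


def bar_strains(force: float, area: float, youngs: float, nu: float):
    """Uniaxial Hooke's law: axial strain = F / (A E); lateral strain = -nu * axial strain."""
    axial = force / (area * youngs)
    return axial, -nu * axial


strains = {name: bar_strains(F, AREA, e, NU) for name, e in E.items()}
for name, (axial, lateral) in strains.items():
    print(f"{name}: axial = {axial * 1e6:.1f} microstrain, lateral = {lateral * 1e6:.1f} microstrain")

# Independent of AREA and F: equals E_Nb-doped / E_CoCrNi
ratio = strains["CoCrNi"][0] / strains["CoCrNiNb0.5"][0]
print(f"strain ratio CoCrNi / CoCrNiNb0.5 = {ratio:.3f}")
```

A full finite element solution is still needed for the strain distribution near the grips and fillets, but this idealization shows why the contrast in Fig. 5c follows directly from the contrast in E.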

5 Conclusion

Data-driven approaches, as the fourth paradigm of materials design, can greatly facilitate the study of the multi-dimensional characteristics of compositionally complex alloys. The scope of the present study is to design a machine-learning-based toolkit for predicting the phases of MPEAs from the composition and the properties of the constituent elements. The following conclusions were derived from the present work:

  1. Multi-principal element alloys (MPEAs) occurring at and near the center of the multicomponent phase space are represented by numerical fingerprints, namely composition, atomic size difference (\(\delta\)), electronegativity difference (\(\Delta \chi\)), enthalpy of mixing (\(\Delta H_{mix}\)), entropy of mixing (\(\Delta S_{mix}\)), the dimensionless \(\Omega\) parameter and valence electron concentration (VEC), together with their phase types. A dataset of 1229 MPEA observations is prepared by featurizing these numerical descriptors.

  2. With composition (described by 34 features), \(\delta\), \(\Delta \chi\), \(\Delta H_{mix}\), \(\Delta S_{mix}\), \(\Omega\) and VEC as input variables and phase types (SS, IM or AM) as output labels, an artificial neural network is trained on the dataset. The problem is formulated as a multi-label classification problem, so the model can classify an alloy as a single phase or as a combination of more than one phase. Within the SS phase, the model can further distinguish BCC and FCC crystal structures from uncategorized SS.

  3. To accelerate phase design in MPEAs, a user-friendly GUI, pyMPEALab, is built on top of the neural network model. The GUI takes the composition of the alloy as input, automatically maps the composition features to the remaining descriptors, and uses all of these fingerprints to classify the crystal or phase structure of the MPEA. Because it requires only the atomic composition of an MPEA from the user, this software interface greatly eases the use of the machine learning model; a user with no prior knowledge of the thermodynamics of MPEAs can easily predict the phase structure with pyMPEALab.

  4. The machine learning model housed in the GUI tool is employed to predict the phase(s) of the Al-doped Cantor alloy and the Nb-doped CoCrNi alloy. These engineering applications illustrate how phase prediction can help in understanding the mechanical behavior of these MPEAs.
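The featurization and multi-label decoding summarized in items 1 to 3 can be illustrated with a minimal sketch, under stated assumptions; it is not the pyMPEALab implementation. Only \(\Delta S_{mix}\) (ideal configurational entropy), VEC and \(\delta\) are computed, because \(\Delta H_{mix}\), \(\Delta \chi\) and \(\Omega\) require tabulated pairwise enthalpies and electronegativities that are omitted here. The atomic radii must be supplied by the caller, and the label order `LABELS` and the 0.5 decision threshold are hypothetical choices.

```python
import math

R = 8.314  # gas constant, J/(mol K)

# Valence electron counts commonly used in VEC-based phase rules (assumed values)
VEC_ELEMENT = {"Al": 3, "Cr": 6, "Co": 9, "Fe": 8, "Mn": 7, "Ni": 10, "Nb": 5}

# Hypothetical output label order; the label set used by pyMPEALab may differ
LABELS = ("FCC", "BCC", "SS", "IM", "AM")


def fractions(molar_ratios):
    """Normalize molar ratios to atomic fractions c_i."""
    total = sum(molar_ratios.values())
    return {el: n / total for el, n in molar_ratios.items()}


def mixing_entropy(c):
    """Ideal configurational entropy -R * sum(c_i ln c_i), in J/(mol K)."""
    return -R * sum(ci * math.log(ci) for ci in c.values() if ci > 0)


def vec(c):
    """Valence electron concentration sum(c_i * VEC_i)."""
    return sum(ci * VEC_ELEMENT[el] for el, ci in c.items())


def size_mismatch(c, radii):
    """Atomic size difference delta = 100 * sqrt(sum c_i (1 - r_i / r_bar)^2), in percent.

    `radii` must be supplied by the caller; no radius table is assumed here.
    """
    r_bar = sum(ci * radii[el] for el, ci in c.items())
    return 100 * math.sqrt(sum(ci * (1 - radii[el] / r_bar) ** 2 for el, ci in c.items()))


def decode(probabilities, threshold=0.5):
    """Multi-label decoding: keep every label whose sigmoid output is at least `threshold`."""
    return [label for label, p in zip(LABELS, probabilities) if p >= threshold]


cantor = fractions({"Cr": 1, "Co": 1, "Fe": 1, "Mn": 1, "Ni": 1})
print(f"dS_mix = {mixing_entropy(cantor):.2f} J/(mol K), VEC = {vec(cantor):.2f}")
# Hypothetical network outputs, e.g. for a duplex alloy
print(decode([0.92, 0.81, 0.10, 0.04, 0.01]))
```

Independent per-label thresholds (rather than a single softmax choice) are what allow a multi-label model to report a duplex structure such as FCC + BCC.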