pyMPEALab Toolkit for Accelerating Phase Design in Multi-principal Element Alloys

Multi-principal element alloys (MPEAs) occur at or near the centre of the multicomponent phase space, and they have the unique potential to be tailored with a blend of several desirable properties for the development of the materials of the future. The lack of universal phase diagrams for MPEAs has been a major challenge in the accelerated design of products with these materials. This study aims to address this issue by employing data-driven approaches in phase prediction. An MPEA is first represented by numerical fingerprints (composition, atomic size difference, electronegativity, enthalpy of mixing, entropy of mixing, dimensionless $\Omega$ parameter, valence electron concentration and phase types), and an artificial neural network (ANN) is developed upon the datasets of these numerical descriptors. A pyMPEALab GUI is developed on top of this ANN model with the computational capability to associate the composition features with the remaining input features. With the GUI, a user can predict the phase(s) of an MPEA by entering solely the composition information.
It is further explored how the knowledge of phase prediction in composition-varied $\mathrm{Al}_x$CrCoFeMnNi and $\mathrm{CoCrNiNb}_x$ can help in understanding the mechanical behavior of these MPEAs.


Introduction
High entropy alloys (HEAs) are classically defined as the multi-principal element alloys (MPEAs) formed by mixing equal or near-equal amounts of five or more elements, and they have been ascribed four major characteristics, namely, the high-entropy effect, the sluggish diffusion effect, the severe lattice distortion effect and the cocktail effect [1][2][3][4]. On the other hand, alloys consisting of three or four principal (base) elements in equal or near-equal atomic compositions are defined as medium entropy alloys (MEAs) [5]. In order to incorporate a wide range of alloys with more than three base elements (each element existing in a significantly large proportion, but not necessarily in equiatomic composition), the term multi-principal element alloy (MPEA) will be utilized in this study [2,6]. HEAs or MPEAs characterized by an equiatomic composition of the constituent elements are located at the center of the multicomponent phase space [1,7,8], and this is illustrated in Fig. 1(a). MPEAs whose principal elements are not in equiatomic proportion will be offset from the geometrical centroid of the triangle in the figure. Thus, the term MPEA includes both HEAs and MEAs, with the base elements present in relatively large proportions. Recently, the broader term compositionally complex alloys (CCAs) has been coined to represent MPEAs [9].
The common motivation behind increasing the number of principal or base elements in MPEAs is to maximize the configurational entropy of the mixture, so that the intermetallic (IM) phase is suppressed and the disordered solid solution (SS) phase is stabilized [2,10]. Another purpose of utilizing HEAs is the development of hierarchical nanostructures that can demonstrate stability and compatibility while simultaneously possessing soft and hard phases, or in general two strongly opposed physico-chemical properties [11]. In other words, HEAs or MPEAs offer a wide range of properties, or combinations of physico-chemical properties, that are otherwise impossible to attain with alloys having a single major element [12]. This multitude of characteristics places MPEAs in the list of materials for the future.
The major challenge associated with the design and discovery of MPEA materials is the determination of phase stability in these alloys. The phase diagrams of MPEAs are not yet well defined, and their design is thus based on the use of empirical phase rules. The effect of additional elements on the structure of the primary element is quite different in MPEAs from that in conventional multicomponent alloys. As MPEAs have several principal elements, the traditional alloy-design methods and rules can no longer be applied for phase prediction [13]. A single empirical criterion is not sufficient for the generic design of MPEAs [14], and a combination of many such criteria in turn produces a multitude of variables for the phase prediction. Data-driven approaches are the modern approaches that can address situations surrounding compositionally complex alloys [15]. In this scenario, machine learning (ML) is a well-suited tool for phase stability prediction in a generic multi-principal element alloy, high entropy alloy or compositionally complex alloy. Besides this, there has been ongoing interest among researchers in applying MPEA materials in broad interdisciplinary areas. Chances are high that a person without sufficient knowledge of the thermodynamics and materials science of MPEAs will be working on applying these materials in engineering applications. It is therefore necessary to make available libraries or toolkits that can facilitate the phase design of MPEAs without requiring advanced scientific knowledge of this field. This study is aimed at accelerating the materials design of MPEAs by resolving the abovementioned issues.
An artificial neural network (ANN) is developed in this study to predict the phase(s) in a given MPEA. A Python-based Multi-principal Element Alloys Laboratory (pyMPEALab) software interface is then built on top of this ANN model, so that a user without prior knowledge of the detailed thermodynamic information of MPEAs can predict the phase by entering solely the composition of the alloy. Almost all of the numerical fingerprints of MPEAs considered at the input layer of the neural network model in this study are composition dependent, and this supports the rationale of creating a user interface that accepts only the composition information. Finally, it is demonstrated how such phase predictions can be utilized to understand the change in mechanical behavior of the MPEAs on varying the composition of dopant elements.
Fig. 1 A multi-principal element alloy (MPEA) is represented by the eight descriptors or fingerprints as elaborated in a. These fingerprints serve as the model features of the neural network for the prediction of phases. The composition descriptor (x) consists of the atomic fractions of the 34 elements considered in this study. For the CoCuFeNi MPEA, the atomic fraction of each of the elements Co, Cu, Fe and Ni is 0.25, whereas that of the remaining 30 elements is 0. In order to show the data scatter of specific solid solution phases (BCC, FCC and BCC + FCC), five descriptors are utilized in b to construct a pairplot

Numerical Representation of Multi-principal Element Alloys
In order to enable the machine learning of multi-principal element alloys (MPEAs), it is necessary to describe an MPEA numerically, and this numerical representation can be accomplished by employing the fingerprinting method [16]. Figure 1a illustrates how the CoCuFeNi MPEA can be reduced to eight numerical fingerprints or descriptors, namely, composition, atomic size difference ($\delta$), electronegativity difference ($\Delta\chi$), enthalpy of mixing ($\Delta H_{mix}$), entropy of mixing ($\Delta S_{mix}$), dimensionless $\Omega$ parameter, valence electron concentration (VEC) and phase types (solid solution, intermetallics, and amorphous). The mathematical and quantifying procedures associated with the numerical representation of this MPEA are elaborated in Appendix 1. As many of these parameters are frequently encountered when discussing the Hume-Rothery rules, the featurization of the MPEA is performed in such a way that the model would produce thermodynamically consistent outcomes. MPEAs can turn out to be exceptions to the Hume-Rothery rules, and in this context data-driven approaches tend to be the most appropriate way forward for studying the phases of MPEAs. As this study is concerned with the phase prediction of MPEAs, the phase of the alloy is considered as the output of the machine learning model. The parameter $\delta$ is related to the size effect of the components of the MPEA, whereas $\Delta\chi$ is associated with the chemical compatibility between the elements. The mixing enthalpy is also a measure of the chemical compatibility, or probability of combination, among the principal elements of the MPEA [17,18], and is an indispensable feature for describing the MPEA [19]. The mixing entropy ($\Delta S_{mix}$) of an N-element alloy is maximum when the alloy is in equiatomic ratio, which explains why HEAs (a subset of MPEAs) are in general characterized by a larger entropy of mixing than traditional alloys.
This distinction between traditional alloys and HEAs qualifies $\Delta S_{mix}$ as a numerical representation of MPEAs. It is difficult to compute the free energy of mixing ($\Delta G_{mix}$) for the different phases of a high entropy alloy or MPEA. The dimensionless $\Omega$ parameter is defined as $\Omega = T_m \Delta S_{mix} / |\Delta H_{mix}|$, and as it consists of the same quantities, $T_m \Delta S_{mix}$ and $\Delta H_{mix}$, it is appropriate to utilize this parameter as a numerical fingerprint alternative to $\Delta G_{mix}$ [20]. Although $\Omega$ is a descriptor derived from $\Delta H_{mix}$ and $\Delta S_{mix}$, it contains additional information about the melting point temperature ($T_m$) of the components of the alloy, which makes it eligible as a separate numerical fingerprint. While the numerical values of electronegativity and mixing enthalpy indicate the formation probability of a solid solution as opposed to compounds, it is also desirable to distinguish the particular solid solution phases (namely, FCC and BCC). The BCC phase is characterized by a smaller VEC, whereas the FCC phase exhibits a larger VEC [21]. With a rigorous search of the pre-existing literature for data related to MPEAs [6,13,14,20-30], a dataset containing the abovementioned eight numerical representations (fingerprints) of a total of 1229 multi-principal element alloys is prepared. During the dataset generation procedure, the cells with missing values were filled in using a data imputation technique. In order to gain insight into the dataset, it is necessary to visualize the data. Figure 1b shows the pairplot of the scatter data of four numerical descriptors ($\Delta H_{mix}$, $\Delta S_{mix}$, $\Omega$ and VEC), with the fifth descriptor (solid solution phase type) serving as the hue variable.
In the figure, it is possible to visually separate the BCC phase data (blue color) from the FCC phase data (red color) when plotting VEC against any of the three variables ($\Delta H_{mix}$, $\Delta S_{mix}$ and $\Omega$). However, when any two features among $\Delta H_{mix}$, $\Delta S_{mix}$ and $\Omega$ are plotted against each other, it is not so easy to separate the BCC phase from the FCC phase. This again supports the previous explanation of the importance of using VEC for identifying a specific phase (FCC, BCC or FCC+BCC) within the solid solution phase. When comparing a feature ($\Delta H_{mix}$, $\Omega$ or VEC) with $\Delta S_{mix}$, it can be observed that most of the data is concentrated at larger values of $\Delta S_{mix}$ (i.e. at 10 J/(mol K) or above). This implies that single solid solution phases in MPEAs, e.g. BCC (blue colored data points) or FCC (red colored data points), are more favorably formed at high configurational entropy values. However, a very small number of BCC or FCC data points are also observed in the range 2.5-7.0 J/(mol K). This observation suggests that a large $\Delta S_{mix}$ is not a prerequisite for the formation of a single-phase solid solution in MPEAs, and it is also known that a large $\Delta S_{mix}$ does not always guarantee one [27]. Owing to the fact that entropy may not always dominate the contribution to the system's Gibbs free energy, it can be argued that the observation of the outliers (single-phase solid solution at low values of $\Delta S_{mix}$) is also thermodynamically consistent. With the presence of such outliers, it is clear that $\Delta S_{mix}$ cannot be utilized as the sole variable to predict the stability of a single-phase solid solution. This provides the rationale for considering multiple numerical fingerprints for the description of MPEAs. Figure 2a is a visualization of the data that is aimed at utilizing the descriptors to outline the regions in which only intermetallic compounds (IMC) are formed in the MPEA.
The red colored spots, representing MPEAs that contain only IMC, are predominantly concentrated at the centre of the graphs when the mixing enthalpy is plotted against either the electronegativity difference or the atomic-size difference. This corresponds to $-40.0\ \mathrm{kJ/mol} \le \Delta H_{mix} \le -5.0\ \mathrm{kJ/mol}$, thus verifying that intermetallic compounds are predominantly formed when the mixing enthalpy has a large negative value. The green colored spots at positive values of the mixing enthalpy are predominantly solid solution phases. The blue colored regions, characterized by negative $\Delta H_{mix}$ values (a feature of the IM phase) but a low electronegativity difference (a feature of the SS phase), correspond to the MPEAs that are mixtures of both IM and SS phases. In the plot of $\delta$ against $\Delta H_{mix}$, the sparsely scattered green colored points with large $\delta$ values ($\ge 15\%$) are not SS phases but amorphous phases (e.g. bulk metallic glasses). As shown in Fig. 2b, principal component analysis (PCA) is utilized to visualize the scatter plot of the three generic phases: SS (red color), IM (green color) and AM (blue color). PCA achieves the dimensionality reduction [31] from many features to only two principal components (principal component 1 and principal component 2). Much of the core dataset has been prepared with reference to room temperature ($T_{room}$). However, owing to the variations in the experimental techniques utilized to determine phases, the temperature ($T$) corresponding to the phase feature of an MPEA lies in the range $T_{room} \le T \le 0.9\,T_m$. Thus the upper limit of the temperature for the phase data is lower than the melting point ($T_m$) of the MPEA. Unlike temperature-dependent mechanical properties, the phase remains fixed over a range of temperatures, and the phase information at elevated temperatures is validated against the results reported at lower temperatures.
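Two of the descriptors discussed in this section, $\Delta S_{mix}$ and $\Omega$, follow directly from their definitions once the composition is known. The following is a minimal sketch under stated assumptions, not the toolkit's own code: the function names `mixing_entropy` and `omega` are hypothetical, and the melting temperature and mixing enthalpy passed to `omega` are placeholder inputs rather than values from the dataset.

```python
import math

R = 8.314  # gas constant in J/(mol K)


def mixing_entropy(fractions):
    """Ideal configurational entropy, -R * sum(x_i * ln x_i), in J/(mol K)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("atomic fractions must sum to 1")
    # Elements with zero fraction contribute nothing to the sum.
    return -R * sum(x * math.log(x) for x in fractions if x > 0.0)


def omega(t_melt_k, ds_mix, dh_mix_kj):
    """Omega = T_m * dS_mix / |dH_mix|.

    ds_mix is in J/(mol K) and dh_mix_kj in kJ/mol, so the enthalpy is
    converted to J/mol to keep the ratio dimensionless.
    """
    if dh_mix_kj == 0:
        return math.inf
    return t_melt_k * ds_mix / (abs(dh_mix_kj) * 1000.0)


# Equiatomic four-element alloy such as CoCuFeNi: each fraction is 0.25.
ds = mixing_entropy([0.25] * 4)
print(round(ds, 3))  # 11.526, i.e. R * ln(4)

# Placeholder inputs (hypothetical T_m and dH_mix), only to show the call.
print(omega(1000.0, 10.0, -5.0))  # 2.0
```

The equiatomic case gives the maximum entropy for a given number of elements, which is consistent with the concentration of data at large $\Delta S_{mix}$ described above.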

Data-driven Model with GUI Interface for Phase Prediction
The seven numerical descriptors (atomic size difference, electronegativity, VEC, entropy of mixing, enthalpy of mixing, $\Omega$ and composition) constitute the 40 input features or variables of the neural network based machine learning model shown schematically in Fig. 3a. Each numerical fingerprint at the input, except composition, is represented by a single feature, whereas the composition descriptor is represented by 34 features corresponding to the total number of elements that can be considered in the design of the MPEA.
The elements that are designated as variables or features in the present study are Ag, Al, Au, B, Be, Ca, Ce, Co, Cr, Cu, Fe, Gd, Hf, La, Li, Mg, Mn, Mo, Nb, Nd, Ni, Pd, Sc, Si, Sn, Sr, Ta, Ti, V, W, Y, Yb, Zn and Zr. The numerical descriptors represent different physical quantities, and so the data is characterized by non-uniformity in the range of the feature values. As presented in Table 1, the feature $\Omega$ ranges between 0.32 (minimum value) and 1423.28 (maximum value), and has the largest difference between the extreme values. On the other hand, the atomic-size difference (minimum value = 0.0027; maximum value = 0.21) has the smallest range of values. If the input features were supplied to the neural network without scaling, the $\Omega$ parameter would be expected to dominate the result of the machine learning model, whereas changes in the value of $\delta$ would have almost no influence on the phase prediction.
In a similar manner, the parameters $\Delta H_{mix}$, $\Delta S_{mix}$ and VEC would have a notable impact on the result, whereas $\Delta\chi$ and the features for the atomic fractions of the alloying elements would have the least influence. In order to make the machine learning model insensitive to the relative ranges of the different input features in the dataset [32], scaling and normalization are employed in this work. For the 34 elemental compositions of the MPEAs, standard scaling was applied, with the mean set to 0 and the standard deviation to 1. For the 6 alloy properties, a few outliers in the dataset dragged the model to extremes during the calculation of the weights and biases; robust standardization was therefore implemented, in which the normalization is based on the median and quartile deviation of the dataset, reducing the effect of extreme numerical values of the alloy properties. With this mathematical transformation, all the features of the dataset are now numerically within the range 0-1. The standardization and normalization procedures thus bring the atomic-size difference and $\Omega$ to a comparable level in terms of their influence on the result of the neural network analysis. The neural network of Fig. 3a consists of an input layer, 3 hidden layers (n = 3) and an output layer. The input layer has 40 neurons (blue colored circles), one for each input feature. The three hidden layers are identical to each other in terms of their number of neurons (green colored points); each hidden layer (HL) in this study has 10 neurons. The output layer has 7 neurons (yellow colored circles), and each of these output neurons corresponds to the numerical encoding of a phase name.
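The difference between the two scaling schemes can be illustrated with a small standard-library sketch. It is an illustration only and not the preprocessing code of this study: the function names are hypothetical, the sample column is made up, and the robust variant divides by the interquartile range (the convention of scikit-learn's `RobustScaler`), which is an assumption, since the text above refers to the quartile deviation.

```python
import statistics


def standard_scale(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_scale(values):
    """Centre on the median and divide by the interquartile range (Q3 - Q1)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# Made-up, Omega-like column with one extreme value (illustrative only).
column = [0.5, 1.0, 1.5, 2.0, 2.5, 1400.0]

# With robust scaling the five ordinary values stay well spread out,
# because the median and quartiles barely react to the outlier.
print([round(v, 2) for v in robust_scale(column)[:5]])  # [-1.0, -0.6, -0.2, 0.2, 0.6]

# With standard scaling the outlier inflates the standard deviation and
# squeezes the ordinary values into a very narrow band.
print([round(v, 3) for v in standard_scale(column)[:5]])
```

Neither transformation bounds the values by itself; a further min-max normalization step would be needed to map the scaled features onto a fixed interval.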
The train-validation-test proportion was maintained as 8:1:1: 80% of the dataset was allocated as the training dataset, whereas the remaining 20% was divided equally between the validation and test datasets. A small batch size of 5 was selected for the model runs, as increasing the batch size resulted in reduced model accuracy. Reducing the batch size further increased the computation time with no significant increase in accuracy.
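As a minimal sketch of the 8:1:1 partition described above, the snippet below shuffles the sample indices and cuts them into three groups. The helper name `split_indices` and the fixed seed are assumptions made for reproducibility of the illustration; the actual splitting procedure of the study is not specified beyond the proportions.

```python
import random


def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them 80/10/10 (train/validation/test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(0.8 * n_samples)
    # The remaining 20% is divided equally; any odd sample goes to the test set.
    n_val = (n_samples - n_train) // 2
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(1229)
print(len(train), len(val), len(test))  # 983 123 123
```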
The fact that two or more phases can co-exist in a multi-principal element alloy (MPEA) makes the problem unsuitable for formulation as a multi-class classification problem. The possibility of the co-existence of multiple phases in an MPEA requires the problem to be defined as a multi-label classification problem. In the context of multi-label classification, an alloy can belong to more than one class or output feature. A mathematical condition in which the outputs are not mutually exclusive is best represented by assigning the sigmoid function as the activation function of the output layer. The sigmoid function squashes the pre-activation output values into the range (0,1), but it treats them independently, such that there can be high probabilities in all, some or none of the classes. The sum of the probabilities of the activated output classes does not necessarily have to be 1. For a multi-label classification problem using the sigmoid function at the output, the binary cross entropy (BCE) function is selected as the cost function of the neural network. A total of 40 different models, each with a different combination of HL activation function, learning rate and optimizer, were compiled and run using the TensorFlow software [33]. In order to avoid over-fitting of the model during training, a built-in Keras feature called Callbacks [34] was used to stop the training early if the validation loss starts to increase during model fitting, which was checked at every epoch. Among these 40 models, the model defined with the Stochastic Gradient Descent (SGD) optimizer, a learning rate of 2.0E-03 and Leaky ReLU activation functions at the 3 hidden layers showed the most preferable accuracy metric for both training and validation data, and the accuracy curves of this model are presented in Fig. 3b. At the end of 292 epochs, the model attains accuracies of 91.97% and 93.05% for the training and validation data, respectively.
This model is henceforth selected for performing the prediction task.
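To make the multi-label formulation concrete, the following pure-Python sketch runs a forward pass through a 40-10-10-10-7 network with Leaky ReLU hidden activations and a sigmoid output layer, and evaluates the binary cross entropy. It is not the trained TensorFlow/Keras model of this study: the weights are random and untrained, and the Leaky ReLU slope (0.01) and the 0.5 decision threshold are assumptions chosen for illustration.

```python
import math
import random


def leaky_relu(z, alpha=0.01):
    """Leaky ReLU; the slope alpha for negative inputs is an assumed value."""
    return z if z > 0 else alpha * z


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """Leaky ReLU on the hidden layers, independent sigmoids on the output."""
    hidden = features
    for weights, biases in layers[:-1]:
        hidden = dense(hidden, weights, biases, leaky_relu)
    weights, biases = layers[-1]
    return dense(hidden, weights, biases, sigmoid)


def binary_cross_entropy(targets, probs, eps=1e-12):
    """Mean binary cross entropy over the output labels."""
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for t, p in zip(targets, probs)
    ) / len(targets)


rng = random.Random(0)
sizes = [40, 10, 10, 10, 7]  # input, three hidden layers, output
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

probs = forward([rng.random() for _ in range(40)], layers)
# Each label is thresholded on its own, so zero, one or several phases
# can be switched on for the same alloy.
predicted_labels = [i for i, p in enumerate(probs) if p >= 0.5]
print(len(probs), predicted_labels)
```

Because each output passes through its own sigmoid, the seven probabilities need not sum to 1, which is the property that distinguishes this setup from a softmax-based multi-class classifier.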
Most of the earlier works on machine learning based phase prediction of HEAs/MPEAs [19,35,36] do not consider the composition vector of the alloy as a fingerprint in the model, nor do these models associate the composition vector with the other fingerprints. In such a scenario, the user has to know in advance the numerical values of all the fingerprints of an MPEA that are to be supplied as input. This limits the ease of use and flexibility of the prediction task. The prediction model of the current study is housed in a graphical user interface (GUI), named the Python-based Multi-principal Element Alloys Laboratory (pyMPEALab) toolkit, which is capable of mapping the composition vector to the remaining 6 numerical fingerprints. Thus all of the input features of the neural network model in this work are computed automatically by the toolkit. As pyMPEALab is built on top of TensorFlow and Keras, and uses Python packages such as tkinter, pandas, numpy, scikit-learn and pickle, the script of this software toolkit is easy to read and can be flexibly extended with additional capabilities.
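The idea of mapping a composition to a derived descriptor can be sketched as follows for one fingerprint, the VEC, computed as the composition-weighted average of the elemental values. This is a hypothetical illustration and not the pyMPEALab source: the helper names `parse_formula` and `weighted_vec` are invented here, and the small VEC table covers only the elements needed for the examples.

```python
import re

# Valence electron concentration of selected elements (illustrative subset).
VEC = {"Al": 3, "Co": 9, "Cr": 6, "Cu": 11, "Fe": 8, "Mn": 7, "Nb": 5, "Ni": 10}


def parse_formula(formula):
    """Turn e.g. 'CoCrNiNb0.5' into atomic fractions that sum to 1.

    An element written without a number is taken to have an amount of 1.
    """
    amounts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(amounts.values())
    return {symbol: value / total for symbol, value in amounts.items()}


def weighted_vec(fractions):
    """Composition-weighted average of the elemental VEC values."""
    return sum(x * VEC[symbol] for symbol, x in fractions.items())


print(parse_formula("CoCuFeNi"))                      # 0.25 for each element
print(weighted_vec(parse_formula("CoCuFeNi")))        # 9.5
print(round(parse_formula("CoCrNiNb0.5")["Nb"], 4))   # 0.1429
```

The same pattern, a lookup table of elemental properties combined with the parsed fractions, would extend to the other composition-dependent descriptors.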

Application-oriented Materials Design Assisted by pyMPEALab Toolkit
In silico design of engineering materials using data-driven approaches is considered one of the major pillars that can help in realizing the goals of Industry 4.0. The neural network model described in Sect. 3 is the backbone of the pyMPEALab toolkit. Owing to the built-in ability of the GUI to map the alloy composition to the properties of the alloy, phase prediction in the toolkit is possible without the user having to know the physico-chemical properties of the alloy. This simplicity, achieved without compromising accuracy, can be considered the major impetus for accelerating the design of high entropy alloys. Developing a robust phase diagram for a given MPEA system has always been a challenge for materials scientists and engineers. Further improvements and adaptations of the software features of pyMPEALab can help in reducing some of these challenges, and this topic is left for future study.
The correct prediction of the phase at different compositions of an MPEA is very useful in designing materials for engineering applications. In this section, we employ the pyMPEALab toolkit to predict phases at different compositions of two MPEAs, (I) Al-doped Cantor alloys and (II) Nb-doped CoCrNi alloys, and thereby show how the toolkit can assist and accelerate their design for engineering applications.

Application I: Facilitating the Phase Design of Al-doped Cantor Alloy
The FCC single phase alloy CrCoFeMnNi (having the five elements in equiatomic proportion) is known as the Cantor alloy [7,37]. Recently, there has been ongoing interest in studying the effect of Al doping on the mechanical properties of the Cantor alloy. It is well known that the atomic size of Al is 12% larger than that of the elements of the Cantor alloy, and the addition of Al to the Cantor alloy in a small proportion (atomic fraction in the range 0.0025-0.005) leads to notable lattice distortion in the resulting material [38]. The large distortion volume of the Al atoms subsequently hinders dislocation motion, and thus provides solid solution strengthening in the FCC structure. However, when Al is doped at a larger atomic fraction, the crystal structure of the resulting alloy may change. The $\mathrm{Al}_x$CrCoFeMnNi alloy may exist in a single FCC, a dual FCC + BCC or a single BCC state, depending upon the value of the atomic ratio (x). Since the mechanical behavior of the alloy can be correlated with the phase structure, it is very important to outline the phase boundaries of this alloy. The pyMPEALab GUI is utilized in this study for this purpose. In the single FCC solid solution region, the $\mathrm{Al}_x$CrCoFeMnNi alloy is characterized by low strength and high ductility; it behaves like a composite (high strength, low ductility) in the mixed two-phase FCC+BCC region, and the alloy is extremely brittle in the single BCC region [39]. Given these extremes in the mechanical behavior associated with the phase structures, and as each crystal structure is stable over a given range of atomic fraction of the dopant Al, it is essential to have a data-driven toolkit at hand that can instantaneously predict the phase for a given value of x. The merit of using pyMPEALab for designing alloys (including the Al-doped Cantor alloy) is that it can express the crystal phase structure(s) of the alloy as a piecewise continuous function of the composition variable.
Though the predicted data points are shown only for discrete values in Fig. 4, it is possible to use the pyMPEALab software to construct a piecewise continuous line y(x) connecting these points, such that y(x) represents the single FCC phase for $0 \le x \le 0.3$, the FCC + BCC duplex phase for $0.3 < x < 0.74$, and the single BCC phase for $x \ge 0.74$. With this capability, the phase prediction of the $\mathrm{Al}_x$CrCoFeMnNi alloy can be done for all practical values of x.
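The piecewise function y(x) quoted above can be written down directly. The sketch below simply encodes the three predicted intervals; the function name `predicted_phase` is hypothetical, and the boundaries are the model-predicted values stated in the text, not experimental phase boundaries.

```python
def predicted_phase(x):
    """Predicted phase of Al_x CrCoFeMnNi as a piecewise function of x."""
    if x < 0:
        raise ValueError("the atomic ratio x must be non-negative")
    if x <= 0.3:
        return "FCC"
    if x < 0.74:
        return "FCC+BCC"
    return "BCC"


for x in (0.0, 0.3, 0.5, 0.74, 1.0):
    print(x, predicted_phase(x))
```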
Though the software shows a correct trend in the phase prediction for the Al-doped Cantor alloy, the phase boundaries predicted by the current model deviate from the experimental phase boundaries, and their precise determination requires further improvement. Future work on the design of the Al-doped Cantor alloy will be centered on the effect of temperature on the shift of the phase boundaries. The current model of pyMPEALab is based upon a single temperature value (room temperature) for the input features, and ignores temperature effects.

Application II: On the Elastic Behavior of Nb-doped CoCrNi Alloy
The ternary equiatomic CoCrNi medium entropy alloy (MEA) is stronger and tougher than the quinary equiatomic CrCoFeMnNi HEA [41]. There have been ongoing studies on enhancing the mechanical properties of this MEA via the addition of a dopant that has a lattice, atomic radius and elastic modulus mismatch with Co, Cr and Ni [42][43][44][45]. Of particular interest is the association of the change in elastic behavior with the relevant crystal structure in the $\mathrm{D}_x$CoCrNi alloy, with D being the dopant element at an atomic ratio of x. We employ the pyMPEALab software to predict the phase in the Nb-doped CoCrNi alloy (i.e. $\mathrm{CoCrNiNb}_x$), and then study the elastic response (strain) of this doped alloy as compared to the CoCrNi alloy. The ternary equiatomic CoCrNi MPEA has a single FCC crystal structure [41], and our machine learning based model reproduces this correctly, as illustrated in Fig. 5(a). The same figure shows that when the atomic fraction of Nb in $\mathrm{CoCrNiNb}_x$ is 0.1429 (corresponding to x = 0.5), the software predicts the alloy to exist as an intermetallic (IM) phase. The experimental finding by Lu et al. [46] confirms that $\mathrm{CoCrNiNb}_{0.5}$ possesses a Laves phase structure. As there are no existing phase diagrams that apply universally to predict the phases of MPEAs, these predictions from our model can serve as a useful tool for the thermodynamic phase design of MEAs and HEAs. In order to understand the elastic behavior of CoCrNi (FCC phase) and $\mathrm{CoCrNiNb}_{0.5}$ (IM phase) under a constant external load, finite element analysis is considered an appropriate method. The geometry and mesh of the tensile test specimen shown in Fig. 5(b) are identical for both materials (CoCrNi and $\mathrm{CoCrNiNb}_{0.5}$). A constant load of 200 N is applied at the bottom boundary of the model, and the top boundary is designated as a fixed boundary (with zero displacement). The loading and boundary conditions are identical in both models.
The partial differential equation describing the deformation behavior of the tensile test specimen under linear elasticity is solved for the two materials using the finite element method [47]. The Poisson's ratio is set to 0.3 for both materials. The Young's modulus of elasticity (E) of the CoCrNi alloy is assigned a value of 228.64 GPa [48]. The relationship between yield strength ($\sigma_Y$), hardness (HV) and E provided in [49] is utilized to compute the value of E. For $\mathrm{CoCrNiNb}_{0.5}$, the experimental data of $\sigma_Y$ and HV are obtained from Lu et al. [46], and the computed value of E is 627.43 GPa. The results of the numerical simulation are provided in Fig. 5c. From the figure, the principal strain values along the X direction (axial length of the specimen) are negligible in the model with the material properties of the $\mathrm{CoCrNiNb}_{0.5}$ alloy, whereas these values are large for the CoCrNi alloy. Since the Nb-doped alloy is an intermetallic phase, it is brittle and is characterized by very low ductility; the phase prediction by the pyMPEALab software thus helps in understanding these phenomena.
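The trend seen in the simulation can be cross-checked with a one-dimensional Hooke's law estimate. This is a simplified sketch and not the finite element model of this study: it assumes uniform uniaxial stress, and the stress value is a hypothetical placeholder because the cross-sectional area of the specimen is not stated here. The ratio of the two strains does not depend on that placeholder.

```python
E_COCRNI_GPA = 228.64       # Young's modulus of CoCrNi quoted in the text
E_COCRNINB05_GPA = 627.43   # computed modulus of CoCrNiNb0.5 quoted in the text


def axial_strain(stress_mpa, youngs_modulus_gpa):
    """Uniaxial Hooke's law: strain = stress / E (dimensionless)."""
    return stress_mpa / (youngs_modulus_gpa * 1000.0)


stress = 100.0  # MPa, hypothetical value used only to form the ratio

strain_fcc = axial_strain(stress, E_COCRNI_GPA)
strain_im = axial_strain(stress, E_COCRNINB05_GPA)

# Under the same stress the strain scales with 1/E, so the ratio of the
# strains equals the inverse ratio of the moduli.
print(round(strain_fcc / strain_im, 2))  # 2.74
```

Under these assumptions the CoCrNi specimen strains roughly 2.7 times more than the $\mathrm{CoCrNiNb}_{0.5}$ specimen at the same stress, in line with the qualitative contrast reported for Fig. 5c.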

Conclusion
Data-driven approaches, as the fourth paradigm of materials design, can largely enable the study of the multi-dimensional characteristics associated with compositionally complex alloys. The scope of the present study is to design a machine learning based toolkit for the prediction of phases in MPEAs based upon the composition and the properties of the constituent elements. The following conclusions were derived from the present work:
1. Multi-principal element alloys (MPEAs) occurring at and near the center of the multicomponent phase space are represented by numerical fingerprints, namely, composition, atomic size difference ($\delta$), electronegativity difference ($\Delta\chi$), enthalpy of mixing ($\Delta H_{mix}$), entropy of mixing ($\Delta S_{mix}$), dimensionless $\Omega$ parameter, valence electron concentration (VEC) and phase types. A dataset of 1229 MPEA observations is prepared by featurizing these numerical descriptors.
2. With composition (described by 34 features), $\delta$, $\Delta\chi$, $\Delta H_{mix}$, $\Delta S_{mix}$, $\Omega$ and VEC as input variables and phase types (SS, IM or AM) as the output feature, an artificial neural network is built upon the dataset. The problem is defined as a multi-label classification problem, and the model is able to classify the alloys as a single phase or as possible combinations of more than one phase. In the context of the SS phase, the model can further distinguish the BCC and FCC crystal structures from the uncategorized SS phase.
3. In order to accelerate phase design in MPEAs, a user-friendly pyMPEALab GUI is built on top of the neural network model. The GUI toolkit takes in the composition of the alloy, automatically maps the composition features to the remaining descriptors, and utilizes all of these fingerprints to classify the crystal structure or phase structure of the MPEA. Requiring only the atomic composition of an MPEA from the user for the phase prediction, this software interface largely eases the use of the machine learning model. A user with no prior knowledge of the thermodynamics of MPEAs can easily predict the phase structure by using the pyMPEALab GUI.
4. The machine learning model housed in the GUI tool is employed to predict the phase(s) in the Al-doped Cantor alloy and the Nb-doped CoCrNi alloy. Through these engineering applications, it is explained how the phase prediction task can help in understanding the mechanical behavior of these MPEAs.

Appendix 1: Mathematical Details for Reduction of CoCuFeNi MPEA into its Numerical Fingerprints
The CoCuFeNi multi-principal element alloy is constituted by Co, Cu, Fe and Ni in equimolar proportion, and the basic physical and chemical properties of the atoms of these individual elements are provided in Table 2. The molar fraction ($x_{m,i}$) of each element in the alloy is also provided in the same table.

(iii) Electronegativity Difference
With the $\chi_i$ values of the individual constituent elements and the composition already known, the corresponding numerical descriptor of the alloy, known as the electronegativity difference ($\Delta\chi$), can be expressed statistically as:

$$\Delta\chi = \sqrt{\sum_{i=1}^{4} x_{m,i} \left(\chi_i - \bar{\chi}\right)^2} \tag{2}$$

The weighted mean electronegativity $\bar{\chi}$ in Eq. 2 uses the composition as weight and is defined as $\bar{\chi} = \sum_{i=1}^{4} x_{m,i}\,\chi_i$. Substituting the $\chi_i$ values from Table 2 with $x_{m,i} = 0.25$ into this definition gives $\bar{\chi}$ for the alloy, and inserting the result into Eq. 2 gives the electronegativity difference of the CoCuFeNi alloy.

(iv) Enthalpy of Mixing
For an MPEA with 4 constituent elements, the enthalpy of mixing is given by the following formula:

$$\Delta H_{mix} = \sum_{i=1,\, i<j}^{4} w_{ij}\, x_{m,i}\, x_{m,j} \tag{3}$$

In addition to the molar composition, evaluating $\Delta H_{mix}$ requires the quantity $w_{ij}$. Mathematically, this quantity is four times the enthalpy of mixing of the binary liquid alloy constituted by elements $i$ and $j$, i.e. $w_{ij} = 4\Delta H^{mix}_{ij}$. The mixing enthalpies of the binary alloys formed by the constituent elements of CoCuFeNi are calculated with the theoretical guidance provided in [22][23][24][25], and the estimated values of $\Delta H^{mix}_{ij}$ are presented in Table 3. The numerical value of the mixing enthalpy of the MPEA then follows by substituting the Table 3 values and $x_{m,i} = 0.25$ into Eq. 3.

(v) Entropy of Mixing
The mathematical expression for $\Delta S_{mix}$ of the CoCuFeNi alloy is given below:

$$\Delta S_{mix} = -R \sum_{i=1}^{4} x_{m,i} \ln x_{m,i} \tag{4}$$

where $R = 8.314$ J/(mol K). The right-hand side of Eq. 4 shows that the composition of the alloy is sufficient to compute the mixing entropy. The CoCuFeNi MPEA has 4 elements in equiatomic proportion ($x_{m,i} = 0.25$), so Eq. 4 simplifies to:

$$\Delta S_{mix} = -R \sum_{i=1}^{4} 0.25 \ln(0.25) = R \ln(4)$$

Thus, the numerical value of the mixing entropy for this alloy is $\Delta S_{mix} = 8.314 \times \ln(4) = 11.526$ J/(mol K).
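The composition-based descriptors of parts (iii) to (v) can be sketched in a few lines of Python. This is only an illustrative sketch, not the pyMPEALab implementation: the function names and the `binary_enthalpy` dictionary layout are assumptions made here, and the only alloy-specific input it uses is the equimolar composition, since the values from Tables 2 and 3 are not reproduced in this appendix.

```python
import math
from itertools import combinations

R = 8.314  # gas constant in J/(mol K), as used in the appendix


def electronegativity_difference(x, chi):
    """Eq. 2: sqrt(sum_i x_i * (chi_i - chi_bar)^2), chi_bar = sum_i x_i * chi_i."""
    chi_bar = sum(xi * ci for xi, ci in zip(x, chi))
    return math.sqrt(sum(xi * (ci - chi_bar) ** 2 for xi, ci in zip(x, chi)))


def mixing_enthalpy(x, binary_enthalpy):
    """Eq. 3: sum over pairs i < j of w_ij * x_i * x_j, with w_ij = 4 * H_ij.

    binary_enthalpy (hypothetical layout) maps an index pair (i, j), i < j,
    to the binary mixing enthalpy H_ij of elements i and j.
    """
    return sum(
        4.0 * binary_enthalpy[(i, j)] * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )


def mixing_entropy(x):
    """Eq. 4: -R * sum_i x_i * ln(x_i); elements with zero fraction are skipped."""
    return -R * sum(xi * math.log(xi) for xi in x if xi > 0)


# Equimolar CoCuFeNi: every molar fraction is 0.25.
x_equimolar = [0.25, 0.25, 0.25, 0.25]
print(round(mixing_entropy(x_equimolar), 3))  # 11.526
```

The same functions accept any composition vector, so a non-equimolar alloy only requires a different `x` together with the matching element properties.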

(vi) Ω Parameter
This parameter not only compares the relative magnitudes of the mixing entropy and the mixing enthalpy but also incorporates the equivalent melting temperature ($T_m$) of the CoCuFeNi alloy. The dimensionless $\Omega$ parameter is defined by the following equation:

$$\Omega = \frac{T_m\, \Delta S_{mix}}{\left|\Delta H_{mix}\right|}$$

With the melting temperatures of the individual elements, $(T_m)_i$, obtained from Table 2, the composition-weighted equivalent melting temperature of the MPEA becomes:

$$T_m = \sum_{i=1}^{4} x_{m,i}\, (T_m)_i$$

(vii) Valence Electron Concentration
The equivalent VEC of the quaternary MPEA is computed as the composition-weighted sum of the $\mathrm{VEC}_i$ of the individual elements:

$$\mathrm{VEC} = \sum_{i=1}^{4} x_{m,i}\, \mathrm{VEC}_i$$

For CoCuFeNi, with the elements taken in the order Co, Cu, Fe, Ni, the numerical value of VEC becomes:

$$\mathrm{VEC} = 0.25 \times 9 + 0.25 \times 11 + 0.25 \times 8 + 0.25 \times 10 = 9.5$$

(viii) Phase Information
The stable phase of the CoCuFeNi MPEA has been established as FCC [30]. The phase name FCC must be encoded into a numerical value that the machine learning model can operate on. To account for the multi-label classification problem, a phase or a mixture of phases is defined mathematically as a vector with 7 elements. Each element of the vector is either 1 (presence of a particular phase) or 0 (absence of the phase). The first 3 elements of the phase vector correspond, in sequence, to the amorphous, intermetallic and (undefined) solid solution phases. The 4th and 6th elements are allocated to the BCC phase (BCC1 and BCC2, respectively). Finally, the 5th and 7th elements correspond to the FCC phase (FCC1 and FCC2, respectively). As the CoCuFeNi MPEA exists as an FCC phase (specifically FCC1), the numerical fingerprint describing the phase information of this alloy is (0, 0, 0, 0, 1, 0, 0).
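Parts (vi) to (viii) can likewise be sketched in Python. This is a minimal sketch under stated assumptions, not the code shipped with pyMPEALab: the helper names and the `PHASE_ORDER` tuple are introduced here for illustration, the $\Omega$ definition is the usual ratio $T_m \Delta S_{mix} / |\Delta H_{mix}|$, and the per-element VEC values (9, 11, 8, 10 for Co, Cu, Fe, Ni) are the standard ones, assumed to match Table 2.

```python
# Order of the 7-element phase vector as described in part (viii).
PHASE_ORDER = ("AM", "IM", "SS", "BCC1", "FCC1", "BCC2", "FCC2")


def weighted_mean(x, values):
    """Composition-weighted sum, used for both T_m and VEC."""
    return sum(xi * vi for xi, vi in zip(x, values))


def omega_parameter(t_m, delta_s_mix, delta_h_mix):
    """Omega = T_m * dS_mix / |dH_mix|.

    Assumes consistent units, e.g. dS_mix in J/(mol K) and dH_mix in J/mol.
    Undefined when dH_mix is exactly zero.
    """
    return t_m * delta_s_mix / abs(delta_h_mix)


def encode_phases(phases):
    """Multi-hot encoding of a set of phase labels into the 7-element vector."""
    unknown = set(phases) - set(PHASE_ORDER)
    if unknown:
        raise ValueError(f"unknown phase labels: {sorted(unknown)}")
    return [1 if name in phases else 0 for name in PHASE_ORDER]


x = [0.25, 0.25, 0.25, 0.25]   # equimolar Co, Cu, Fe, Ni
vec_i = [9, 11, 8, 10]         # assumed per-element VEC values (Co, Cu, Fe, Ni)

print(weighted_mean(x, vec_i))    # 9.5
print(encode_phases({"FCC1"}))    # [0, 0, 0, 0, 1, 0, 0]
```

A multi-hot vector rather than a single class index is what allows one alloy to carry several phase labels at once, for example `encode_phases({"BCC1", "FCC1"})` for a mixed BCC and FCC alloy.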

Data Availability Statement
The code required to build a minimal working example (MWE) of the pyMPEALab GUI toolkit is available at https://github.com/subediupadesh/pyMPEALab. The additional data used beyond this MWE to support the findings of this study are available from the corresponding author upon reasonable request.