1 Introduction

The steel industry is undergoing a profound transformation thanks to the progress of digitalization [1], which is becoming pervasive at every stage of the production chain and is now fully recognized as a major enabler for preserving the competitiveness of the sector and improving its socio-economic and environmental sustainability. Artificial intelligence (AI) and machine learning (ML) are at the core of the ongoing and foreseen advances in process automation, a natural field of development led by the major equipment providers, which are usually keen to introduce innovation in order to improve the efficiency and reliability of their products.

On the other hand, one very important and quite recent consequence of digitalization is the increasing attention paid to so-called data-driven materials science for the design of new high added-value materials. Indeed, ML techniques, and in particular neural networks (NNs), are now increasingly applied in the design of novel steel qualities with target properties, as they appear capable of overcoming the inefficiency of traditional experimental science in capturing the complex combination of processing conditions and chemical compositions. At the end of the last century, Bhadeshia [2] highlighted that NNs could strongly support materials science, and in the first decade of this century Sha and Edwards [3] further reinforced this thesis with their thorough analysis. However, it is only in the last decade that the evolution of measuring and analytical systems, on the one hand, and the availability of large and low-cost computational resources, on the other hand, have allowed materials scientists to take full advantage of ML tools and techniques, as highlighted in the analysis recently conducted by Smith for NNs [4]. Nowadays, numerous applications of NNs can be found for forecasting the mechanical properties of steel products based on chemical composition and manufacturing process parameters, as described, for instance, in the exemplary works of Khalaj et al. [5], Faizabadi et al. [6] and Liu et al. [7].

However, most current applications of ML and NNs to steel design are limited to the direct analysis, which starts from a given chemistry and/or microstructure and process conditions in order to forecast material properties and check whether they comply with the customer's (i.e. the steel user's) requirements for the considered application. On the other hand, an inverse approach, which starts from the desired property and finds the steel chemical composition(s) and/or the process conditions that allow the target to be met, is far more relevant from the industrial point of view, as it provides direct indications on how to realize the demanded product. An example of such an approach can be found in the recent study of Wang et al. [8], which focuses on the design of a suitable microstructure that provides a balance between tensile strength and total elongation. Another example is provided in the work of Shen et al. [9], who exploit an extreme learning machine to optimize the composition of a particular steel grade suitable for the construction of experimental nuclear reactors. A further relevant application has been recently proposed by Zhu et al. [10], who combine deep NN-based prediction of tensile properties and genetic algorithm (GA)-based optimization in order to design the chemical composition of low-alloy steels.

This approach can also be extended in order to provide guidelines to the technical personnel of the steelworks on how to manage the different stages of the production process in order to obtain a material that fully complies with the specifications. An example in this sense is provided by the work of Colla et al. [11], who applied an ensemble of NNs and an ad hoc iterative procedure to identify the variability ranges of the most relevant process variables of the hot-dip galvanizing process that ensure uniformity of the tensile properties along steel strips for automotive applications.

The present work is in line with the above-mentioned approach: the considered property is hardenability, one of the most important mechanical features, which is extremely relevant for a wide range of steel applications and refers to the capability of the steel to improve its hardness following a heat treatment. The purpose is to find the most suitable steel chemistry that achieves a target hardenability while complying with a series of constraints. This issue is addressed as an optimization problem, and the definition of the “suitability” of the steel chemistry enters the optimization through the definition of its objective function. The proposed system, called JoMiner, exploits an NN-based predictor of hardenability at the core of the optimization as a means to associate a particular steel chemistry with the hardenability of the microalloyed steel. The main elements of novelty of this system lie in its flexibility, its relative simplicity of use and maintenance and, above all, in the fact that, unlike previously developed work, it allows taking into account the different strategies that the technical personnel of the steelworks usually follow in the steelmaking phase, in accordance with customers’ specifications and overall production targets. A trade-off is always sought between achieving a target accuracy for specific hardenability values and limiting production costs, e.g. by reducing the use of those microalloying elements that are more expensive or that can affect the reuse and recycling of the slag, a by-product of the steelmaking process, thereby reducing the profit margins. JoMiner allows this philosophy to be implemented through a suitable design of the objective function to be minimized, and this feature eases acceptance by technicians, who might not be familiar with AI and ML-based tools and techniques.

The paper is organized as follows: Sect. 2 provides some background on the considered industrial application and reviews the state of the art on the models forecasting the steel hardenability curve and on the problem of optimizing the steel chemistry in order to obtain a desired hardenability. Section 3 presents the details of the JoMiner approach. Section 4 describes and discusses the numerical results of the tests, while Sect. 5 offers some concluding remarks and hints for future work.

2 Background on the faced industrial application

2.1 Hardenability and Jominy end-quench test

Hardness is a very important property of structural metallic materials and alloys, such as steel, as it affects most of their applications. Hardness is defined as the resistance that the surface of a material offers to localized plastic deformation (e.g., a small dent or a scratch) [12] and depends on the material microstructure. The microstructure can be modified by altering the chemical composition of the material (e.g. in steelmaking by adding micro-alloying elements to the steel chemistry) and/or by means of a particular heat treatment named quenching, which basically consists of heating to a high temperature followed by abrupt cooling.

Hardenability is defined as the ability of a material to change its hardness as a result of a given heat treatment. In the steelmaking field, hardenability depends on the capability of the steel to form a particular microstructure, named martensite, during the heat treatment. In this treatment, often named quenching, the steel is heated up to a specific temperature, named the austenitizing temperature (A3, lying in the range 800–925 °C), at which the structure becomes austenite. Afterwards, it is cooled down at a cooling rate that prevents high-temperature transformations and favors the formation of martensite, by quenching in a cooling agent, e.g. (in order of decreasing efficiency) air, oil, water, brine or molten salts.

One standard procedure that is widely applied to determine hardenability is the Jominy end-quench test [13]. In this procedure, all factors affecting the depth to which a material hardens (i.e., specimen size and shape, quenching treatment), apart from the chemical composition, are kept constant. The specimen has a cylindrical shape, with fixed diameter D (usually D = 25.4 mm = 1.0 in.) and length L = 100 mm = 4 in. The specimen is austenitized at a prescribed temperature for a fixed time; it is then removed from the furnace and quickly mounted on a fixture, and its lower end is quenched by a jet of water of specified flow rate and temperature (see Fig. 1). Therefore, the maximum cooling rate is observed in the area closest to the quenched end, and the cooling rate decreases with the distance from the quenched end along the specimen length. After the specimen has cooled to room temperature, shallow flats are ground along the specimen length and N measurements (usually N = 15) of Rockwell or Vickers hardness [14] are taken at predefined positions within the first 50 mm along each flat (see Fig. 1).

Fig. 1 Schematic description of the Jominy end-quench test

The so-called Jominy profile is the diagram obtained by plotting the measured hardness values as a function of the distance from the quenched end: it is a monotonically decreasing curve, as the quenched end is cooled most rapidly and exhibits the maximum hardness (for most steels 100% martensite is observed at the quenched end). As the cooling rate decreases with the distance from the quenched end, the hardness also decreases, since lower cooling rates allow more time for carbon diffusion and the formation of a greater proportion of softer microstructures (i.e. pearlite possibly mixed with martensite and bainite). A highly hardenable steel shows large hardness values even at relatively large distances from the quenched end.

Each steel alloy is characterized by its own unique hardenability curve. The main alloying elements that affect hardenability include carbon (C), chromium (Cr), manganese (Mn), molybdenum (Mo), silicon (Si), nickel (Ni) and boron (B). Carbon strongly affects the hardness of the martensite, and steel hardenability generally increases with the C content, as C delays the formation of pearlite and ferrite and such delay promotes the formation of martensite at slower cooling rates. However, this effect is not significant enough to be used to control hardenability, and other elements are commonly used for this purpose. Cr, Mo, Mn, Si, Ni and vanadium (V) (especially Cr, Mo and Mn) retard the transformation from austenite to ferrite and pearlite. The delay is due to the need to redistribute the alloying elements during the transformation from austenite to ferrite and pearlite. Moreover, the different elements show complex interactions with each other, which affect the temperature during the transformation phase. Finally, B is a very powerful alloying element and its effect increases at low C content; thus, it is commonly used for low-C steels, but it affects hardenability only if it is in solution.

Steels with high hardenability are used to create high-strength components and are usually more valuable than steels with low hardenability, which can be used for small components. The Jominy profile is used to characterize each steel and, depending on the intended application, specific requirements are imposed by the customers on the steel producers, in the form of upper and lower bounds for the hardness value corresponding to specific distance values. The most commonly found constraints refer to the first 2–3 points of the Jominy curve and to the position of the inflection point of the profile itself.

2.2 Data-driven approaches to forecast steel hardenability

As the Jominy end-quench test is very expensive and time consuming, considerable research effort has been and is still being spent on developing models that forecast steel hardenability from the chemical composition. The first attempts were based on traditional statistical techniques: for instance, in the 1940s Grossmann proposed an approach based on multiplying factors [15]; this approach was subsequently improved and generalized in the 1970s by Grange [16], Brown and James [17], Kunze and Russle [18] and Doane [19]. The latter proposed several empirical and statistical methods to predict hardenability and analyzed their advantages and limitations.

Regression analysis also provided good results for the prediction of hardenability [20, 21]. In particular, Komenda et al. [21] investigated the variables that most affect each hardness value of the Jominy profile on a limited range of boron steels, demonstrating that C, Si, Mn, P and Cr are important variables at each point of the profile, while Ni, Cd, B and N mainly affect the hardness of the sixth one. More recently, Gong et al. applied an approach based on nonlinear regression for predicting the Jominy profile of gear steels [22].

Other approaches have been proposed, which rely on numerical studies of the trend of the cooling curve and consider its thermodynamics [23, 24, 25].

In [26] a parametric mathematical model was proposed, where each parameter was linked to the steel chemical composition by non-linear equations. However, this method failed when treating multi-alloyed medium-C steels, as it neglects the interactions among the different alloying elements. The same approach was improved in [27], where the interaction of the alloying elements is considered through some empirically tuned interaction parameters, thus improving the forecasting performance on some Jominy profiles of multi-alloyed medium-C steels. However, the validity is proved on a limited range of steel grades and the empirical tuning procedure disregards any chemical or physical consideration.

A different approach by Yazdi et al. [28] involves quench factor analysis (QFA), a proven technique introduced in the early seventies and improved in 1993 by Rometsch et al. [29], which correlates the cooling curves with the metallurgical response. QFA is applied to estimate hardness from simulated cooling curves and provides a good correlation between predicted and measured hardness. However, the forecasting accuracy is acceptable only for high hardness values.

To sum up, most numerical models forecasting the Jominy profile of steels provide good results only on very limited ranges of steel grades, on which their internal parameters were tuned, and do not show good generalization properties when applied outside those ranges, as the relationships linking such parameters with the steel chemical composition are mostly empirical and difficult to extend. Moreover, the accuracy is often acceptable only at a few points of the Jominy curve. This is mainly due to the fact that the effect of each alloying element is analyzed individually, while interactions are neglected.

In order to overcome the above-mentioned drawbacks, since the nineties NNs have been explored as a tool to forecast the Jominy profile based on the steel chemical composition. In the seminal works of Chan et al. [30] and Vermeulen et al. [31], NNs of the multi-layer perceptron (MLP) type were applied for the pointwise estimation of the Jominy curve based on the chemical composition. Further similar works have been developed by Dobrzanski and Sitek on different constructional steels [32, 33] and, more recently, by Knap et al. on special microalloyed steel grades [34, 35]. On the other hand, Pouraliakbar et al. developed a study related to a particular class of pipeline steels by using as inputs of the NN not only some elements of the chemical composition but also other mechanical properties, i.e. yield strength, ultimate tensile strength and percent elongation, which are measured through other specific, although standard, tests [36].

However, all the approaches that forecast each hardness value of the Jominy profile independently neglect the correlations among values corresponding to neighboring distances. In order to overcome this issue, Colla et al. [37] proposed a parametric approach, where the Jominy profile is represented through a parametric mathematical function of the distance from the quenched end (e.g. a quasi-sigmoidal monotonically decreasing function) and wavelet NNs are applied to correlate the steel chemistry to the function parameters. Quite recently, fuzzy systems have also been applied for the determination of the Jominy hardenability curve based on the chemical composition, although the investigation was limited to structural steels for quenching and tempering [38].

2.3 The problem of optimizing the steel chemistry for obtaining the desired Jominy profile

The availability of a reliable system to forecast the Jominy profile of various categories of steel leads also to the possibility of facing a further issue, which is harder but far more relevant from the industrial point of view, namely the identification of a suitable steel chemistry, which allows obtaining a target Jominy profile.

A tool for the rapid determination of the steel chemical composition (mostly in terms of micro-alloying elements that can be added in the steelmaking stage) leading to the achievement of a target Jominy profile allows technical operators and managers to drastically reduce the time required for the steel grade design phase and for the actual tests on a number of tentative product specimens, while also lightening the laboratory workload. Moreover, relevant economic advantages can be achieved in terms of reduced testing costs and waste of material and energy, as a consequence of a better and faster matching of customers’ specifications.

Last but not least, from the metallurgical point of view the possibility arises that different steel chemistries lead to the same or very similar Jominy profiles. Therefore, the formulation of an optimization problem concerning the identification of the steel chemistry leading to a target Jominy profile does not necessarily refer only to the minimization of some error measure between the actual and the desired Jominy curve. It can also include the minimization of the content of the most expensive micro-alloying elements, while keeping the achieved Jominy profile in reasonable agreement with the target one, according to constraints that might be specific to a certain customer or application. If available, such a tool can lead to relevant savings in terms of production costs, due to a lower usage of costly micro-alloying elements, and even to an easier reuse of slags, a by-product of the steelmaking stage whose reuse and valorization is often hampered by the unavoidable presence of a portion of the added micro-alloying elements [39].

Several studies can be found that focus on the optimization of the chemical composition of steel grades for achieving target values of mechanical properties, including hardenability, such as, for instance, [40, 41]. However, most of such studies require complex, costly and cumbersome physical simulations, possibly based on finite element modelling, which cannot be included in a tool dedicated to the steelworks technical personnel.

Trzaska et al. first conceived the idea of using an NN-based Jominy profile predictor in order to optimize the steel chemistry so as to reach a desired hardenability [42]. In particular, they used a previously designed NN-based model to build a representative set of data and to develop a neural classifier for the selection of the steel grade with the required hardenability. However, no studies can be found so far that address the problem of achieving a target hardenability profile as a formalized optimization problem by exploiting a reliable and computationally sustainable data-driven Jominy profile predictor.

The present paper fills this gap by exploiting an NN-based Jominy profile predictor presented by the authors in [43] and recently used in [44]. This predictor proved to be very reliable, robust and fast and can be easily maintained by an end user, as it only needs to be trained with experimental Jominy profiles and the associated steel chemical compositions, which are available in any steelworks. The model exploits and merges the results of several previous studies, as it provides a pointwise prediction of the Jominy profile through a set of quite simple NNs, each one dedicated to a point of the Jominy curve and with its own specific set of inputs, thus considering the effect of the various micro-alloying elements on the different points of the profile. On the other hand, the model takes into account the correlation among the hardness values measured at different distances from the quenched end by means of a hierarchical structure, where some previously calculated hardness values are used to estimate other ones. Finally, the limited size of the networks makes the computational burden very limited in the training phase and negligible in the relaxation one, so that the time required to compute a whole curve is suitable for an optimization framework based on evolutionary computation. Further details are provided in Sect. 3.1.

3 The JoMiner approach

This work introduces a novel tool, called JoMiner, that performs the automatic design of steel grades in order to match a desired Jominy profile as closely as possible.

In line with industrial standards, JoMiner considers the user requirements and constraints in terms of steel chemical composition ranges, expressed as the minimum and maximum concentration of each chemical element, and a target Jominy profile expressed in its standard form, i.e. as N pairs of distance and hardness values (di, Ji). Subsequently, the optimization engine explores the search space looking for the most suitable chemical composition that fulfills the user's chemical constraints and corresponds to a predicted Jominy profile that is as close as possible to the target profile. The whole process exploits a hierarchical neural network (NN)-based Jominy profile predictor, which is used within an optimization framework that aims at the selection of a suitable chemical composition. In this context, each candidate chemical composition is fed to the predictor, which provides as output the corresponding estimated Jominy profile. This result, together with the associated chemical composition and the target profile, is then used within the optimization loop until an optimal solution in line with the constraints is found. The interaction among the components forming the JoMiner system is represented in Fig. 2.

Fig. 2 Flowchart depicting the JoMiner components interactions

To sum up, the JoMiner system receives as inputs the user’s demands, namely the target Jominy profile to achieve and the constraints on the steel chemical composition, and outputs the steel chemistry that achieves the best fit of the target Jominy profile while complying with the provided constraints.

In the following Sects. 3.1 and 3.2 the Jominy profile predictor and the chemical composition optimizer components, respectively, are described in deeper detail.

3.1 The NN-based Jominy profile predictor

The Jominy profile predictor component is the core of JoMiner, since its performance and robustness affect the usability and reliability of the entire tool. A Jominy profile can be represented as a vector \(\text{J}\in {\mathbb{R}}^{\text{N}}\) (here N = 15) and the predictor is formed by a set of N interacting NNs, each one devoted to the prediction of a point forming the Jominy profile measured at the standard distances.

In the light of these features, the Jominy profile predictor component sequentially estimates the N values of the Jominy profile by using as inputs of each NN part of the chemical composition variables and the hardenability values predicted for some of the previous points, as shown in Fig. 3.

Fig. 3 Functional scheme of the Jominy profile predictor for the prediction of an arbitrary point of the profile, where both chemical composition and previously predicted hardenability values are fed as inputs to the NN

The 15 NNs that form the predictor are fed with different input variables, namely the concentrations of chemical elements and the hardenability values corresponding to lower values of the distance from the quenched end of the specimen (i.e. values of the Jominy profile that are estimated by the previous stages of the system itself). The inputs of each network were selected by coupling the analysis of specific literature studies on the influence of chemical elements on the Jominy profile (see Sect. 2) with an ad hoc variable selection analysis aimed at highlighting the correlation between the potential input variables and the target hardenability [45]. Table 1 shows the list of the input variables used by each NN by reporting, for each point of the profile, the selected chemical elements and the predicted hardenability values \(\hat{\mathbf{J}} = \left[ \hat{J}_{1} \ldots \hat{J}_{N} \right]\) (where \(\hat{J}_{i}\) represents the hardness value estimated at the ith distance value di) that are fed to the network.

Table 1 Summary of the input variables used by each of the NNs forming the Jominy profile predictor [44]

As shown in Table 1, M = 15 chemical components are considered in this application (C, Mn, Si, P, S, Cu, Cr, Ni, Mo, B, Ti, Nb, Sn, Al, V), but the corresponding values are not simultaneously fed into each NN. This aspect is also common to other literature approaches to the estimation of the Jominy profile, including the ones that are not based on machine learning, as it is a consequence of the different effect of each microalloying element on the shape of the Jominy curve.
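The sequential structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `PointModel`, the function `predict_profile`, the linear stand-in used in place of each trained NN and the input selection (first five elements, up to two previous points) are all hypothetical and serve only to show how earlier estimates are fed forward to later points.

```python
import numpy as np

N_POINTS = 15    # points of the Jominy profile (N in the text)
N_ELEMENTS = 15  # chemical elements (M in the text)


class PointModel:
    """Stand-in for one small NN: a fixed linear map, purely illustrative."""

    def __init__(self, chem_idx, prev_idx, rng):
        # Indices of the chemical elements used by this point (hypothetical).
        self.chem_idx = np.array(list(chem_idx), dtype=int)
        # Indices of the earlier profile points used by this point.
        self.prev_idx = np.array(list(prev_idx), dtype=int)
        n_inputs = self.chem_idx.size + self.prev_idx.size
        self.w = rng.normal(size=n_inputs)
        self.b = 40.0

    def predict(self, x):
        return float(self.w @ x + self.b)


def predict_profile(models, chemistry):
    """Estimate the profile point by point, feeding earlier estimates forward."""
    profile = np.zeros(len(models))
    for i, model in enumerate(models):
        x = np.concatenate([chemistry[model.chem_idx], profile[model.prev_idx]])
        profile[i] = model.predict(x)
    return profile


rng = np.random.default_rng(0)
# Assumed input selection: first 5 elements and up to 2 previous points.
models = [
    PointModel(range(5), range(max(0, i - 2), i), rng) for i in range(N_POINTS)
]
chemistry = rng.uniform(0.0, 1.0, size=N_ELEMENTS)  # made-up composition
profile = predict_profile(models, chemistry)
```

In the actual predictor, the input selection of each network is the one reported in Table 1 and each stand-in would be replaced by a trained NN.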

The adopted NNs are two-layer feed-forward NNs of the perceptron type with a variable number of neurons in the hidden layer (this number is indicated with the symbol Nh in Table 1) having a sigmoidal activation function, which are trained by means of a variant of the back-propagation algorithm that employs Bayesian regularization in order to improve the network generalization capabilities and robustness [46]. The number of neurons in the hidden layer Nh of each network was experimentally determined by taking into account a well-known empirical rule, which sets the maximum total number of NN parameters Nw as a function of the number of available training samples Ns, namely Ns ≥ 5Nw. In the case of the simple feed-forward NNs with one output neuron and Ni input variables that are adopted here, Nw = (Ni + 2)Nh + 1, taking into account inter-layer weights and neuron biases. By imposing that Nw ≤ 0.2 Ns, an upper bound Nh_max is calculated for Nh. Then the most suitable Nh value was determined through an exhaustive search, i.e. by training different NNs with all the values of Nh in the range [1, Nh_max] and selecting the one that achieves the best predictive performance on the validation dataset.
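The bound on the hidden layer size follows directly from the two relations above: Nw = (Ni + 2)Nh + 1 ≤ Ns/5 gives Nh ≤ (Ns/5 − 1)/(Ni + 2). A short Python sketch of this arithmetic is given below; the function names and the sample figures (1000 training samples, 8 inputs) are illustrative assumptions, not values taken from the paper.

```python
def n_weights(n_inputs, n_hidden):
    """Parameters of a one-hidden-layer net with a single output neuron."""
    return (n_inputs + 2) * n_hidden + 1


def max_hidden_neurons(n_samples, n_inputs, ratio=5):
    """Largest Nh such that ratio * Nw <= Ns (integer arithmetic)."""
    budget = n_samples // ratio  # Nw is an integer, so Nw <= floor(Ns / ratio)
    return max(0, (budget - 1) // (n_inputs + 2))


# Hypothetical figures: 1000 training samples, 8 input variables.
nh_max = max_hidden_neurons(1000, 8)
candidates = list(range(1, nh_max + 1))  # values explored by the exhaustive search
```

With these assumed figures the parameter budget is 200, so Nh_max = 19 (191 parameters), while Nh = 20 would require 201 parameters and violate the rule.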

The above described profile predictor must be tuned by using data coming from real Jominy tests. Depending on the range of products that are produced by the company and on the number of the available experimental data, a general purpose predictor can be designed, by exploiting data related to different steel types, or a highly customized predictor can be generated targeting specific company needs and products. This is the case, for instance, of steels for automotive applications [44], or Boron steels, for which the parametric NN-based approach was attempted in the past [47].

3.2 The steel chemistry optimizer

The chemistry optimizer is the component of JoMiner appointed to find the chemical composition that leads to an estimated Jominy profile \(\hat{\mathbf{J}}\) as close as possible to an arbitrary target profile JT. This component exploits the Jominy profile predictor described in Sect. 3.1 and performs the search by minimizing an arbitrary distance function d: \({\mathbb{R}}^{\text{N}}\times {\mathbb{R}}^{\text{N}}\to \mathbb{R}\), which measures the dissimilarity (i.e. the higher d, the lower the similarity) of \(\hat{\mathbf{J}}\) with respect to the target profile JT, while fulfilling an arbitrary set of constraints on the variability ranges of the considered chemical elements or on their relative ratios. Clearly, the selection of the distance function affects the results. The most straightforward choice is the N-dimensional Euclidean distance between the two vectors \(\hat{\mathbf{J}}\) and JT. A further possibility is represented by the so-called Manhattan or city-block distance, which corresponds to the sum of the absolute differences between the corresponding entries of the two vectors. For a given pair of vectors \(\hat{\mathbf{J}}\) and JT, the Manhattan distance is greater than or equal to the Euclidean one and gives more importance to small differences in each single entry; therefore, it appears more suitable for this application. Moreover, not all the points of the profile might have the same importance, as will be discussed later on when presenting the objective function of the optimization problem. As a consequence, the need can arise to weight the differences between the values of some of the entries of \(\hat{\mathbf{J}}\) and JT, and the determination of the weights is simpler if the Manhattan distance is adopted.
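The relation between the two candidate distance functions can be checked on a small numerical example. The Python sketch below uses made-up hardness values on a shortened five-point profile; the function names are illustrative.

```python
import numpy as np


def euclidean(a, b):
    """N-dimensional Euclidean distance between two profiles."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


def manhattan(a, b):
    """Manhattan (city-block) distance: sum of absolute differences."""
    return float(np.sum(np.abs(a - b)))


# Hypothetical target and estimated hardness values (5 points for brevity).
target = np.array([55.0, 54.0, 52.0, 48.0, 42.0])
estimate = target + np.array([1.0, -1.0, 2.0, 0.0, -2.0])

d_euclid = euclidean(estimate, target)  # sqrt(1 + 1 + 4 + 0 + 4) = sqrt(10)
d_manh = manhattan(estimate, target)    # 1 + 1 + 2 + 0 + 2 = 6
```

Here the Manhattan distance (6) exceeds the Euclidean one (about 3.16), and each unit deviation contributes to it linearly rather than through its square.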

To sum up, the optimization problem can be formalized as finding the chemical composition, which is codified as a vector C* \(\in {\mathbb{R}}^{\text{M}}\) (each entry corresponds to the content of one chemical component in wt.%) that minimizes d(\(\hat{\mathbf{J}}\), JT) subject to a set of constraints.
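For the common case in which the constraints are the minimum and maximum concentrations of each element mentioned at the beginning of Sect. 3, the problem can be written compactly as follows. The bound symbols \(c_{m}^{\min}\) and \(c_{m}^{\max}\) are introduced here only for illustration, and constraints on the ratios between elements would add further inequalities that are not shown.

$$ \mathbf{C}^{*} = \arg \min_{\mathbf{C} \in {\mathbb{R}}^{M}} \; d\left( \hat{\mathbf{J}}\left( \mathbf{C} \right), \mathbf{J}^{T} \right) \quad \text{subject to} \quad c_{m}^{\min} \le C_{m} \le c_{m}^{\max}, \quad m = 1, \ldots, M $$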

The chemical composition optimizer employs genetic algorithms (GAs) to perform the main optimization task. The basic operating principles of GAs can be briefly summarized as follows:

  1. An initial population of candidate solutions P0 is created, which is composed of Q vectors Cq (1 ≤ q ≤ Q) containing sets of values of the relevant chemical elements that are compliant with the provided constraints. Such constraints are set a priori according to specific criteria, which depend on the considered steel grade and target product. Then an iterative loop is started.

  2. At the kth iteration, the goodness of each solution in the population Pk is evaluated through a so-called fitness function.

  3. According to the fitness value of each solution, the population Pk is evolved into Pk+1 by rewarding the fittest solutions:

     (a) through the selection of survivors from Pk;

     (b) through the selection of pairs of solutions from Pk, from which new offspring solutions are generated through the crossover operator;

     (c) through the mutation of the newly formed Pk+1.

     At sub-steps (b) and (c), a verification of the compliance of the considered chemistry with the constraints could be needed, depending on the complexity of such constraints. As an alternative, compliance can be embedded in the crossover and mutation operators, so that they always generate compliant offspring if the parents are compliant. In that case, compliance needs to be ensured only at step 1, namely when generating the initial population.

  4. An arbitrary terminal condition (i.e. achievement of a target performance or of a pre-determined maximum number of iterations) is checked. If this condition is met, the algorithm stops and returns the fittest individual found throughout the generations; otherwise it jumps back to step 2.
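The steps above can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the JoMiner engine: the function `run_ga`, its parameters, the arithmetic crossover, the Gaussian mutation applied only to the offspring (so that survivors are kept unchanged) and the use of simple per-element bounds as the only constraints are all choices made here for brevity. The only terminal condition is a maximum number of iterations.

```python
import numpy as np


def run_ga(fitness, population, lower, upper, n_iter=50, n_survivors=4,
           mutation_scale=0.05, seed=0):
    """Maximize `fitness` over box-constrained real vectors (illustrative GA)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    pop = np.clip(np.asarray(population, dtype=float), lower, upper)
    q = pop.shape[0]
    best, best_fit = None, -np.inf
    for _ in range(n_iter):
        # Step 2: evaluate every candidate of the current population.
        fit = np.array([fitness(c) for c in pop])
        order = np.argsort(fit)[::-1]
        if fit[order[0]] > best_fit:
            best, best_fit = pop[order[0]].copy(), float(fit[order[0]])
        # Step 3(a): the fittest candidates survive.
        survivors = pop[order[:n_survivors]]
        # Step 3(b): arithmetic crossover between randomly paired survivors;
        # a convex combination of two in-range parents stays in range.
        n_children = q - n_survivors
        pa = survivors[rng.integers(n_survivors, size=n_children)]
        pb = survivors[rng.integers(n_survivors, size=n_children)]
        alpha = rng.uniform(size=(n_children, 1))
        children = alpha * pa + (1.0 - alpha) * pb
        # Step 3(c): Gaussian mutation, clipped back into the allowed ranges.
        children += rng.normal(scale=mutation_scale * (upper - lower),
                               size=children.shape)
        children = np.clip(children, lower, upper)
        pop = np.vstack([survivors, children])
    # Step 4: return the fittest individual found throughout the generations.
    return best, best_fit


# Toy usage: recover a made-up 3-element "chemistry" inside the unit box.
lower, upper = np.zeros(3), np.ones(3)
target = np.array([0.2, 0.5, 0.8])
init = np.random.default_rng(1).uniform(lower, upper, size=(20, 3))
best, best_fit = run_ga(lambda c: -np.sum(np.abs(c - target)), init, lower, upper)
```

Clipping after mutation is one simple way of embedding compliance in the operators, as discussed in step 3; more complex constraints (e.g. on element ratios) would require an explicit verification step instead.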

GAs are used in order to efficiently explore the high-dimensional (here the dimension is M = 15) non-convex (due to the presence of arbitrary user-defined constraints) search domain of the problem in relation to the error surface determined by the fitness function f, on which no a priori knowledge is available. GAs perform very well on this kind of problem, especially in industrial settings, thanks to their ability to suitably combine the exploration of the search space (through the mutation operator) with the exploitation of the knowledge on the problem gained through the generations (through the selection of the fittest elements for survival and crossover and through the crossover operator itself). In particular, GAs proved to be capable of converging faster than other search methods and of avoiding local minima of the objective function.

The GA optimizer engine of JoMiner exploits a real-valued encoding for each solution Cq and a random initialization of the initial population within the search space allowed by the constraints. More in detail, for each candidate Cq in P0, the value of each of its entries, corresponding to a chemical element, is randomly drawn from a Gaussian distribution whose parameters (mean and standard deviation) are determined experimentally from the training data.
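A possible reading of this initialization is sketched below. It assumes that the Gaussian parameters are the per-element sample mean and standard deviation of the training chemistries and that the constraints are simple lower and upper bounds enforced by clipping; the function name `init_population` and both assumptions are illustrative and are not taken from the paper.

```python
import numpy as np

def init_population(train_chem, lower, upper, q=50, seed=0):
    """Draw q candidate chemistries from per-element Gaussians.

    train_chem: array of shape (n_samples, M) with training compositions.
    lower, upper: arrays of shape (M,) with the allowed content range.
    """
    rng = np.random.default_rng(seed)
    mean = train_chem.mean(axis=0)
    std = train_chem.std(axis=0, ddof=1)
    # One Gaussian per chemical element, broadcast over the q candidates.
    population = rng.normal(mean, std, size=(q, train_chem.shape[1]))
    # Assumption: box constraints are enforced by clipping.
    return np.clip(population, lower, upper)
```

More complex constraints would require a different repair strategy, for instance re-drawing non-compliant candidates.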

The simplest version of the adopted fitness function, which evaluates each solution, is proportional to the negative of the Manhattan distance between the target and the estimated Jominy profile, and is expressed as follows:

$$ f\left( \boldsymbol{C}_{q} \right) = - \frac{1}{N} \cdot d\left( \hat{\boldsymbol{J}}\left( \boldsymbol{C}_{q} \right), \boldsymbol{J}^{T} \right) = - \frac{1}{N} \cdot \sum\limits_{i = 1}^{N} \left| \hat{J}_{i}\left( \boldsymbol{C}_{q} \right) - J_{i}^{T} \right| $$
(1)

Notably, the fitness function is customizable: it is possible to introduce any sort of weighting that forces the optimizer to reward higher accuracy at some points of the profile, which might be a relevant option for particular products devoted to specific industrial applications. The simplest example is the introduction of weighting factors in the form of a vector \(\boldsymbol{W} \in \mathbb{R}^{N}\), with N entries 0 ≤ wi ≤ 1, and the adoption of the following weighted fitness function:

$$ h\left( \boldsymbol{C}_{q} \right) = - \frac{1}{N} \cdot \sum\limits_{i = 1}^{N} w_{i} \cdot \left| \hat{J}_{i}\left( \boldsymbol{C}_{q} \right) - J_{i}^{T} \right| $$
(2)
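Equations (1) and (2) can be illustrated with a short sketch. The function name `fitness` and the convention that the predicted profile is passed in directly (rather than computed from the chemistry by the NN-based predictor) are assumptions made to keep the example self-contained.

```python
import numpy as np

def fitness(j_pred, j_target, weights=None):
    """Negative mean absolute deviation between two Jominy profiles.

    Without weights this corresponds to Eq. (1); with a weight vector
    (one entry per profile point) it corresponds to Eq. (2).
    """
    abs_err = np.abs(np.asarray(j_pred, float) - np.asarray(j_target, float))
    if weights is not None:
        abs_err = np.asarray(weights, float) * abs_err
    return -abs_err.mean()
```

With all weights equal to 1, the weighted form reduces to the unweighted one, so a single routine can serve both cases.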

Furthermore, it is possible to drive the optimization process towards solutions that discourage the use of selected elements, taking into account, for instance, their economic cost or environmental impact. In that case, the fitness function is modified by adding a penalty term proportional to the content of the chemical elements whose usage shall be minimized:

$$ h\left( \boldsymbol{C}_{q} \right) = - \left( \frac{1}{N} \cdot \sum\limits_{i = 1}^{N} \left| \hat{J}_{i}\left( \boldsymbol{C}_{q} \right) - J_{i}^{T} \right| + \beta \sum\limits_{j = 1}^{M} \gamma_{j} \cdot C_{qj} \right) $$
(3)

where M is the number of chemical elements that constitute the candidate solution, γ is a vector whose entries represent the cost of the associated chemical elements, and β is an equalization factor that balances the order of magnitude of the penalty with respect to the Jominy profile discrepancy term of the equation.
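As a worked illustration of Eq. (3), the sketch below combines the profile discrepancy with the cost penalty. The function name `penalized_fitness` and the numerical values in the usage note are hypothetical.

```python
import numpy as np

def penalized_fitness(j_pred, j_target, chem, gamma, beta):
    """Fitness of Eq. (3): profile error plus a penalty on costly elements.

    chem:  candidate composition C_q, shape (M,).
    gamma: per-element cost vector, shape (M,).
    beta:  equalization factor balancing the two terms.
    """
    profile_err = np.mean(np.abs(np.asarray(j_pred, float)
                                 - np.asarray(j_target, float)))
    penalty = beta * np.dot(np.asarray(gamma, float), np.asarray(chem, float))
    return -(profile_err + penalty)
```

For example, with a profile error of 1.5 HRC, a composition `[0.2, 0.1, 0.05]`, costs `[0, 1, 1]` and β = 3, the penalty is 3 × 0.15 = 0.45 and the fitness is −1.95.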

The adopted crossover operator generates an offspring solution by randomly choosing with equal probability each entry from the corresponding ones belonging to the parents. This practice is commonly known as uniform crossover.
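A minimal sketch of uniform crossover on real-valued vectors is given below; the function name and the use of a NumPy random generator are assumptions of the example.

```python
import numpy as np

def uniform_crossover(parent_a, parent_b, rng):
    """Pick each entry from one of the two parents with probability 0.5."""
    parent_a = np.asarray(parent_a, float)
    parent_b = np.asarray(parent_b, float)
    take_from_a = rng.random(parent_a.shape) < 0.5
    return np.where(take_from_a, parent_a, parent_b)
```

If each entry is constrained independently by a range, two compliant parents always yield a compliant child under this operator, because every entry is copied unchanged from one of them.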

The adopted mutation operator randomly varies r < M entries of the candidate solution Cq within a specified range [− α%; + α%] with respect to the original value by taking into account the constraints.
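The mutation step can be sketched as follows. The uniform sampling of the relative variation and the clipping to lower and upper bounds are assumptions, since the text only states that r entries are varied within ±α% while taking the constraints into account.

```python
import numpy as np

def mutate(candidate, r, alpha, lower, upper, rng):
    """Perturb r randomly chosen entries by at most +/- alpha percent."""
    child = np.array(candidate, dtype=float)  # copy, the parent is untouched
    lower = np.asarray(lower, float)
    upper = np.asarray(upper, float)
    idx = rng.choice(child.size, size=r, replace=False)
    # Assumption: relative variation drawn uniformly in [-alpha%, +alpha%].
    factors = 1.0 + rng.uniform(-alpha, alpha, size=r) / 100.0
    # Assumption: box constraints enforced by clipping.
    child[idx] = np.clip(child[idx] * factors, lower[idx], upper[idx])
    return child
```

With r = 3 and α = 15, at most three of the M = 15 contents change, each by no more than 15% of its original value.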

The adopted termination condition consists in reaching a maximum number of generations NG of the GA, which is chosen large enough to allow a deep exploration of the search space and the achievement of convergence. Moreover, a fuzzy adaptive genetic algorithm (FAGA) is applied, which was introduced by the authors in the past as an advanced implementation of GA exploiting a fuzzy inference system to govern the GA recombination strategies [48]. FAGA was proven to grant higher speed and improved capability to avoid local minima with respect to standard GA [49].

4 Experimental results

JoMiner was trained and validated by using data gathered by an Italian steelmaking company. The provided dataset holds more than 800 samples, each including both the steel chemical composition and the result of the Jominy test on actual specimens. The samples span various steel grades and chemical compositions, as shown in Table 2, which reports the variability range of each chemical element within the dataset.

Table 2 Variability ranges of the chemical elements within the experimental dataset (expressed in wt.%)

The variability of the Jominy profiles in the dataset is reported in Table 3 and depicted in Fig. 4, where the maximum, minimum and average hardness values are shown for each value of the distance d from the quenched end of the specimen.

Table 3 Variability ranges of hardness value (in HRC) for the Jominy tests included in the experimental dataset
Fig. 4 Variability of the Jominy profiles in the experimental dataset as a function of the distance from the quenched end of the specimens

The dataset was divided into two parts. The first one, Dtr, includes 70% of the available experimental Jominy profiles and is used for the training of the Jominy profile predictor component. The remaining data represent the test set Dts, which is composed of Dimts = 243 Jominy profiles and is used for testing the optimizer, namely these profiles are used as target Jominy profiles. For each target, the suggested chemical composition is computed through the optimizer and fed as input to the Jominy profile predictor, and the similarity between the estimated profile and the target one is assessed as a measure of the goodness of the optimizer.

In this work the optimizer is firstly used for the base case of the minimization of the discrepancy between the actual and predicted profile (see Eq. 1). This task allows the determination of suitable values of the hyperparameters of the optimizer. Subsequently two additional tasks are pursued: one for optimization of the profile accuracy in arbitrary points of the profile (Eq. 2) and the other including the minimization of the usage of selected chemical elements (Eq. 3).

In all the developed tests, as far as the GA-based optimization engine is concerned, the size of the population is set to Q = 50 and for the termination criterion the value NG = 100 is adopted.

4.1 Base case optimization: minimization of the average error on predicted Jominy profiles

In the base case task, the fitness function for the GA-based optimization is the one reported in Eq. (1). Moreover, 20 variants of the mutation function have been tested, corresponding to all the combinations of the values of the parameters r and α, namely r = 1, 2, 3, 4 and α = 5, 10, 15, 25, 50.
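The 20 variants are the Cartesian product of the four r values and the five α values. A sketch of how such a grid might be enumerated and the best pair selected is shown below; `evaluate` is a hypothetical callable that would run the optimizer on the test set and return the average error ε for a given pair (r, α).

```python
from itertools import product

R_VALUES = (1, 2, 3, 4)
ALPHA_VALUES = (5, 10, 15, 25, 50)

def grid_search(evaluate, r_values=R_VALUES, alpha_values=ALPHA_VALUES):
    """Evaluate every (r, alpha) pair and return the one with lowest error."""
    results = {(r, a): evaluate(r, a) for r, a in product(r_values, alpha_values)}
    best = min(results, key=results.get)
    return best, results
```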

The results obtained by JoMiner for the 20 above-mentioned combinations of the parameters of the mutation operator are reported in Table 4 in terms of mean value εi, maximum value Ei and standard deviation Si, computed over Dts, of the absolute difference \(\left|{\widehat{J}}_{i}-{J}_{i}^{T}\right|\) between the computed and target hardness values at distance di (with 1 ≤ i ≤ N), according to the following classical definitions:

$$ \varepsilon _{i} = \frac{1}{{{\text{Dim}}_{{ts}} }} \cdot \mathop \sum \limits_{{j = 1}}^{{{\text{Dim}}_{{ts}} }} \left| {\hat{J}_{{i,j}} - J_{{i,j}}^{T} } \right| $$
(4)
$$ E_{i} = \mathop {\max }\limits_{{1 \le j \le {\text{Dim}}_{{ts}} }} \left| {\hat{J}_{{i,j}} - J_{{i,j}}^{T} } \right| $$
(5)
$$ S_{i} = \sqrt {\frac{{\mathop \sum \nolimits_{{j = 1}}^{{{\text{Dim}}_{{ts}} }} \left( {\left| {\hat{J}_{{i,j}} - J_{{i,j}}^{T} } \right| - \varepsilon _{i} } \right)^{2} }}{{{\text{Dim}}_{{ts}} - 1}}} $$
(6)

where \(\hat{J}_{i,j}\) and \({J}_{i,j}^{T}\) are, respectively, the forecasted and target values of the jth Jominy curve in the test set at the distance value di.

Table 4 Results of the tests for different values of the parameters related to the mutation function

The last column of Table 4 shows the average values of the above listed indexes over the whole Jominy profile, namely:

$$ \varepsilon = \frac{1}{N} \cdot \mathop \sum \limits_{{i = 1}}^{N} \varepsilon _{i} $$
(7)
$$ E = \frac{1}{N} \cdot \mathop \sum \limits_{{i = 1}}^{N} E_{i} $$
(8)
$$ S = \frac{1}{N} \cdot \mathop \sum \limits_{{i = 1}}^{N} S_{i} $$
(9)
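The indexes in Eqs. (4)–(9) can be computed in a few lines once the forecasted and target profiles of the test set are arranged as arrays of shape (Dimts, N). The function name and the dictionary layout below are illustrative choices.

```python
import numpy as np

def profile_error_stats(j_pred, j_target):
    """Per-point and profile-averaged error indexes over the test set.

    j_pred, j_target: arrays of shape (Dim_ts, N), one row per test curve.
    """
    abs_err = np.abs(np.asarray(j_pred, float) - np.asarray(j_target, float))
    eps_i = abs_err.mean(axis=0)        # Eq. (4): mean absolute error
    e_i = abs_err.max(axis=0)           # Eq. (5): maximum absolute error
    s_i = abs_err.std(axis=0, ddof=1)   # Eq. (6): sample standard deviation
    return {
        "eps_i": eps_i, "E_i": e_i, "S_i": s_i,
        "eps": eps_i.mean(),            # Eq. (7)
        "E": e_i.mean(),                # Eq. (8)
        "S": s_i.mean(),                # Eq. (9)
    }
```

Note that `ddof=1` matches the Dimts − 1 denominator of Eq. (6).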

Figure 5a, b shows, through two distinct box-plot charts, the distribution of the average error ε throughout the performed tests according to the values of the hyper-parameters α and r, respectively. These results, combined with those reported in Table 4, show the overall good performance of the method and highlight the combination r = 3 and α = 15 as the best performing one.

Fig. 5 Box-plots showing the distribution of the ε error measure for the different values of the α (a) and r (b) hyper-parameters throughout the performed tests

A further useful performance index is the average absolute percent error for each hardness measure, which is defined as:

$$ \theta _{i} = 100 \times \frac{1}{{{\text{Dim}}_{{ts}} }} \cdot \mathop \sum \limits_{{j = 1}}^{{{\text{Dim}}_{{ts}} }} \frac{{\left| {\hat{J}_{{i,j}} - J_{{i,j}}^{T} } \right|}}{{J_{{i,j}}^{T} }}, \quad 1 \le i \le N $$
(10)
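Equation (10) differs from Eq. (4) only in that each absolute error is divided by the corresponding target hardness before averaging. A minimal sketch, with a hypothetical function name, is:

```python
import numpy as np

def mean_abs_percent_error(j_pred, j_target):
    """Average absolute percent error per profile point, as in Eq. (10).

    j_pred, j_target: arrays of shape (Dim_ts, N); targets must be non-zero.
    """
    j_pred = np.asarray(j_pred, float)
    j_target = np.asarray(j_target, float)
    return 100.0 * np.mean(np.abs(j_pred - j_target) / j_target, axis=0)
```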

The performance achieved by JoMiner while using the combination r = 3 and α = 15 is shown in Fig. 6 for all the points of the profile. The performance of the optimizer is satisfactory, keeping the percent error lower than 1% in the first points of the curve and lower than 2.5% in the central part of the profile, where variability is much higher.

Fig. 6 Average percent error achieved by JoMiner on test data by using the hyper-parameter values α = 15 and r = 3 and the basic fitness function provided in Eq. (1)

The qualitative achievements of JoMiner are shown in Fig. 7, which compares some target Jominy profiles belonging to the test dataset with the corresponding profiles forecasted by JoMiner by using as input of the predictor the computed optimal chemical composition and the selected hyper-parameters. As is clear from this figure, the results are very satisfactory and the forecasted curves are very close to the target ones.

Fig. 7 Exemplar comparison between some target Jominy profiles belonging to Dts and the corresponding profiles forecasted by JoMiner by using as input of the predictor the computed optimal chemical composition and the selected hyper-parameter values

The outcome of JoMiner can also be analysed in terms of similarity of the optimal chemical composition with respect to the real chemical composition corresponding to the target profile. In other words, for each relevant chemical element El, we can evaluate the average absolute difference ε[El] between the content of this element in the computed optimal composition [El]opt and the corresponding real content [El]T in the experimental chemistry of the target Jominy profile, where the index j denotes the jth profile of the test set, as follows:

$$ \varepsilon _{{\left[ {El} \right]}} = \frac{1}{{{\text{Dim}}_{{ts}} }} \cdot \mathop \sum \limits_{{j = 1}}^{{{\text{Dim}}_{{ts}} }} \left| {\left[ {El} \right]_{{{\text{opt}},j}} - \left[ {El} \right]_{j}^{T} } \right| $$
(11)

Table 5 reports the values of the ε[El] index computed for all the 15 chemical elements which are fed as inputs to the Jominy profile predictor for the 4 above-mentioned combinations of the parameters of the mutation operator.

Table 5 Results of the tests in terms of ε[El] for different values of the parameters related to the mutation function assessed in terms of error on the values of the chemical components expressed in wt.%

It can also be useful to assess, for each chemical element, the average relative error ρ[El] with respect to the maximum value [El]max of the contents of the same element (as reported in the last row of Table 2), which is defined as:

$$ \rho _{{\left[ {El} \right]}} = \frac{{\varepsilon _{{\left[ {El} \right]}} }}{{\left[ {El} \right]_{{{\text{max}}}} }} $$
(12)
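Equations (11) and (12) can be illustrated with the following sketch, where the compositions are arranged as arrays with one row per test profile and one column per chemical element. The function name and the numerical values in the usage note are hypothetical.

```python
import numpy as np

def composition_errors(chem_opt, chem_target, chem_max):
    """Per-element absolute and relative composition errors.

    chem_opt, chem_target: arrays of shape (Dim_ts, M), in wt.%.
    chem_max: array of shape (M,), maximum content of each element
              in the dataset.
    """
    chem_opt = np.asarray(chem_opt, float)
    chem_target = np.asarray(chem_target, float)
    eps_el = np.mean(np.abs(chem_opt - chem_target), axis=0)  # Eq. (11)
    rho_el = eps_el / np.asarray(chem_max, float)             # Eq. (12)
    return eps_el, rho_el
```

For instance, an average deviation of 0.01 wt.% on an element whose maximum content in the dataset is 0.1 wt.% gives ρ = 0.1, the same relative error as a deviation of 0.1 wt.% on an element with a maximum content of 1.0 wt.%.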

Please note that the index in Eq. (12) differs from the relative error on the hardness values, which is provided in Eq. (10), as here the error between actual and target content value is divided by the maximum value of the content. The reason for this choice is that, provided that the Jominy curve is similar to the target one, the similarity of the chemistry is not particularly relevant, but it is important not to suggest steel chemistries that include very high contents of the micro-alloying elements. From this perspective (which is further discussed in Sect. 4.3), it is much more important to compare the error on the suggested content of each chemical element with its maximum value, as derived from the available dataset.

Figure 8 graphically shows the values of the ρ[El] index for all the relevant chemical elements and for all the combinations of the parameters of the mutation operator.

Fig. 8 Relative error ρ[El] for each chemical element that is fed as input to the Jominy profile predictor for the 4 considered combinations of the parameters α and r of the mutation operator

Figure 8 shows that the relative error on the chemical composition is generally larger than the one on the Jominy profile, especially for V and Nb. Two possibly concurrent reasons can be put forward for this fact.

From the merely computational point of view, Vanadium is relevant for estimating one value of the Jominy profile, i.e. \(\hat{J}_{8}\), corresponding to the distance value d8 = 15 mm, together with 8 other inputs (3 related to the steel chemistry and 5 related to values of the profile), as reported in Table 1. As the objective function defined in Eq. (1) is adopted for these tests, which gives the same importance to the errors on all the values of the Jominy profile, a small mismatch on one value has a limited effect on the overall goodness of the fit. On the other hand, Niobium is relevant for estimating four values of the Jominy profile, i.e. \(\hat{J}_{5}\), \(\hat{J}_{6}\), \(\hat{J}_{7}\) and \(\hat{J}_{{13}}\) (related to d5 = 9 mm, d6 = 11 mm, d7 = 13 mm and d13 = 40 mm). The NNs forecasting these values have several inputs, of which Nb is only one, and its impact on the determination of the output value is marginal.

From the metallurgical point of view, the combined effect of micro-alloying elements on steel hardenability is very complex and not yet perfectly understood. It is indeed possible that two different steel chemical compositions give rise to very similar Jominy profiles, and this fact is reflected in the exploited experimental data. As the target of the optimization is the minimization of the error between the target and forecasted Jominy profiles, the system is free to select any steel chemistry which, according to the data used for training, yields a forecasted profile as close as possible to the target one.

The results depicted in Fig. 8 highlight a further aspect of JoMiner. Combinations of hyper-parameters characterized by low values of α (mutation percent range) and r (number of mutated elements) correspond to optimized chemical compositions that are closer to the actual ones than those obtained with higher α and/or r values. Configurations of the GA engine with low hyper-parameter values correspond to conservative approaches to the search within the space of possible chemical compositions. An approach that does not extensively explore the solution domain remains indeed closer to the initial chemical compositions, which are drawn from the original distributions of chemical elements in the training dataset. On the other hand, combinations with high hyper-parameter values lead to optimized chemical compositions that are quite different from the actual ones, since in these cases the exploration of the domain is favored. This discrepancy does not affect the goodness of the solution, which is measured in terms of distance between the target Jominy profile and the one predicted by using the optimal chemical composition. In this scenario, the selected best performing combination of hyper-parameters is a trade-off between a conservative and an explorative approach to the search task.

4.2 Jominy profile points weighted optimization

As introduced in the previous section, it is possible to use different objective functions in order to exploit the search capabilities of JoMiner to respond to specific industrial requirements. One common case corresponds to the need for more accuracy at selected points of the optimized profile \(\hat{\boldsymbol{J}}\) with respect to the target profile JT. In this context, a typical industrial application is the achievement of a higher precision in forecasting the central points of the profile (i.e. 11 mm ≤ di ≤ 35 mm), since such points are intrinsically affected by a higher variability with respect to the other ones and have the largest influence on a number of mechanical properties of interest. This optimization can be pursued by using the objective function shown in Eq. (2), where a weighting system is employed to set the relative importance of each of the 15 points of the profile. In this work the requirement of higher precision in the central part of the profile is addressed by using the weights shown in Table 6. With respect to the weight attributed to the discrepancy between forecasted and target hardness values in the initial and final points of the profile, a double penalty is attributed for di = 11.0 mm and di = 35.0 mm, and a quadruple penalty for 13 mm ≤ di ≤ 30 mm.

Table 6 Weighting employed in the optimization test focusing on the achievement of more accuracy in the central part of the Jominy profile
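The weighting scheme described in the text (single, double and quadruple penalty) can be sketched as below. Two points are assumptions of this example. First, the distance grid `DISTANCES_MM` is hypothetical: it agrees with the distances quoted in the text (d5 = 9 mm, d6 = 11 mm, d7 = 13 mm, d8 = 15 mm, d13 = 40 mm) but the remaining values are not stated there. Second, the weights are normalized so that the largest one equals 1, in order to respect the condition 0 ≤ wi ≤ 1.

```python
import numpy as np

# Hypothetical 15-point distance grid (mm); only some values appear in the text.
DISTANCES_MM = (1.5, 3, 5, 7, 9, 11, 13, 15, 20, 25, 30, 35, 40, 45, 50)

def central_weights(distances_mm=DISTANCES_MM):
    """Relative weights 1 : 2 : 4, scaled so that the maximum is 1."""
    d = np.asarray(distances_mm, float)
    w = np.ones_like(d)
    # Double penalty at the borders of the central region.
    w[np.isclose(d, 11.0) | np.isclose(d, 35.0)] = 2.0
    # Quadruple penalty in the core of the central region.
    w[(d >= 13.0) & (d <= 30.0)] = 4.0
    return w / w.max()
```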

The optimization task is performed for all the curves in the test dataset by using the usual configuration of the GA engine and the selected values of the α and r parameters (α = 15, r = 3). The obtained results are reported in Table 7. In addition, Table 8 allows a direct comparison between the values of εi achieved through the weighted optimization and the ones achieved in the base test, highlighting the accuracy improvements. Lower errors are achieved for all the points of interest except for the first one (distance 11 mm). The improvement at these points varies between 10 and 60%. The side effect of this increased focus on a subset of the profile points is an overall loss of precision, with the average error rising from 0.27 HRC to 0.34 HRC, although it remains acceptable both globally and at the level of the individual points.

Table 7 Performance of the weighted optimization focusing on the central part of the Jominy profile (highlighted in bold and italic)
Table 8 Comparison between weighted and standard version of the optimizer on test curves in terms of average absolute error

4.3 Optimization by minimizing the consumption of arbitrary elements

A further application of JoMiner involves the consideration of economic or environmental aspects related to the consumption of the micro-alloying elements, typically by minimizing their usage according to either cost or environmental impact criteria. This task is pursued by using Eq. (3) as objective function, since the use of selected chemical elements can be discouraged through a suitable cost vector Γ = [γ1…γM].

Here a typical optimization study is proposed, which targets the minimization of the usage of micro-alloying elements. This is a quite common goal in industrial practice, since Cr, Ni, Mo, V, Nb and Ti are expensive and their consumption markedly affects the production cost. More in detail, the adopted cost vector Γ is shown in Table 9. This set-up penalizes the usage of the above-mentioned chemical elements without distinction (i.e. γ = 1 for all of them) with respect to all the other elements of the domain. The equalization parameter of the objective function shown in Eq. (3) is set to β = 3 in order to balance the orders of magnitude of the two terms.

Table 9 Array of the relative cost associated to each chemical element as employed in the optimization test that includes the minimization of micro-alloyed elements usage

The results obtained by optimizing the test profiles with the standard configuration of the GA engine and the selected hyper-parameters are shown in Table 10 in terms of absolute error of the optimized \(\hat{\boldsymbol{J}}\) with respect to the associated target profiles JT.

Table 10 Results of the optimization with a weighting factor limiting consumption of costly micro-alloying elements

Table 11 compares the average contents of the relevant chemical elements computed over the optimal solutions determined by JoMiner with the actual ones (i.e. the average contents computed over the target profiles). Notably, the contents of the micro-alloying elements, on which the optimization focuses, are reduced: the reduction is slight for Cr and Ti, but appreciable for Ni and Mo (around − 18%) and dramatic as far as V and Nb are concerned (− 89% and − 80% respectively). The trade-off in terms of discrepancy of the optimized profiles with respect to the target ones is again negligible, since 0.27 HRC ≤ ε ≤ 0.40 HRC (see Table 4), a price that can be worth paying for a considerable reduction of micro-alloying elements and of the consequent production costs.

Table 11 Comparison between average chemistry computed over the optimal solutions determined by JoMiner and average chemistry corresponding to the target profiles

Figure 9 depicts from a qualitative point of view the performance of JoMiner in the optimization task on a single Jominy profile while adopting the different objective functions provided by Eqs. (1)–(3). In particular, focusing on the central points of the profile through the objective function in Eq. (2) effectively improves the prediction accuracy at these points (black vs. red curve). On the other hand, the penalization of the consumption of costly micro-alloying elements, which is implemented through the cost function in Eq. (3), slightly modifies the whole shape of the profile (magenta vs. red curve), which, however, is still very close to the target one.

Fig. 9 Exemplar Jominy profiles obtained through JoMiner optimization using the different objective functions: red curve for the one in Eq. (1), black curve for the one in Eq. (2), magenta curve for the one in Eq. (3). An enlarged view of the central part of the curve is provided on the right side

5 Conclusions and future work

This paper proposes an approach combining NN-based models and GA-based optimization to face a relevant practical problem within the steelmaking field, i.e. finding the steel chemical composition that ensures the achievement of a target hardenability profile. The balance between exploration and exploitation in the search for the optimal solution is investigated by varying two hyper-parameters, and a trade-off solution is proposed, which also allows exploiting the information hidden in the available dataset. This is an important aspect, if one considers that the correspondence between the steel chemistry and the shape of the Jominy curve is not one-to-one, which reflects the still not perfectly known interactions between the micro-alloying elements. Moreover, different optimization strategies can be implemented, targeting the matching of the target profile over the whole range or over a particular range of distance values (according to frequently found customer specifications, which are stricter on a limited range of distance values), as well as the minimization of the contents of some micro-alloying elements, in compliance with economic and environmental constraints.

The proposed approach is flexible and customizable to the specific production range and targets of a company, in terms not only of the dataset exploited for system training, but also of the optimization targets, as the weights of the objective functions can be selected based on the customer’s specifications on the Jominy profile and on the production constraints. Even more importantly, the system is capable of providing suggestions which go beyond the standard operating practice, as the generalization capability of the NN-based models and the exploration potential of the GAs help find combinations of micro-alloying elements which might have been less frequently used in the past, but can provide adequate Jominy profiles while saving costly micro-alloying elements. To sum up, the system can extract valuable knowledge from raw data, supporting plant managers in meeting production demands and achieving economic and environmental targets.

Future work will focus on the development of a suitable graphical user interface, which allows an easy deployment of the system in the steelworks, including parameter set-up and re-tuning of the NN-based models. Moreover, some effort will be devoted to speeding up the GA-based optimization procedure, for instance by combining the current simple termination condition, consisting in the achievement of a maximum number of iterations, with an additional condition on the improvement of the best solution in the most recent generations, so that the search terminates if no significant improvement is achieved.
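The combined termination condition outlined above could take a form similar to the following sketch. It is only an illustration of the idea: the function name and the parameters `patience` and `min_improvement` are hypothetical and do not correspond to an existing JoMiner feature.

```python
def should_stop(best_history, max_generations, patience=10, min_improvement=1e-3):
    """Stop on max generations or when the best fitness has stagnated.

    best_history: best fitness value of each generation so far
                  (higher is better, most recent value last).
    """
    if len(best_history) >= max_generations:
        return True
    if len(best_history) <= patience:
        return False  # not enough generations to judge stagnation
    # Improvement of the best fitness over the last `patience` generations.
    gain = best_history[-1] - best_history[-1 - patience]
    return gain < min_improvement
```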