Molten steel temperature prediction using a hybrid model based on information interaction-enhanced cuckoo search

This article presents a hybrid model for predicting the temperature of molten steel in a ladle furnace (LF). Unique to the proposed hybrid prediction model is that its neural network-based empirical part is trained in an indirect way since the target outputs of this part are unavailable. A modified cuckoo search (CS) algorithm is used to optimize the parameters in the empirical part. The search of each individual in the traditional CS is normally performed independently, which may limit the algorithm’s search capability. To address this, a modified CS, information interaction-enhanced CS (IICS), is proposed in this article to enhance the interaction of search information between individuals and thereby the search capability of the algorithm. The performance of the proposed IICS algorithm is first verified by testing on two benchmark sets (including 16 classical benchmark functions and 29 CEC 2017 benchmark functions) and then used in optimizing the parameters in the empirical part of the proposed hybrid prediction model. The proposed hybrid model is applied to actual production data from a 300 t LF at Baoshan Iron & Steel Co. Ltd, one of China's most famous integrated iron and steel enterprises, and the results show that the proposed hybrid prediction model is effective with comparatively high accuracy.


Introduction
The ladle furnace (LF) is a pivotal piece of equipment used for refining and alloying during secondary metallurgy in the iron and steel industry [1]. Close control of the molten steel temperature in the LF is vital for improving product quality and productivity [2]. However, the temperature of molten steel cannot be measured continuously in actual production, which makes accurate control difficult. It is therefore of considerable practical significance to develop a model that predicts the temperature of molten steel in the LF.
Models for predicting the temperature of molten steel in the LF are traditionally developed based on thermodynamics and the energy conservation law [3,4]. However, owing to the intrinsic complexity of LF metallurgy processes, the fundamental mechanisms of the physicochemical phenomena involved are not yet entirely clear, and developing a mechanistic prediction model is very time-consuming and costly. As a result, empirical modeling approaches have been extensively used to develop temperature prediction models for molten steel in the LF. In empirical modeling, the model is developed exclusively from production data without the need to invoke the phenomenology of the process [5][6][7][8]. Thus, the time and expense associated with developing a suitable mechanistic prediction model can be avoided.
In recent years, hybrid modeling approaches have been considered an appealing alternative for developing molten steel temperature prediction models. A hybrid prediction model commonly consists of a mechanistic thermal model representing the a priori knowledge of the LF metallurgy process under consideration, and one or more empirical models approximating the unknown functions in the mechanistic thermal model [9][10][11]. Moreover, according to existing research, hybrid prediction models have better properties than purely empirical prediction models [9,10]: they typically have better prediction accuracy and generalization performance, and are easier to interpret and analyze.
Regarding the training of the empirical part, most of the reported hybrid modeling approaches use a direct method, as shown schematically in Fig. 1a. The parameters in the empirical part are determined by minimizing the errors between the outputs of the empirical part, denoted by $\hat{g} = [\hat{g}_1, \cdots, \hat{g}_n]$, and the actual values of the unknown functions, denoted by $g = [g_1, \cdots, g_n]$. Here, n denotes the number of unknown functions. Obviously, the precondition for using the direct method to train the empirical part is that these actual values are available. In other words, when the actual values of one or more unknown functions are unavailable, as is the case for the hybrid prediction model proposed in this article, the direct method cannot be used.
To address the above issue, this article proposes an alternative method that determines the parameters in the empirical part using the available values of the molten steel temperature instead of the target outputs of the empirical part, as shown schematically in Fig. 1b. This allows the empirical part to be trained indirectly without having its target outputs. In Fig. 1b, $x$ and $\hat{x}$ denote the measured and predicted temperature values of molten steel, respectively.
The determination of the parameters in the empirical part using the above indirect method is a complex optimization problem, and the derivative information required by traditional optimization algorithms is difficult to calculate. Intelligent optimization algorithms, such as the genetic algorithm (GA) [12], particle swarm optimization (PSO) [13], differential evolution (DE) [14], ant colony optimization (ACO) [15], salp swarm algorithm (SSA) [16], artificial bee colony (ABC) [17], and cuckoo search (CS) [18], do not require any derivative information and can perform global search [19][20][21], so using them to find the parameters in the empirical part is a viable alternative. Among these algorithms, CS is a comparatively new one, initially introduced by Yang and Deb [22]. Owing to attractive features such as a good balance between local and global search, simplicity, and efficiency [23,24], the CS algorithm has been successfully applied to many optimization problems in various fields with promising results [25][26][27][28][29], including parameter optimization problems in modeling manufacturing processes, such as parameter estimation of a common empirical model for the temperature of cutting tools [30], estimation of soft-sensing model parameters for fermentation processes [31], and parameter identification of a neural network model for the electron beam welding process [32]. In addition, some studies have shown that CS is potentially far more efficient than PSO, GA, and some other intelligent optimization algorithms [30,33,34]. However, in the search process of the basic CS (BCS), there is no interchange of search information between individuals (i.e., cuckoos). To address this issue, we propose a modified CS, information interaction-enhanced CS (IICS), by introducing an information interaction-enhanced mechanism into BCS.
This mechanism is based on the common idea that exchanging information among the members of a team helps the team accomplish a task efficiently. The proposed IICS is employed to optimize the parameters in the empirical part of the proposed hybrid prediction model.
The remainder of the article is organized as follows. Section 2 elucidates the development of the hybrid temperature prediction model with the proposed indirect training method for its empirical part. Section 3 briefly discusses BCS and details the IICS algorithm. Section 4 describes the use of IICS for determining the parameters in the empirical part of the proposed hybrid prediction model. Section 5 analyzes the performance of IICS on two sets of benchmark functions and then presents the application of the proposed hybrid prediction model to the actual production data from a 300 t LF at Baoshan Iron & Steel Co. Ltd. Finally, conclusions are drawn and considerations for future work are pointed out in the last section.
Fig. 1 Schematic representation of a hybrid molten steel temperature prediction model: (a) its empirical part is trained directly; (b) its empirical part is trained indirectly

Development of a hybrid prediction model
In this section, a mechanistic thermal model (i.e., the mechanistic part of the proposed hybrid prediction model) is first derived based on thermodynamics and the law of energy conservation. Next, artificial neural network-based empirical models (i.e., the empirical part of the proposed hybrid prediction model) are used to approximate the unknown functions in the mechanistic part, and the indirect training method for these empirical models is elaborated.

Development of mechanistic part
Taking the molten steel and slag as a unitized system, a mechanistic thermal model is developed in this subsection based on the energy conservation law and thermodynamics. Similar to the existing literature [2], the following three assumptions are made: (1) no local temperature gradient exists in the steel bath, i.e., the steel bath is fully mixed; (2) there is only radial heat flow in the ladle wall; and (3) only axial heat flow occurs in the ladle bottom.
In the remainder of this subsection, the primary factors affecting the temperature change of molten steel are described in detail. Then, a mechanistic thermal model with two unknown functions is presented.

Thermal gain from the arc
The energy required for secondary steel refinement in the LF comes mainly from the arc. The thermal gain of the steel bath due to the energy injected through the arc can be calculated as
$$Q_{arc} = \eta_{arc} P_{arc} \qquad (1)$$
where $Q_{arc}$ is the power in W injected into the steel bath, $\eta_{arc}$ is the efficiency coefficient of heat transfer from the arc to the steel bath, and $P_{arc}$ is the total power. For a given LF system, the value of $\eta_{arc}$ mainly depends on the slag thickness $H_{sl}$ and the arc length $L_{arc}$ [10]; that is, $\eta_{arc}$ is a function of $H_{sl}$ and $L_{arc}$:
$$\eta_{arc} = f_{arc}(H_{sl}, L_{arc}) \qquad (2)$$
Hence, once the function $f_{arc}$ has been obtained, the value of $\eta_{arc}$ can be calculated from $H_{sl}$ and $L_{arc}$, both of which are available online. However, it is noteworthy that the concrete expression of this function is hard to derive by mechanistic approaches.

Thermal loss from the ladle lining
Thermal loss from the ladle lining consists of two components, the thermal loss from the ladle wall and the thermal loss from the ladle bottom. Here, the instantaneous temperature distribution models of the ladle wall and ladle bottom are first established; then the thermal loss from the ladle lining is calculated based on these two models.
2.1.2.1 Instantaneous temperature distribution model of the ladle wall
With the assumptions above, the heat transfer in the ladle wall can be considered as one-dimensional unsteady heat conduction in cylindrical coordinates [2,3,35], formulated as Eqs. (3)-(6). Equation (3) is the heat conduction differential equation of the ladle wall, where $T_w$, $\rho_w$, $\lambda_w$, and $c_w$ are the temperature in °C, density in kg/m³, heat conductivity in W/(m °C), and specific heat in J/(kg °C) of the ladle wall, respectively. The boundary conditions for Eq. (3) are given in Eqs. (4) and (5), where $r_1$ and $r_2$ are respectively the inner and outer diameters of the ladle wall in m; $T_{st}$ is the molten steel temperature in °C; $\alpha_{w-en}$ is the convection heat transfer coefficient between the steel shell of the ladle wall and the environment in W/(m² °C); and $T_\infty$ is the environment temperature in °C. Equation (6) is the initial condition.
2.1.2.2 Instantaneous temperature distribution model of the ladle bottom
With the assumptions above, the heat transfer in the ladle bottom can be considered as one-dimensional unsteady heat conduction along the axial direction [2,3,35], formulated as Eqs. (7)-(10). Equation (7) is the heat conduction differential equation of the ladle bottom, where $T_b$, $\rho_b$, $\lambda_b$, and $c_b$ are, respectively, the temperature in °C, density in kg/m³, heat conductivity in W/(m °C), and specific heat in J/(kg °C) of the ladle bottom. Equations (8) and (9) are the boundary conditions for Eq. (7), where $h_b$ is the thickness of the ladle bottom in m, and $\alpha_{b-en}$ is the convection heat transfer coefficient between the steel shell of the ladle bottom and the environment in W/(m² °C). Equation (10) is the initial condition. See the existing literature [2] for more details.

2.1.2.3 Thermal loss from the ladle lining
Thermal loss from the ladle lining is obtained by calculating the heat flow through the contact surface between the ladle lining and the molten steel, as given in Eq. (11), where $Q_{lin}$ is the thermal loss from the ladle lining in W; $h$ is the height of the steel bath in m; $\lambda_{w,1}$ is the heat conductivity of the ladle wall material that is in contact with the molten steel in W/(m °C); and $\lambda_{z,1}$ is the heat conductivity of the ladle bottom material that is in contact with the molten steel in W/(m °C).

Thermal loss from the top surface
The top surface thermal loss mostly results from radiation through the bare molten steel surface and the slag surface. It is difficult to calculate this loss exactly using traditional mechanistic models, so the energy change of the cooling water is used here to calculate it indirectly as
$$Q_{sur} = \eta_{sur} c_{cw} F_{cw} \Delta T_{cw} \qquad (12)$$
where $Q_{sur}$ is the top surface thermal loss in W; $\eta_{sur}$ is the correction coefficient; and $c_{cw}$, $F_{cw}$, and $\Delta T_{cw}$ are the specific heat in J/(kg °C), flow rate in kg/s, and temperature difference in °C between the inlet and outlet of the cooling water, respectively. Through analysis, $\eta_{sur}$ can be considered as a function of three variables that are available online, namely $F_{arg}$ (the argon flow rate in Nm³/s), $D_{sl-co}$ (the distance between the steel bath and the ladle cover in m), and $T_{st}$:
$$\eta_{sur} = f_{sur}(F_{arg}, D_{sl-co}, T_{st}) \qquad (13)$$
Similar to $f_{arc}$, the concrete expression of $f_{sur}$ is also hard to derive by mechanistic approaches.

Thermal effects resulting from the additions
The additions include the slag and metal alloys. Their thermal effects are calculated as given in Eq. (14), where $Q_{add}$ is the total thermal effect of additions in W; $i$ denotes a specific addition with mass $m_i$ in kg, and $k_i$ is its temperature influence coefficient in °C/kg; $\tau_i$ is the time in s that addition $i$ takes to reach the steel bath temperature; and $m_{st}$ and $c_{st}$ are the mass in kg and specific heat in J/(kg K) of molten steel, respectively. The value of $k_i$ can be obtained by statistical analysis of actual production data. Table 1 shows the temperature influence coefficients $k_i$ of various additions for the LF system considered in this study.

Thermal loss due to stirring-argon injection
The thermal loss due to stirring-argon injection is calculated as given in Eq. (15), where $Q_{arg}$ is the heat flow carried away by argon in W, and $c_{arg}$ and $T_{arg}$ are the specific heat in J/(Nm³ °C) and initial temperature in °C of argon, respectively.

The overall mechanistic thermal model
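According to the energy conservation law, the following mechanistic thermal model can be obtained by combining all of the above factors:
$$\frac{dT_{st}}{d\tau} = \frac{Q_{arc} - Q_{lin} - Q_{sur} - Q_{add} - Q_{arg}}{c_{st} m_{st} + c_{sl} m_{sl}} \qquad (16)$$
where $m_{sl}$ and $c_{sl}$ are the mass in kg and specific heat in J/(kg K) of the slag, respectively.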
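The energy balance above can be illustrated with a minimal Python sketch. It assumes that the individual heat flows have already been evaluated elsewhere; the function names, the explicit Euler step, and all numerical values are illustrative assumptions and are not taken from the plant data.

```python
def steel_temperature_rate(q_arc, q_lin, q_sur, q_add, q_arg,
                           m_st, c_st, m_sl, c_sl):
    """Rate of change of the steel temperature in degC/s.

    Heat flows are in W, masses in kg, specific heats in J/(kg K).
    The arc term is a gain; all other terms are treated as losses.
    """
    net_heat_flow = q_arc - q_lin - q_sur - q_add - q_arg
    heat_capacity = c_st * m_st + c_sl * m_sl
    return net_heat_flow / heat_capacity


def euler_step(t_st, rate, d_tau):
    """Advance the steel temperature by one explicit Euler step of d_tau seconds."""
    return t_st + rate * d_tau


# Illustrative numbers only (hypothetical, not measured values).
rate = steel_temperature_rate(
    q_arc=12.0e6, q_lin=1.5e6, q_sur=2.0e6, q_add=0.5e6, q_arg=0.1e6,
    m_st=300.0e3, c_st=820.0, m_sl=5.0e3, c_sl=1200.0,
)
t_next = euler_step(1580.0, rate, d_tau=60.0)
```

In practice a smaller step or a higher-order integrator may be preferable, since the lining loss depends on the evolving wall and bottom temperature fields.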

Development of empirical part using indirect training method
Obviously, the above mechanistic thermal model cannot be used directly for predicting the temperature of molten steel, as it contains two unknown functions, namely $f_{arc}$ and $f_{sur}$. In this article, two single-hidden layer feed-forward neural networks (SLFNs) are utilized to approximate $f_{arc}$ and $f_{sur}$, respectively (see Fig. 2), because such neural networks can approximate any nonlinear relationship arbitrarily well [36] and have been successfully applied in modeling some LF metallurgy processes to predict the molten steel temperature [5,6,9,10]. However, for practical LF metallurgy processes, only the values of the inputs to the two empirical models, $[H_{sl}, L_{arc}]$ and $[F_{arg}, D_{sl-co}, T_{st}]$, are available, whereas no target values of their outputs $\eta_{arc}$ and $\eta_{sur}$ are available. Therefore, traditional neural network training methods, which usually train the empirical part by directly minimizing the errors between the outputs of the empirical models and their target outputs, would be difficult to apply. In this study, the two SLFN-based empirical models are trained indirectly by minimizing the errors between the molten steel temperature predicted by the hybrid model and its measured values. The basic description of this indirect training method is given below.
The two SLFN-based empirical models of $f_{arc}$ and $f_{sur}$ can be formulated as
$$\eta_{arc} = \hat{f}_{arc}(H_{sl}, L_{arc}, \theta_{arc}) \qquad (17)$$
$$\eta_{sur} = \hat{f}_{sur}(F_{arg}, D_{sl-co}, T_{st}, \theta_{sur}) \qquad (18)$$
where $\hat{f}_{arc}$ and $\hat{f}_{sur}$ denote the SLFN-based empirical models used to approximate the unknown functions $f_{arc}$ and $f_{sur}$, respectively, while $\theta_{arc}$ and $\theta_{sur}$ are the vectors of weights and biases of the corresponding SLFNs.
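As an illustration of how such an empirical model maps its inputs and its parameter vector to a single coefficient, the following sketch evaluates an SLFN with a sigmoid hidden layer and a linear output (the activation functions reported in the experimental section). The function name and the flat layout of the parameter vector are assumptions made for this example only.

```python
import math


def slfn_forward(inputs, theta, n_hidden):
    """Evaluate a single-hidden-layer network with one linear output.

    theta is a flat list laid out (by assumption) as: for each hidden
    neuron its input weights followed by its bias, then the output
    weights, then the output bias.
    """
    n_in = len(inputs)
    expected = (n_in + 1) * n_hidden + (n_hidden + 1)
    if len(theta) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(theta)}")

    hidden = []
    pos = 0
    for _ in range(n_hidden):
        weights = theta[pos:pos + n_in]
        bias = theta[pos + n_in]
        pos += n_in + 1
        activation = sum(w * x for w, x in zip(weights, inputs)) + bias
        hidden.append(1.0 / (1.0 + math.exp(-activation)))  # sigmoid

    out_weights = theta[pos:pos + n_hidden]
    out_bias = theta[pos + n_hidden]
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
```

For example, a network with two inputs and three hidden neurons takes 13 parameters, so a call such as `slfn_forward([h_sl, l_arc], theta_arc, 3)` would play the role of the arc efficiency model, with `h_sl`, `l_arc`, and `theta_arc` supplied by the caller.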
In essence, to train $\hat{f}_{arc}$ and $\hat{f}_{sur}$ is to determine the optimal values of their parameters $\theta_{arc}$ and $\theta_{sur}$. The indirect method fulfills the training task as follows. Firstly, Eqs. (17) and (18) are substituted into Eqs. (1) and (12), respectively, so that a hybrid prediction model is obtained:
$$\frac{dT_{st}}{d\tau} = \frac{\hat{f}_{arc}(H_{sl}, L_{arc}, \theta_{arc}) P_{arc} - Q_{lin} - \hat{f}_{sur}(F_{arg}, D_{sl-co}, T_{st}, \theta_{sur}) c_{cw} F_{cw} \Delta T_{cw} - Q_{add} - Q_{arg}}{c_{st} m_{st} + c_{sl} m_{sl}} \qquad (19)$$
Then, $\theta_{arc}$ and $\theta_{sur}$ are regarded as the vectors of parameters to be identified in the hybrid prediction model, and they are determined by using the proposed IICS algorithm to minimize an objective function defined on the measurements of the molten steel temperature, given as Eq. (20).
In Eq. (20), $H$ is the number of heats of training data; $M_h$ is the number of temperature samples of molten steel in the $h$th heat; $x$ represents the measured value; and $\hat{x}$ represents the hybrid model prediction. In this way, the training of $\hat{f}_{arc}$ and $\hat{f}_{sur}$ can be accomplished without requiring their target outputs.
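The indirect training objective can be sketched as follows. Because the exact form of the objective function is not reproduced here, the sketch assumes a plain sum of squared errors over all heats and all temperature samples; the data layout (`heats` as a list of dictionaries with a `"measured"` list) and the `predict_heat` callback are hypothetical names introduced for illustration.

```python
def indirect_objective(theta, heats, predict_heat):
    """Sum of squared temperature errors over all heats (assumed form).

    theta        -- candidate parameter vector of the empirical part
    heats        -- list of dicts, each with a "measured" list of temperatures
    predict_heat -- callable (theta, heat) -> predicted temperatures, one per
                    measured sample; it stands in for integrating the hybrid model
    """
    total = 0.0
    for heat in heats:
        predicted = predict_heat(theta, heat)
        for x_meas, x_pred in zip(heat["measured"], predicted):
            total += (x_meas - x_pred) ** 2
    return total
```

The key point is that only measured temperatures enter the objective; the outputs of the two networks are never compared with targets of their own.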

Information interaction-enhanced CS
In this section, a brief overview of basic CS (BCS) is first given for self-completeness. Next, the proposed information interaction-enhanced CS (IICS) algorithm is elaborated, followed by the complexity analysis of IICS.

BCS algorithm
Cuckoo search (CS), introduced by Yang and Deb in 2009 [22], is an intelligent optimization algorithm inspired by the brood parasitism of some cuckoo species. In addition, the CS algorithm incorporates into its framework a mathematical model of the Lévy flight behavior found in some birds and fruit flies.
The following idealized rules are adopted in CS [22]: (1) the number of available host nests is fixed, and each cuckoo lays one egg at a time in a randomly selected nest; (2) the nests containing high-quality eggs are carried over to the next generation; and (3) the egg laid by a cuckoo can be identified by the host bird with a probability $p_a$. Once it is identified, the host bird will either push the egg out or abandon the nest and build a new one. In addition, it is worth pointing out that a nest, an egg, or a cuckoo is equivalent to a solution, and that only minimization problems are considered in the rest of the article without loss of generality. Based on the rules above, a Lévy flight is performed first to generate the new solution $z_i^{new}$ for cuckoo $i$ according to Eq. (21), where $\alpha$ is a vector of step size scaling factors that should be related to the scales of the problem under consideration. In most cases, $\alpha$ can be set according to Eq. (22) [37][38][39], where $\alpha_0$ is a constant usually set to 0.01 [38,39] and $z_{best}$ represents the current best solution. In Eq. (21), Lévy($\lambda$) is a random vector drawn from a Lévy distribution, $\lambda$ is a Lévy flight parameter, and the product operator denotes entry-wise multiplication.
In essence, Lévy flights offer a random walk with random steps drawn from a Lévy distribution. From the perspective of implementation, generating random numbers with Lévy flights involves two steps [38]: the selection of a random direction and the generation of steps that obey the selected Lévy distribution. The direction should be drawn from a uniform distribution, whereas the generation of steps is quite tricky. There are several ways of accomplishing this, but one of the most efficient and yet straightforward ways is to use the Mantegna algorithm [40], in which the step size $s$ is given by Eq. (23), where $u$ and $v$ are drawn from the normal distributions specified in Eqs. (24) and (25). In these equations, $\Gamma$ represents the Gamma function, and $\beta$ is a distribution parameter related to $\lambda$ in Eq. (21) as $\lambda = 1 + \beta$ ($0 < \beta \le 2$, and $\beta = 1.5$ [38] in CS). As per the above, Eq. (21) can be rewritten as Eq. (26), where $r$ is a random vector with all its elements generated from the standard normal distribution $N(0, 1)$, and $s$ is calculated using Eq. (23). After the fitness values of the old and new solutions at each nest are compared and the solution with the lower fitness value is retained, the new solution $z_i^{new}$ for cuckoo $i$ is generated again by imitating the discovery of alien eggs, as formulated in Eq. (27), where $z_j$ and $z_k$ are two randomly selected solutions; $\mathbf{p}_a$ is a vector with all its elements being $p_a$; $rnd_1$ is a random number generated from the standard uniform distribution $U(0, 1)$; $rnd_2$ is a random vector with all elements generated from $U(0, 1)$; and $H(\cdot)$ is a step function, defined in Eq. (28). More details on BCS can be found in [38].
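The Lévy flight step described above can be sketched in Python. The sketch follows the widely used textbook form of Mantegna's algorithm and of the CS update toward the current best solution; drawing an independent step for every dimension, and the function names, are assumptions of this example rather than details confirmed by the text.

```python
import math
import random


def mantegna_sigma_u(beta=1.5):
    """Standard deviation of u in Mantegna's algorithm (that of v is 1)."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def mantegna_step(rng, beta=1.5):
    """Draw one heavy-tailed step size s = u / |v|^(1/beta)."""
    u = rng.gauss(0.0, mantegna_sigma_u(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)


def levy_flight_update(z_i, z_best, rng, alpha0=0.01, beta=1.5):
    """Candidate solution for cuckoo i; the move vanishes when z_i equals z_best."""
    return [
        zi + alpha0 * mantegna_step(rng, beta) * (zi - zb) * rng.gauss(0.0, 1.0)
        for zi, zb in zip(z_i, z_best)
    ]


rng = random.Random(0)
candidate = levy_flight_update([1.0, -2.0, 0.5], [0.0, 0.0, 0.0], rng)
```

Because the step is scaled by the distance to the best solution, the best nest itself is left unchanged by this operator, which is one reason the second operator (alien egg discovery) is needed to keep the search diverse.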

IICS algorithm
As mentioned above, information interaction between individuals is lacking in the search process of BCS, although it is well known that information interaction between people plays an important role in helping their team accomplish a task with high efficiency. Accordingly, a CS with an information interaction-enhanced mechanism is expected to achieve better search performance than BCS. Based on this consideration, a modified CS called information interaction-enhanced CS (IICS) is proposed in this article. In IICS, cuckoo $i$ is offered an opportunity to obtain potentially useful information from a selected information provider $z_{p_i}$ (here $p_i$ defines which cuckoo is selected as the information provider for cuckoo $i$). To obtain $z_{p_i}$ for cuckoo $i$, three candidates are chosen randomly from $z = [z_1, \ldots, z_{i-1}, z_{i+1}, \ldots, z_{N_p}]$, where $N_p$ denotes the population size. Among these three candidates, the one with the best (i.e., smallest) fitness value is selected as $z_{p_i}$, and its index is assigned to $p_i$. The main differences between IICS and BCS are the updating formulas of the Lévy flight and alien egg discovery. To be specific, in IICS, the updating formulas used in BCS, namely Eqs. (26) and (27), are replaced with Eqs. (29) and (30), respectively. In these equations, $I$ is a vector with each element being one; $\mathbf{c}_a$ is a vector with all elements being $c_a$, which is a coefficient for adjusting the combination of the proposed information interaction-enhanced mechanism and Lévy flights (or the alien egg discovery); $rnd_3$, $rnd_4$, $rnd_5$, and $rnd_6$ are random vectors with all their elements drawn from $U(0, 1)$; and $H(\cdot)$ is the same step function as defined in Eq. (28). Figure 3 presents the complete implementation of the proposed IICS. Firstly, various parameters are set and a group of $N_p$ solutions is randomly initialized (lines 1-2).
Next, the fitness function value for each solution is calculated, and the one with the best fitness in the current population is assigned to z best (lines 3-4). Thereafter, these solutions are sequentially evolved with the first (lines 7-9) and second (line 14) proposed search operators, which respectively combine the information interaction-enhanced mechanism with Lévy flights and the alien egg discovery. Following the generation of each new solution, the optimal selection between the old and newly-generated solutions at the same nest will be performed, and the one with the smaller fitness value is retained (lines 10-13 and 15-18). Finally, z best is updated and it is selected as the optimal solution of the search process (lines 19-21).
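The selection of the information provider described above can be sketched as a small tournament. The sketch assumes that the three candidates are distinct and that the population holds at least four cuckoos; the function name is hypothetical.

```python
import random


def select_information_provider(i, fitness, rng):
    """Return p_i, the index of the information provider for cuckoo i.

    Three distinct cuckoos other than i are drawn at random, and the one
    with the smallest fitness value (minimization) is selected.
    """
    others = [k for k in range(len(fitness)) if k != i]
    candidates = rng.sample(others, 3)
    return min(candidates, key=lambda k: fitness[k])


rng = random.Random(42)
fitness = [5.0, 1.0, 3.0, 2.0, 4.0]
p_0 = select_information_provider(0, fitness, rng)
```

A tournament of size three gives fitter cuckoos a higher chance of being consulted while still allowing weaker ones to contribute, which keeps the extra cost per cuckoo constant.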

Complexity analysis of IICS
To facilitate the analysis, the computation time complexity of BCS is given first in this section. For each cuckoo, $O(D)$ operations are performed in one iteration of BCS, where $D$ is the problem dimension, resulting in $O(N_p \cdot D)$ complexity per iteration. However, CS generally runs for a number of iterations, so the overall complexity depends on the maximum iteration number ($g_{max}$). This gives the overall time complexity of BCS as $O(N_p \cdot D \cdot g_{max})$. Compared with BCS, IICS needs to perform additional computations of $O(N_p \cdot D \cdot g_{max})$ for the proposed information interaction-enhanced mechanism. Meanwhile, the selection of information providers adds a further computational complexity of $O(N_p \cdot D \cdot g_{max})$. Accordingly, the computation time complexity of IICS is of the same order as that of BCS, i.e., $O(N_p \cdot D \cdot g_{max})$. However, IICS significantly outperforms BCS according to the experimental results given in the following sections. These observations suggest that, compared with BCS, IICS achieves a better tradeoff between performance improvement and computation time complexity.

Optimizing parameters of the hybrid model using IICS
In this section, the IICS algorithm is applied to solve the parameter optimization problem of the empirical models $\hat{f}_{arc}$ and $\hat{f}_{sur}$ in the hybrid prediction model described by Eq. (19), hereafter referred to as the PO problem, and thus to accomplish their training with the proposed indirect method. In the process of search, nest i is encoded as

Experiments
In this section, the IICS algorithm was validated on 16 classical benchmark functions and the 29 CEC 2017 benchmark functions. Then, the actual production data from a 300 t LF at Baoshan Iron & Steel Co. Ltd were used to build the proposed hybrid prediction model using IICS.

Validation on classical benchmark functions
Neural Computing and Applications (2021) 33:6487-6509
In this subsection, 16 classical benchmark functions [41][42][43] are employed to investigate the performance of IICS. These benchmark functions are listed in Table 2, which includes the mathematical formula, search range, and function value at the global minimum ($F^*$) of each benchmark function; among them are the high conditioned elliptic function, sum square function, Griewank's function, Rastrigin's function, Ackley's function, and generalized penalized functions 1 and 2. The benchmark functions fall into two categories: unimodal problems and multimodal problems. Of them, $f_1$, $f_2$, $f_3$, $f_4$, and $f_5$ are unimodal functions containing only one optimum, whereas the remaining 11 benchmark functions, namely $f_6$ to $f_{16}$, are multimodal functions having many local optima but only one global optimum. In addition, it should be pointed out that the validation is performed with 50 variables, that is, the dimension ($D$) of each benchmark function is 50.
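To make the notion of a multimodal benchmark concrete, the following sketch implements two of the named functions, Rastrigin's and Ackley's, in their commonly used textbook forms. These definitions are assumptions of this example and may differ in detail (for instance in scaling or search range) from the formulas given in Table 2.

```python
import math


def rastrigin(x):
    """Rastrigin's function: many regularly spaced local minima, global minimum 0 at the origin."""
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)


def ackley(x):
    """Ackley's function: nearly flat outer region with a deep basin at the origin."""
    d = len(x)
    mean_sq = sum(xi * xi for xi in x) / d
    mean_cos = sum(math.cos(2.0 * math.pi * xi) for xi in x) / d
    return (-20.0 * math.exp(-0.2 * math.sqrt(mean_sq))
            - math.exp(mean_cos) + 20.0 + math.e)


origin = [0.0] * 50          # D = 50, as used in the validation
error_rastrigin = rastrigin(origin)
error_ackley = ackley(origin)
```

Both functions attain their global minimum at the origin, so the function error of a candidate solution is simply its function value.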
To show its competitiveness, the proposed IICS is compared in this subsection with six recently developed CS variants, i.e., CCS [43], ACS [44], ICS [45], NNCS [46], BHCS [47], and HECS [48], as well as BCS. For a fair comparison, $N_p$ and $g_{max}$ (the maximum iteration number used to terminate the iteration) are set to the same values for all eight CS-based algorithms, namely $N_p = D$ and $g_{max} = 5000$, as done in [46,49]. For CCS, ACS, ICS, NNCS, BHCS, and HECS, we follow the parameter settings used or recommended in the corresponding studies. For BCS and IICS, the common parameter $p_a$ is configured as originally done by Yang and Deb [22,38]; according to these studies, $p_a = 0.25$ is a good choice for most optimization problems. The specific parameter of IICS, i.e., $c_a$, is tuned experimentally. The experiments showed that with $c_a = 0.08$ the search performance of IICS is well balanced across different kinds of optimization problems. The parameter settings of the eight CS-based algorithms are summarized in Table 3.
To reduce random discrepancy, each algorithm is run independently 30 times on each benchmark function, and the mean, best, worst, and standard deviation (SD) of the function error ($f(x_{best}) - F^*$), where $x_{best}$ represents the best solution achieved by the algorithm in a run, are calculated and recorded. The test results of the eight CS-based algorithms on the 16 classical benchmark functions are shown in Table 4. The best mean error values among the eight algorithms are highlighted in boldface. On these 16 classical benchmark functions, IICS produces the lowest mean error values for eight functions, while BHCS produces the lowest mean error values in five cases. For two of the benchmark functions, IICS and some of its competitors produce equal results. For $f_{14}$, CCS produces the best solution. Although BHCS attains the best mean results on 4 out of the 5 unimodal benchmark functions, it sacrifices performance on the multimodal ones. In contrast, IICS is not only very efficient in solving the unimodal problems but also attains very competitive performance on the multimodal ones. To statistically compare IICS with each competitor, the multiple-problem Wilcoxon rank test is conducted at a significance level of 5% based on the mean error value. In Tables 4 and 5, the statistical significance state is indicated with the symbols +, ≈, and −, denoting that the competitor performs significantly worse than, insignificantly different from, and significantly better than the IICS algorithm, respectively. Moreover, the Friedman test is applied at a significance level of 0.05 to determine the differences between these algorithms and rank them. Table 4 indicates that IICS provides significantly better results than all of the other seven algorithms. The row headed "Mean Rank" provides the final ranking of the algorithms over all 16 classical benchmark functions.
The results show that IICS, NNCS, and BHCS are ranked first, second, and third, with mean rank values of 1.56, 3.84, and 3.91, respectively. In addition, the p value (7.4999e-09) is smaller than the chosen significance level (0.05), which indicates that there is at least one significant difference among the algorithms' results.
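The per-function statistics reported in the tables can be computed as in the following sketch. The function name is hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption, since the text does not state which form is used.

```python
import statistics


def summarize_errors(best_values, f_star):
    """Mean, best, worst and SD of the function error f(x_best) - F* over independent runs."""
    errors = [value - f_star for value in best_values]
    return {
        "mean": statistics.fmean(errors),
        "best": min(errors),
        "worst": max(errors),
        "sd": statistics.stdev(errors),  # sample SD (assumption)
    }


# Hypothetical final objective values from three runs on a function with F* = 0.
summary = summarize_errors([1.0, 2.0, 3.0], f_star=0.0)
```

The mean error is the quantity that enters both the Wilcoxon and the Friedman tests, so it is the value that should be stored for each algorithm and function.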
To show the convergence process visually, the convergence curves of each algorithm in terms of the mean error values on the classical benchmark functions are presented in Fig. 6. For ease of comparison, semilogarithmic coordinates are used to plot the convergence curves of each benchmark function except $f_{14}$. The convergence curves in Fig. 6 show that IICS performs well on 14 of the 16 classical benchmark functions. IICS has significantly higher convergence speeds on $f_6$, $f_7$, $f_9$ to $f_{13}$, and $f_{16}$ than the other seven CS-based algorithms. Therefore, IICS can be regarded as efficient. However, the result patterns differ slightly on some of the 14 functions: for $f_{14}$, the rapid convergence of IICS comes at the expense of being trapped in a local minimum, while the results for $f_1$ to $f_3$, $f_5$, and $f_8$ show that IICS and BHCS perform similarly in terms of convergence speed and accuracy. In addition, IICS is weak on $f_4$ and $f_{15}$. For these two functions, BHCS has the highest convergence speed and the best search results.

Validation on CEC 2017 benchmark functions
In this subsection, benchmark functions from CEC 2017 are used as the benchmark test set. This test set consists of 29 benchmark functions: two unimodal functions, $F_1$ and $F_3$; seven simple multimodal functions, $F_4$-$F_{10}$; ten hybrid functions, $F_{11}$-$F_{20}$; and ten composition functions, $F_{21}$-$F_{30}$. The order of these functions is the same as in the original article [50]. To save space, the detailed definitions are not presented here; they can be found in the original article. These benchmark functions are considered difficult to optimize, as all of them are shifted and rotated, and some of them are hybrid or composition functions. It should be noted that the function $F_2$, named "Shifted and Rotated Sum of Different Power", is not used here because of its unstable behavior, especially in higher dimensions, as described in [50].
In addition to BCS, CCS, ACS, ICS, NNCS, BHCS, and HECS, this subsection compares IICS with four other intelligent algorithms, i.e., hybrid firefly and particle swarm optimization (HFPSO) [51], enhanced LSHADE-SPACMA (ELSHADE-SPACMA) [52], hybrid sampling evolution strategy (HSES) [53], and improved sine cosine algorithm with crossover scheme (ISCA) [54], which belong to the PSO, DE, covariance matrix adaptation evolution strategy (CMA-ES), and sine cosine algorithm (SCA) communities, respectively. These four algorithms have all shown good performance on CEC 2017 benchmark functions, and HSES and ELSHADE-SPACMA won the first and third places in the CEC 2018 competition, respectively. The parameter configurations of the newly selected algorithms are the same as in the corresponding references, as listed in Table 3. The population size ($N_p$) is set equal to the benchmark function dimension ($D$) in IICS, BCS, CCS, ACS, ICS, NNCS, BHCS, HECS, HFPSO, and ISCA, while the settings of $N_p$ in ELSHADE-SPACMA and HSES are consistent with those in the original studies. In accordance with the original article of the CEC 2017 competition [50], each algorithm is run 51 times with the maximum number of function evaluations set to 10,000 × D. Table 5 lists the results obtained by all algorithms on the CEC 2017 benchmark functions with D = 50, including the mean, best, worst, and standard deviation (SD) values of the function error of every benchmark function obtained by each algorithm, as well as the findings from the multiple-problem Wilcoxon rank test and the Friedman test, both at a significance level of 0.05. In Table 5, the mean, best, worst, and SD values for ELSHADE-SPACMA and HSES are taken from the original articles [52,53], and values smaller than $10^{-8}$ are reported as 0.00e+00.
When the results from IICS are compared with those from the other seven CS-based algorithms, it can be seen that IICS produces the best mean error values for 17 of the 29 benchmark functions, while the other seven CS-based competitors together do so for just 13 benchmark functions. Examination of the Symbol row in Table 5 further indicates that IICS achieves significantly better results than the other seven CS-based competitors, as well as HFPSO and ISCA. However, the ELSHADE-SPACMA and HSES algorithms return results superior to those of IICS. The findings from the Friedman test show that ELSHADE-SPACMA, HSES, and IICS have the first, second, and third mean rank values of 1.69, 1.95, and 4.02, respectively.
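To make the notion of a Friedman mean rank concrete, the following sketch ranks the algorithms on each benchmark function (rank 1 for the smallest error, tied values sharing their average rank) and then averages the ranks over all functions. The function names are hypothetical and the code only illustrates the ranking step, not the full significance test.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def friedman_mean_ranks(errors_by_problem):
    """errors_by_problem: one row per benchmark function, one mean error per algorithm."""
    n_alg = len(errors_by_problem[0])
    totals = [0.0] * n_alg
    for row in errors_by_problem:
        for j, rank in enumerate(average_ranks(row)):
            totals[j] += rank
    return [t / len(errors_by_problem) for t in totals]
```

A lower mean rank indicates better overall performance across the benchmark set, which is how the values 1.69, 1.95, and 4.02 reported above should be read.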

Experimental verification based on actual production data
In this section, 537 heats of actual production data from a 300 t LF built in Baoshan Iron & Steel Co., Ltd., are employed to verify the ability of the proposed hybrid prediction model, as well as the performance of the IICS algorithm. Among these data, 437 heats are randomly selected for the development of the proposed prediction model, and the remaining 100 heats are utilized for testing its performance. The parameter setting for IICS to solve the PO problem is as follows: p_a = 0.25, c_a = 0.08, N_p = D (where D = [(2 + 1) × hd_1 + (hd_1 + 1) × 1] + [(3 + 1) × hd_2 + (hd_2 + 1) × 1], and hd_1 and hd_2 denote the hidden neuron numbers of the empirical models f_arc and f_sur, respectively), and g_max = 5000. Moreover, it should be pointed out that when IICS is employed to solve the following model parameter optimization problems, 20 independent calculation runs are conducted to reduce random discrepancy. Correspondingly, the mean predicted values are utilized for the following hidden neuron number selection, as well as the model prediction performance exhibition and comparison. For each of the above two empirical models, the activation functions for the hidden layer and the output layer are the sigmoid function and the linear function, respectively. The optimal numbers of hidden neurons for f_arc and f_sur are 3 and 5, respectively, determined by trial and error; that is to say, the selected topologies of these two empirical models are 2-3-1 and 3-5-1, respectively. After the topologies of f_arc and f_sur are selected, all the 437 heats of modeling data are utilized for determining the model parameters with the methodology depicted in Sect. 4, to obtain the overall hybrid temperature prediction model of molten steel.
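As a check on the dimension formula, the sketch below counts the weights and thresholds of a single-hidden-layer feedforward network and evaluates such a network from a flat parameter vector, as an optimizer like IICS would supply it. The function names, the ordering of parameters inside the vector, and the convention of adding the threshold as a bias term are assumptions made for illustration only.

```python
import math


def slfn_param_count(n_in, n_hidden, n_out=1):
    """Number of weights and thresholds: (n_in + 1) * hd + (hd + 1) * n_out."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def slfn_forward(params, x, n_hidden):
    """Single-output SLFN with a sigmoid hidden layer and a linear output layer.

    Assumed layout of params: for each hidden neuron its input weights followed
    by its bias, then the output weights followed by the output bias.
    """
    n_in = len(x)
    assert len(params) == slfn_param_count(n_in, n_hidden)
    idx = 0
    hidden = []
    for _ in range(n_hidden):
        weights = params[idx: idx + n_in]
        bias = params[idx + n_in]
        idx += n_in + 1
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        hidden.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid activation
    out_weights = params[idx: idx + n_hidden]
    out_bias = params[idx + n_hidden]
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias  # linear output


D_ARC = slfn_param_count(2, 3)  # topology 2-3-1
D_SUR = slfn_param_count(3, 5)  # topology 3-5-1
D = D_ARC + D_SUR               # total number of decision variables
```

With the selected topologies 2-3-1 and 3-5-1, the two networks contribute 13 and 26 parameters, so D = 39 and, under the setting N_p = D, the population contains 39 individuals.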
Then, the 100 heats of testing data are utilized to evaluate the performance of the proposed hybrid prediction model. Figure 7 shows the final molten steel temperature predicted by the developed hybrid model, and indicates that this model can predict the temperature with high accuracy. Among these prediction results, the absolute error is lower than 5°C (desirable value) in 91% of the cases, lower than 7°C (tolerable value) in 95% of the cases, and higher than 10°C in only 2% of the cases. This demonstrates the effectiveness of the hybrid prediction model, with the proposed indirect training method for its empirical part.
To demonstrate the prediction ability of the proposed hybrid model, this article also develops an empirical prediction model based on the above selected 437 heats of production data. For a fair comparison, this empirical model is also established using an SLFN, and its parameters (namely the network's weights and thresholds) are determined by IICS. The input layer of this SLFN-based empirical prediction model has eight neurons. They are the initial molten steel temperature, total power consumption, ladle state, heat effect of additions, total argon consumption, weight of molten steel, refining time, and energy change of cooling water in the water-cooled cover. The hidden layer has 13 neurons (determined by trial and error), and the output layer has one neuron (the final molten steel temperature). Figure 8 shows the results predicted by the empirical model. For ease of comparison, the prediction errors (PE) of the proposed hybrid model and empirical model, as well as the differences between the prediction errors (D_PE) of these models, are presented in Fig. 9a, b, respectively. Herein, the differences in more detail are the results obtained by subtracting the absolute ... Four indices (RMSE, MAE, MRE, and AR) are used to evaluate the two models, where N_a is the number of heats with absolute prediction errors not higher than 5°C, and N_t is the total number of testing heats. The calculation results with respect to these four indices for the above two models are listed in Table 6. From Figs. 7-9, it can be observed that both the hybrid model and the empirical one can predict the molten steel temperature with a certain accuracy, while the values predicted by the former are much closer to the measured values than those predicted by the latter. Furthermore, as can be observed from the data in Table 6, the proposed hybrid model predicts significantly better than the empirical one.
Compared with the empirical model, the RMSE, MAE, and MRE of the proposed hybrid model are lower by 33.11%, 30.81%, and 30.77%, respectively, while the AR of the proposed hybrid model exceeds 90%, an 18.18% improvement over the empirical one. These results demonstrate the excellent prediction performance of the proposed hybrid model in a practical application. From the above observations and comparisons, it can be concluded that the proposed hybrid model is a promising predictor for the molten steel temperature.
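For readers who wish to reproduce this kind of comparison on their own data, the following sketch computes the four indices and the relative reduction between two models. The formulas are the commonly used definitions and are assumptions here: MRE is expressed in percent of the measured value, and AR is taken as N_a / N_t in percent with the 5°C tolerance stated above. The function names are hypothetical.

```python
import math


def prediction_metrics(measured, predicted, tolerance=5.0):
    """RMSE, MAE, MRE (%) and AR (%) of predicted versus measured temperatures."""
    errors = [p - m for p, m in zip(predicted, measured)]
    n_t = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n_t)
    mae = sum(abs(e) for e in errors) / n_t
    mre = sum(abs(e) / abs(m) for e, m in zip(errors, measured)) / n_t * 100.0
    # N_a: heats whose absolute prediction error is not higher than the tolerance.
    n_a = sum(1 for e in errors if abs(e) <= tolerance)
    ar = n_a / n_t * 100.0
    return {"RMSE": rmse, "MAE": mae, "MRE": mre, "AR": ar}


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0
```

Calling `prediction_metrics` with other tolerance values (for example 7.0 or 10.0) yields the share of heats inside the tolerable band, and `relative_reduction` applied to the RMSE, MAE, or MRE of two models gives percentages of the kind quoted above.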
Moreover, to compare the search capability of IICS with that of some other widely used intelligent algorithms in model parameter optimization, GA [12], PSO [13], DE [14], ACO [15], and BCS are employed to solve the same PO problem, with the topologies of f_arc and f_sur being 2-3-1 and 3-5-1, respectively, and with the same 437 heats of modeling data. In addition, it is interesting to investigate the performance of the two winners of the CEC 2018 competition, i.e., HSES and ELSHADE-SPACMA, on this PO problem. The parameters of the four newly selected algorithms are set according to the respective studies. Specifically, in GA the BLX-α crossover is used, and the crossover probability p_c, mutation probability p_m, and parameter α for BLX are set to 0.85, 0.02, and 2.0, respectively. In PSO the inertia weight ω, cognitive acceleration coefficient c_1, and social acceleration coefficient c_2 are taken as 0.5, 2.0, and 1.0, respectively. In DE the scale factor F and crossover probability P_xover are both set to 0.5, and the mutation operator is DE/best/1. In ACO the selection parameter q_0, ... Table 7 gives the results of the eight algorithms on the PO problem in 20 independent runs, and the best results are ... In Table 7, in terms of the mean fitness function value, both BCS and IICS give better results than the other four widely used model parameter optimization algorithms, revealing that CS is relatively more suitable for solving the PO problem involved in this study. It can also be seen that IICS performs better than ELSHADE-SPACMA and HSES, which indicates that IICS is an efficient algorithm for solving the PO problem. Furthermore, it can be observed that IICS produces much better optimization results than BCS. Figure 10 illustrates the convergence progress of IICS and BCS on the PO problem. It can be seen from Fig. 10 that IICS has a better search accuracy and a higher convergence speed, which demonstrates again that the proposed IICS algorithm greatly enhances the performance of BCS.
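As an illustration of one of the comparison settings, the sketch below builds a single DE/best/1 trial vector with scale factor F = 0.5 and crossover probability 0.5. It is not the implementation used in the study: the binomial form of the crossover and the forced inheritance of at least one mutant component are assumptions (the text specifies only the mutation operator and the two parameter values), and the function name is hypothetical.

```python
import random


def de_best_1_trial(population, best, target_idx, scale_f=0.5, p_xover=0.5, rng=random):
    """Build one trial vector for population[target_idx] using DE/best/1.

    Mutation: v = best + F * (x_r1 - x_r2), with r1 != r2 and both different
    from the target index. Crossover (assumed binomial): each component is
    taken from the mutant with probability p_xover, and one randomly chosen
    component is always taken from the mutant.
    """
    dim = len(best)
    candidates = [i for i in range(len(population)) if i != target_idx]
    r1, r2 = rng.sample(candidates, 2)
    mutant = [best[d] + scale_f * (population[r1][d] - population[r2][d])
              for d in range(dim)]
    forced = rng.randrange(dim)
    target = population[target_idx]
    return [mutant[d] if (d == forced or rng.random() < p_xover) else target[d]
            for d in range(dim)]
```

In a complete DE loop the trial vector would replace the target individual only if it attains a fitness value that is no worse, and `best` would be updated after each generation.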

Conclusions
A hybrid model for the prediction of molten steel temperature in LF is proposed. In the proposed hybrid prediction model, two SLFN-based empirical models are incorporated within the structure of a mechanistic thermal model, to represent the unknown functions in the mechanistic thermal model. The primary difference between the proposed hybrid prediction model and existing ones is that its empirical part is not trained in the traditional direct way, since the target outputs of the two empirical models are unavailable in advance. In the proposed approach, the empirical part is trained indirectly with the readily available temperature measurements of molten steel rather than the barely accessible target outputs of this part, which means that the hybrid prediction model with its empirical part trained by the proposed indirect method has a wider application range than existing ones. Application results on the production data from a 300 t LF at Baoshan Iron & Steel Co., Ltd., show the effectiveness and superiority of the proposed hybrid prediction model. Another main innovation of this article is the development of the information interaction-enhanced CS (IICS), which is used to optimize the parameters in the empirical part so as to complete the development of the proposed hybrid prediction model. One of the problems with BCS and many of its variants is that information interaction among cuckoos is lacking in the search process, which decreases their search performance considerably. To overcome this problem, an information interaction-enhanced mechanism is proposed and employed in IICS. The optimization results on the model parameter estimation problem and two benchmark sets (16 classical benchmark functions and 29 CEC 2017 benchmark functions) indicate that IICS has distinct advantages over its competitors (except ELSHADE-SPACMA and HSES) on these optimization problems.
When compared with the winners of the CEC 2018 competition, i.e., ELSHADE-SPACMA and HSES, the performance of IICS is found to be inferior to that of the two top algorithms on the CEC 2017 benchmark set, but it produces better results on the parameter optimization problem involved in this study. Despite its promising performance, the proposed IICS still has limitations. First of all, compared with BCS, one more parameter (i.e., c_a) is used by the algorithm to perform the proposed information interaction-enhanced mechanism. Consequently, the parameter tuning process used to achieve a reasonably good performance of IICS can be time consuming. As for the two parameters common to IICS and BCS (i.e., α and p_a), our current study sets them directly according to the recommendation of Yang and Deb [22,38]. There may be better value combinations of the three parameters. However, tuning all three would be much more time consuming, and retuning might be required when the algorithm is applied to different optimization problems.
Based on the current study, several future work directions can be pursued. Firstly, a parameter self-learning strategy could be constructed for the proposed IICS algorithm so as to tune the three involved parameters (i.e., α, p_a, and c_a) adaptively. Secondly, there is still room for improvement in the selection strategy of information providers in IICS. The selection of information providers in this article is essentially blind; therefore, a more effective selection strategy is worth investigating. Finally, the proposed indirect hybrid modeling method could also be applied to other LF refining processes or other similar complex industrial processes.