1 Introduction

Rising demand for iron has made the management of iron ore resources an important research subject and has motivated researchers to apply robust reference tracking control strategies to these resources. Modeling and simulation are well-accepted techniques in many processes, especially in mineral processing, where the prediction of efficiency is an important task. In the industrial magnetic separation process, determining separation efficiency online is costly and time-consuming. To overcome these difficulties, statistical and artificial intelligence techniques have been developed for monitoring purposes. Artificial intelligence methods have been found to be more efficient than statistical methods for the prediction of process efficiency.

Artificial intelligence techniques, such as support vector machine (SVM) [1], random forest (RF) [2], adaptive neural-based fuzzy inference system (ANFIS) [3], radial basis function neural network (RBFNN) [4] and artificial neural network (ANN) [5] have been efficiently used in the modeling and simulation of processes. Artificial neural networks employ very powerful computational techniques for modeling complex nonlinear relationships. They can be applied to solve and model many complicated processes due to their acceptable accuracy, ease of modeling, robustness, simplicity and nonlinearity.

Artificial neural networks can approximate complex nonlinear associations between input and output variables to an arbitrary degree of accuracy. Recently, neural networks have been applied as a powerful modeling tool in different processes such as adsorption [6], flotation [7], liquid–liquid extraction [8] and many other modeling problems [9, 10]. They have been used successfully in regression and interpolation problems. For example, Allahkarami et al. [5] used an artificial neural network to estimate copper grade and recovery based on operational parameters. Jahedsaravani et al. [3] applied intelligent techniques (i.e., neural network and adaptive neuro-fuzzy method) and a statistical approach (i.e., nonlinear regression) to model the metallurgical parameters in the batch flotation process. Their results indicated that intelligent techniques are more efficient tools than statistical approaches. Medi et al. [10] used artificial neural network models with a novel k-fold cross-validation technique to optimize minimum film thickness, maximum transparency and maximum conductivity. Their results showed that the developed model could predict the film characteristics with a reasonable error. Tripathy et al. [11] applied an artificial neural network to predict the performance of an induced roll high-intensity magnetic separator. Four variables, namely applied current, rotor speed, feed rate and splitter position, were used to develop the network. All data were normalized to the range of −1 to +1, and mean square error was used to evaluate the developed models and determine the optimal structure. They found that the proposed model provides realistic predictive performance (R2 > 0.95). In other work, data mining, neural network and time series analysis were applied to evaluate and model a copper concentrator [12]. Data mining was used to select the variables with the greatest effect on the metallurgical performance of the process, while modeling and prediction of the future trend of the copper concentrator were conducted by ANN and time series analysis, respectively. Tohry et al. [2] used Pearson correlation and random forest methods to assess SLon® operating variables and evaluate their effects on separation efficiency. Hu et al. [13] combined a genetic algorithm approach with pulp and froth modeling in different flotation cells to optimize the flotation circuit layout for 3–8 cells.

Although various statistical and intelligent techniques have been applied to simulate separation processes in the mineral processing industries, some of them are unsuitable for deep learning, relatively slow and relatively complex to design. Training a neural network from random initial weights may lead to trapping in local minima and slow convergence, so the network is often unable to find a desirable solution. Therefore, in this study, a genetic algorithm was used to optimize the weights and biases of the neural network and thereby improve the performance of the ANN.

To our knowledge, there is no information about using artificial intelligence for modeling the magnetic separation process. Hence, a robust and more accurate method for predicting the separation efficiency and selectivity index is still needed. This work therefore aims to develop an intelligent technique based on an artificial neural network and a hybrid neural genetic algorithm for modeling the concentration process. To evaluate the performance of the proposed model, statistical criteria including the coefficient of determination (R2), mean square error (MSE), mean absolute deviation (MAD) and mean absolute percentage error (MAPE) are used.

2 Description of industrial magnetic separation process

The Golgohar ore body, located in southeast Iran, contains half a billion tons of ore with about 42% iron. Golgohar Mining and Industrial Company operates six separate ore bodies at Golgohar. It produces 10.8 million tons of iron ore concentrate, 157 thousand tons of fine iron ore and 5.3 million tons of iron ore pellets. The company has four mineral processing plants: the concentration plant, the hematite recovery plant, the pelletizing plant and the Polycom iron ore concentrate plant for magnetic separation of iron ore. The flow sheet of the Polycom iron ore concentrate plant is shown in Fig. 1. It consists of primary crushing, cone crushing, screening, milling, magnetic separation, thickening and filtering.

Fig. 1 The simplified flow sheet of Polycom iron ore concentrate circuit

In this study, the percentage of Fe, FeO and S in mill feed and cobber feed, 80% passing size in mill feed and cobber feed and plant input were considered as the inputs to the network. Table 1 gives the summary statistics for each input variable.

Table 1 The summary statistics for input variables

The percentage of FeO in the final concentrate fluctuates between 23 and 26%, depending on operating conditions and ore type. The presence of sulfur in iron concentrate affects the price of the concentrate because sulfur in steel decreases its quality. Furthermore, sulfur in iron concentrates causes environmental problems, especially sulfur dioxide emissions during the smelting or pelletizing processes. The separation of S from iron (Fe) is therefore important. Grade and recovery are the most widely accepted indices for evaluating a process. Other indices for evaluating mineral processing operations include separation efficiency, selectivity index, operation efficiency and efficiency ratio [14]. The separation efficiency index combines grade and recovery, and the selectivity index is a measure for evaluating two-product separation. In this study, the separation process has two goals: (1) to separate sulfur from iron, for which the selectivity index is a suitable measure, and (2) to assess the recovery of FeO in the concentrate relative to the recovery of sulfur in the tailing, for which the separation efficiency is a suitable measure. Therefore, two indices, separation efficiency (SE) and selectivity index (SI), were used. These indices are calculated by Equations (1) and (2) [15, 16]:

$$SE= {R}_{m}-{R}_{g}$$
(1)
$$SI= \sqrt{\frac{{R}_{a}\times {R}_{b}}{\left(100-{R}_{a}\right)\times \left(100-{R}_{b}\right)}}$$
(2)

where Rm and Rg are the recoveries of valuable and gangue minerals, respectively; Ra and Rb are the recovery of iron (Fe) in the concentrate and the recovery of sulfur (S) in the tailing fraction, respectively. The separation efficiency and selectivity index were therefore used as the network outputs. Table 2 presents the summary statistics for each output variable.
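Equations (1) and (2) can be evaluated directly from the recoveries. The following is a minimal sketch; the function names and the recovery values are hypothetical and serve only to illustrate the arithmetic, with all recoveries expressed in percent.

```python
import math


def separation_efficiency(r_valuable, r_gangue):
    """Eq. (1): SE = R_m - R_g, with both recoveries in percent."""
    return r_valuable - r_gangue


def selectivity_index(r_a, r_b):
    """Eq. (2): SI for a two-product separation, recoveries in percent."""
    if not (0.0 <= r_a < 100.0 and 0.0 <= r_b < 100.0):
        raise ValueError("recoveries must lie in [0, 100)")
    return math.sqrt((r_a * r_b) / ((100.0 - r_a) * (100.0 - r_b)))


# Hypothetical recoveries, not plant data.
se = separation_efficiency(90.0, 20.0)   # 90 - 20 = 70
si = selectivity_index(90.0, 80.0)       # sqrt(7200 / 200) = 6
print(se, si)
```

An SI of 1 corresponds to equal recovery and rejection odds (for example, both recoveries at 50%), and larger values indicate a sharper separation.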

Table 2 The summary statistics for output variables

In this research, ANN and GA-ANN models were developed and compared with each other. All calculations were carried out using Matlab 2017b for Windows on a personal computer (Pentium V 2.3 GHz). The 193 data records were randomly divided into training and testing sets: 70% of the data (135 records) were used to train the models, and the remaining 30% (58 records) were used to test the network. To facilitate modeling and speed up training, all data were normalized to the range of −1 to 1 before being used in the models, according to the following equation:

$${X}_{N}=2\frac{X-{X}_{min}}{{X}_{max}-{X}_{min}}-1$$
(3)

where XN is the normalized value of each parameter; X, Xmax and Xmin are original, maximum and minimum values of the parameters, respectively.
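The random 135/58 split and the scaling of Eq. (3) can be sketched as follows. This is an illustration in Python with NumPy rather than the Matlab code used in the study; the data array is synthetic, and the random seed, the value range and the `denormalize` helper are assumptions added for the example.

```python
import numpy as np


def normalize(x, x_min, x_max):
    """Eq. (3): map x from [x_min, x_max] to [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0


def denormalize(x_n, x_min, x_max):
    """Inverse of Eq. (3), to report predictions in original units."""
    return (x_n + 1.0) / 2.0 * (x_max - x_min) + x_min


rng = np.random.default_rng(0)                  # seed chosen for repeatability
data = rng.uniform(20.0, 70.0, size=(193, 9))   # synthetic stand-in for the 193 records

# Random 70/30 split: 135 training records and 58 testing records.
idx = rng.permutation(len(data))
train, test = data[idx[:135]], data[idx[135:]]

# Column-wise minimum and maximum over all data, as described in the text.
col_min, col_max = data.min(axis=0), data.max(axis=0)
train_n = normalize(train, col_min, col_max)
test_n = normalize(test, col_min, col_max)
```

Taking the minimum and maximum over all records follows the description above; computing them from the training set only is a common alternative when the test set must stay fully unseen.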

3 Model description

3.1 Artificial neural network

Recently, soft computing methods have been successfully used in mineral processing systems [17, 18]. They are used for solving very complex engineering problems. The most commonly used soft computing method is the artificial neural network (ANN) [19]. It consists of a large number of neurons arranged in several layers (input, hidden and output layers). These neurons are the information processing units of the network and are connected by weighted links and biases. The output of each neuron is transferred to the next layer as an input. Finally, the nonlinear basis function set is used to calculate the outputs of the ANN, using Eq. 4.

$${y}_{j}=f\left(\sum_{i=1}^{n}{w}_{ij}{x}_{i}+{b}_{j}\right)$$
(4)

where xi and yj are the input and output variables, respectively; wij is the synaptic weight connecting input i to neuron j; f(.) is the activation function and bj is the bias of neuron j.
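The layer-by-layer calculation can be sketched as below, using the standard neuron form in which the activation is applied to the weighted sum plus the bias. The layer sizes follow the 9:18:16:2 arrangement reported in Section 4; the tanh hidden activations, the linear output layer and the random weights are assumptions made for the illustration.

```python
import numpy as np


def layer(x, w, b, f):
    """One layer of neurons: y_j = f(sum_i w_ij * x_i + b_j)."""
    return f(x @ w + b)


def forward(x, params):
    """Feed-forward pass: each layer's output is the next layer's input."""
    for w, b, f in params:
        x = layer(x, w, b, f)
    return x


def identity(z):
    return z


rng = np.random.default_rng(0)
sizes = [9, 18, 16, 2]                    # inputs, two hidden layers, outputs
acts = [np.tanh, np.tanh, identity]       # assumed activation functions
params = [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n), f)
          for m, n, f in zip(sizes[:-1], sizes[1:], acts)]

x = rng.uniform(-1.0, 1.0, size=(5, 9))   # five normalized samples
y = forward(x, params)                    # one row per sample, two outputs each
print(y.shape)
```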

3.2 Hybrid neural genetic algorithm

The backpropagation (BP) algorithm is a common neural network training method that adjusts the connection weights and biases by gradient descent to minimize the error between the predicted and target values. BP has some drawbacks, including slow convergence and trapping in local minima [20]. To overcome these drawbacks, the genetic algorithm (GA) can be applied to optimize the initial weights and biases.

The genetic algorithm is a heuristic search algorithm that is well suited to optimization [21]. This method has been successfully used in mineral processing systems for parameter optimization and circuit design [22,23,24]. The operators in GA are briefly explained in the following:

  • Generation of an initial population

    The variables to be optimized are encoded as a chromosome. Each chromosome has N variables, called genes, so the GA solves an N-dimensional optimization problem. The GA starts the optimization with a set of chromosomes, which is called a population. The population size is the number of chromosomes in one generation. If the population is small, the GA explores only a small part of the search space; if it is large, the computational load becomes high without the problem being solved faster. Therefore, in this research, the population size was set to 100.

  • Fitness evaluation

    In each generation, a fitness function is applied to evaluate all the chromosomes. The chromosomes with the best fitness values are then chosen.

  • Reproduction of a new population

    The selected chromosomes serve as parents. Available selection operators include roulette wheel, tournament, ranking and uniform selection, and available crossover operators include single-point, two-point, heuristic, arithmetic and scattered crossover. New chromosomes are generated by the crossover and mutation operators. Crossover varies the content of chromosomes from one generation to the next: it is applied to two individuals chosen by the selection operator to produce offspring, so that each offspring is made from parts of its parents' chromosomes. The crossover probability in this research was 0.5. The mutation operator randomly changes the values of the elements of a chromosome, and the mutation probability, which lies between 0 and 1, specifies how often parts of a chromosome are mutated. In this research, the mutation rate was set to 0.03. Different mutation operators exist for real-coded GA, such as uniform, non-uniform, boundary and multi-non-uniform mutation [25]. After these operators are applied, the offspring are generated and their fitness values are calculated.

  • Termination criterion

    The reproduction step is repeated until a stopping condition is met. The termination criterion can be the number of generations; in this study, it was set to 100 generations.

In this research, GA is used to optimize the initial weights and biases of the NN and thereby improve its performance in estimating the process outputs. The GA parameters for this study are summarized in Table 3.

Table 3 GA parameters used in this research

As described above, GA is a powerful evolutionary computation method that can be applied to a variety of optimization problems. In this study, GA is applied to find the optimal connection weights and biases of the ANN. The flowchart of combining GA and ANN is shown in Fig. 2 (right).
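The GA steps described above can be sketched as a real-coded GA whose chromosome is the flattened set of network weights and biases and whose fitness is the MSE. The population size, number of generations, crossover probability and mutation rate default to the values quoted in the text; tournament selection, scattered crossover, uniform mutation, elitism, the tanh activation and the [-1, 1] gene range are assumptions, since the operators actually used are listed only in Table 3. All function names are hypothetical.

```python
import numpy as np


def unpack(chrom, sizes):
    """Split a flat chromosome into per-layer (weights, biases)."""
    params, k = [], 0
    for m, n in zip(sizes[:-1], sizes[1:]):
        w = chrom[k:k + m * n].reshape(m, n)
        k += m * n
        b = chrom[k:k + n]
        k += n
        params.append((w, b))
    return params


def predict(chrom, sizes, x):
    params = unpack(chrom, sizes)
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)            # hidden layers (assumed tanh)
    w, b = params[-1]
    return x @ w + b                      # linear output layer


def mse(chrom, sizes, x, t):
    """Fitness function: lower is better."""
    return float(np.mean((predict(chrom, sizes, x) - t) ** 2))


def run_ga(x, t, sizes, pop_size=100, generations=100,
           p_cross=0.5, p_mut=0.03, seed=0):
    rng = np.random.default_rng(seed)
    n_genes = sum(m * n + n for m, n in zip(sizes[:-1], sizes[1:]))
    pop = rng.uniform(-1.0, 1.0, (pop_size, n_genes))     # initial population
    fit = np.array([mse(c, sizes, x, t) for c in pop])
    history = [fit.min()]
    for _ in range(generations):
        # Tournament selection (size 2): the lower-MSE chromosome wins.
        a, b = rng.integers(0, pop_size, (2, pop_size))
        parents = np.where((fit[a] < fit[b])[:, None], pop[a], pop[b])
        # Scattered crossover, applied to a pair with probability p_cross.
        mates = parents[rng.permutation(pop_size)]
        do_cross = rng.random(pop_size) < p_cross
        mask = (rng.random(parents.shape) < 0.5) & do_cross[:, None]
        children = np.where(mask, mates, parents)
        # Uniform mutation: each gene is redrawn with probability p_mut.
        mutate = rng.random(children.shape) < p_mut
        children = np.where(mutate, rng.uniform(-1.0, 1.0, children.shape), children)
        child_fit = np.array([mse(c, sizes, x, t) for c in children])
        # Elitism: the previous best replaces the worst child if it is better.
        worst, best = child_fit.argmax(), fit.argmin()
        if fit[best] < child_fit[worst]:
            children[worst], child_fit[worst] = pop[best], fit[best]
        pop, fit = children, child_fit
        history.append(fit.min())
    return pop[fit.argmin()], history


# Small synthetic problem so the example runs quickly.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, (40, 3))
t = np.tanh(x @ np.array([[0.5], [-0.3], [0.8]]))
best, history = run_ga(x, t, sizes=[3, 4, 1], pop_size=30, generations=20)
print(history[0], history[-1])
```

With elitism, the best fitness recorded in `history` never increases from one generation to the next. The returned chromosome is the candidate that would be passed on as the initial weights and biases for backpropagation training.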

Fig. 2 Flowchart of (Left) ANN model and (Right) GA-ANN model

As shown in Fig. 2 (left), a feed-forward neural network was used for modeling and prediction. The term backpropagation refers to the process by which the derivatives of the network error with respect to the network weights and biases are computed. Training is based on the error backpropagation algorithm, which is an iterative supervised learning technique. During each iteration, the error signal travels backward through the network, starting at the output neurons and ending at the input synapses. The neural network learns the relations between the input and output variables and correlates them through the optimal weights that minimize the differences between the estimated and observed output values. The performance criterion was the MSE, that is, the average squared error between the predicted and observed outputs. Using this algorithm, the main ANN structure was constructed and trained. In the ANN model, backpropagation training starts from random initial weights. One important application of the genetic algorithm (GA) is learning the weights of an NN. In the GA-ANN model, the weights and biases optimized by GA were therefore used as the initial weights of the NN, after which the backpropagation learning algorithm was used to train the NN. For both models, the error between the predicted and observed values in each iteration is propagated backward from the output layer toward the input layer through the hidden layers. Training continued until the predicted and target values were in good agreement (the convergence criteria were met).
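The backward pass described above can be illustrated with a small network trained by plain gradient descent on the MSE. This is a deliberately simplified stand-in and not the training routine used in the study: it has one hidden layer, assumes tanh hidden units and a linear output, and uses synthetic data. The initial parameters `p0` are random here; in the hybrid scheme they would be the GA-optimized weights and biases.

```python
import numpy as np


def init_params(rng, n_in, n_hidden, n_out):
    """Random initial weights; biases start at zero."""
    return {"w1": rng.uniform(-1.0, 1.0, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
            "w2": rng.uniform(-1.0, 1.0, (n_hidden, n_out)), "b2": np.zeros(n_out)}


def loss_and_grads(p, x, t):
    """MSE and its derivatives with respect to every weight and bias."""
    h = np.tanh(x @ p["w1"] + p["b1"])            # hidden activations
    y = h @ p["w2"] + p["b2"]                     # linear output
    err = y - t
    loss = np.mean(err ** 2)
    d_y = 2.0 * err / err.size                    # error signal at the output
    d_h = (d_y @ p["w2"].T) * (1.0 - h ** 2)      # propagated back through tanh
    grads = {"w2": h.T @ d_y, "b2": d_y.sum(axis=0),
             "w1": x.T @ d_h, "b1": d_h.sum(axis=0)}
    return loss, grads


def train(p, x, t, lr=0.05, epochs=300):
    """Gradient descent starting from the supplied initial parameters."""
    p = {k: v.copy() for k, v in p.items()}       # leave the caller's copy intact
    losses = []
    for _ in range(epochs):
        loss, g = loss_and_grads(p, x, t)
        losses.append(loss)
        for k in p:
            p[k] -= lr * g[k]
    return p, losses


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, (50, 3))               # synthetic normalized inputs
t = np.sin(x.sum(axis=1, keepdims=True))          # synthetic target
p0 = init_params(rng, 3, 4, 1)                    # would come from the GA in GA-ANN
p, losses = train(p0, x, t)
print(losses[0], losses[-1])
```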

3.3 Model evaluation criteria

To evaluate the performance of the developed model for estimation of the separation efficiency and selectivity index of magnetic separation process under different operating conditions, the following statistical measures are employed:

$${R}^{2}= 1-\frac{\sum_{i=1}^{n}{\left({y}_{i}-{x}_{i}\right)}^{2}}{\sum_{i=1}^{n}{\left({y}_{i}-{x}_{i}\right)}^{2}+\sum_{i=1}^{n}{\left({x}_{i}-\bar{x}\right)}^{2}}$$
(5)
$$MSE=\frac{1}{n}\sum_{i=1}^{n}{\left({x}_{i}-{y}_{i}\right)}^{2}$$
(6)
$$MAD=\frac{1}{n}\sum_{i=1}^{n}\left|{x}_{i}-{y}_{i}\right|$$
(7)
$$MAPE=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{{x}_{i}-{y}_{i}}{{x}_{i}}\right|\times 100$$
(8)

where R2, MSE, MAD and MAPE are the coefficient of determination, mean square error, mean absolute deviation and mean absolute percentage error, respectively; n is the number of samples; \(\bar{x}\) is the mean of the target values; xi and yi are the target and predicted values, respectively.
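The four criteria can be computed as in the sketch below. The function `r2_eq5` follows Eq. (5) as written, which includes the residual sum in the denominator; the more common form, one minus the ratio of the residual sum of squares to the total sum of squares, would give a slightly different value. The function names and the target and predicted values are hypothetical.

```python
import numpy as np


def r2_eq5(x, y):
    """Coefficient of determination as written in Eq. (5)."""
    sse = np.sum((y - x) ** 2)            # residual sum of squares
    sst = np.sum((x - x.mean()) ** 2)     # spread of the targets about their mean
    return 1.0 - sse / (sse + sst)


def mse(x, y):
    """Eq. (6): mean square error."""
    return np.mean((x - y) ** 2)


def mad(x, y):
    """Eq. (7): mean absolute deviation."""
    return np.mean(np.abs(x - y))


def mape(x, y):
    """Eq. (8): mean absolute percentage error (targets must be nonzero)."""
    return np.mean(np.abs((x - y) / x)) * 100.0


x = np.array([2.0, 4.0, 6.0, 8.0])   # hypothetical target values
y = np.array([2.5, 3.5, 6.5, 7.5])   # hypothetical predictions

print(mse(x, y))      # every error is 0.5 in magnitude, so MSE = 0.25
print(mad(x, y))      # MAD = 0.5
print(r2_eq5(x, y))   # 1 - 1/(1 + 20) = 20/21
print(mape(x, y))
```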

4 Results and discussion

The best structure of an ANN can be determined by varying the training function, the transfer function, the number of hidden layers and the number of neurons in each layer. In practice, the appropriate structure is found by trial and error [26]. In this study, several networks were generated, trained and tested. The ANN model was developed with two hidden layers in the MLP configuration and trained using the backpropagation algorithm. During each iteration, the error signal travels backward through the network, starting at the output neurons and ending at the input synapses. The NN learns the relation between the input and output parameters and correlates them using the connection weights and biases that minimize the differences between the observed and predicted values. Because of its convergence speed and its ability to find a better solution, the Levenberg–Marquardt training method was selected as the learning rule. Without momentum, a network may become trapped in a local minimum. Therefore, optimum values of 0.1 and 0.9 were selected for the learning rate and momentum, respectively. Using this algorithm, the main ANN structure was constructed and trained. Finally, a feed-forward neural network with a 9:18:16:2 arrangement was used for modeling. The parameters of the developed model are given in Table 4. The coefficient of determination (R2) in the training and testing stages is shown in Fig. 3. The statistical criteria calculated in the training and testing stages, presented in Tables 5 and 6, were relatively satisfactory.

Table 4 The basic architecture and the training parameters of the ANN

Fig. 3 Comparison of predicted and measured values for ANN model. First row: training stage; second row: testing stage

Table 5 Comparison of statistical measures of ANN and ANN-GA in the training stage
Table 6 Comparison of statistical measures of ANN and ANN-GA in the testing stage

In the GA-ANN model, the connection weights and biases of the ANN are obtained by GA to avoid trapping in local minima. As shown in Fig. 4, the value of the fitness function (MSE) decreased over the 100 generations. The mean and best fitness values converge, and the best fitness for this model is 0.0438. The chromosome (connection weights and biases) with the best fitness is saved for BP training.

Fig. 4 Convergence of best fitness and mean fitness for this study

After the weights and biases of the ANN are optimized, they are fed to the network for backpropagation training. The correlation coefficient for the GA-ANN model in the training process is shown in Fig. 5. The results indicate that the measured and predicted data are close to each other in the training stage. The test data are then fed to the developed model for validation. Figure 6 shows the comparison of predicted and measured values for the ANN-GA model in the testing stage.

Fig. 5 Comparison of predicted and measured values for ANN-GA model in training stage

Fig. 6 Comparison of predicted and measured values for ANN-GA model in testing stage

The statistical criteria for evaluating the training and testing stages are given in Tables 5 and 6. For the ANN model, the mean square error and coefficient of determination in the testing phase were 0.635 and 0.86 for selectivity index and 4.646 and 0.84 for separation efficiency, respectively. To improve the performance of the neural network, a genetic algorithm was used to optimize its weights and biases. For the GA-ANN model, the mean square error and coefficient of determination in the testing phase were 0.276 and 0.95 for selectivity index and 1.782 and 0.92 for separation efficiency, respectively. The other statistical criteria for the GA-ANN model were also better than those of the ANN model. The GA-ANN model therefore predicts the process outputs more accurately than the ANN model.
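As a quick check on the size of the improvement, the relative reduction in testing MSE can be computed from the values quoted above (a derived illustration, not a figure reported in the tables):

$$\frac{0.635-0.276}{0.635}\approx 0.565\ \text{(selectivity index)},\qquad \frac{4.646-1.782}{4.646}\approx 0.616\ \text{(separation efficiency)}$$

That is, the GA-ANN model lowers the testing MSE by roughly 57% for selectivity index and 62% for separation efficiency relative to the ANN model.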

5 Conclusions

In this study, the capabilities of ANN and GA-ANN for predicting the separation efficiency and selectivity index in the industrial magnetic separation process were investigated. The effective separation parameters of hematite, including the percentage of Fe, FeO and S in mill feed and cobber feed, the 80% passing size in mill feed and cobber feed and the plant capacity, were considered as inputs in the model development. The difference between the two models based on the statistical measures is tangible. The results indicated that the proposed model (GA-ANN) could be successfully applied to model the separation process of hematite under different operating conditions with a reasonable error. By optimizing the initial weights and biases, the ANN-GA model significantly outperforms the ANN model.