Optimisation of ANN topology for predicting the rehydrated apple cubes colour change using RSM and GA

In this study, an efficient optimisation method combining response surface methodology (RSM) and a genetic algorithm (GA) is introduced to find the optimal topology of artificial neural networks (ANNs) for predicting colour changes in rehydrated apple cubes. A multi-layered feed-forward backpropagation ANN model was developed to correlate one output (colour change) with four input variables (drying air temperature, drying air velocity, temperature of distilled water and rehydration time). A predictive model for ANN topology in terms of the best mean squared error (MSE) performance on validation samples was created using RSM. The RSM model was integrated with an effective GA to find the optimum topology of the ANN. The optimum ANN had minimum MSE when the number of hidden neurons, learning rate, momentum constant, number of epochs and number of training runs were 13, 0.33, 0.89, 3869 and 3, respectively. The MSE of the optimal ANN topology on validation samples was 0.0072095. The optimal ANN topology can therefore be considered precise for predicting colour change in the rehydrated apple cubes. Mean absolute error and regression coefficient (R) of the optimal ANN topology were determined as 0.0259 and 0.96475 for training, 0.0399 and 0.95243 for testing and 0.0264 and 0.95151 for validation data sets. The results of testing the model on new samples showed excellent agreement between the actual and predicted data, with a coefficient of determination R² = 0.97.


Introduction
Rehydration is a complicated process aimed at restoring, through contact with water, the features the dried material had before drying and the pretreatment that preceded it. Three processes take place in the course of rehydration: absorption of water by the tissues of the dried material, the resulting increase in its mass and volume, and the leaching of water-soluble substances such as sugars, acids, vitamins and minerals from the rehydrated material [1,2]. The progress of these processes depends on the features of the raw material and on the conditions of drying and the preceding pretreatments [3]. Consequently, the course of rehydration reflects the changes that took place in the raw material tissue as a result of pretreatment, drying and rehydration itself [4]. These changes are the reason why the dried product fails to regain the features of the raw material after rehydration, which demonstrates the irreversibility of the drying process [5].
Many studies of the rehydration process have focused on determining rehydration indicators that define the degree to which the dried material is restored, and on describing the process using empirical formulae [6,7]. Other works have focused on optimising the parameters of a selected thermal treatment technology [8,9] and on studying changes in the tissue structure [10,11], the chemical content [12] and colour [13,14].
In the process of drying and rehydration, complex and highly nonlinear phenomena take place [15]. Therefore, it is difficult to estimate relationships between the input and the output of this complex system on the basis of mathematical approaches.
Intelligent techniques such as artificial neural networks show a high learning ability and are capable of identifying such systems [16]. The ability of neural networks to learn from repeated exposure to system characteristics has made them a popular choice for many applications, including drying technology [17,18].
In the literature, several papers are related to modelling the heat and mass transfer kinetics [19], drying characteristics [20,21], approximating the moisture content [22] and quality of apple tissue [23]. ANN models were also developed for predicting some physicochemical properties of apple tissue during hot air drying in thin layer [24]. Recently, several authors applied neural network for modelling heat and mass transfer during rehydration process [25,26]. A simple ANN model was used to predict the shrinkage and estimate the rehydration capacity of the dried cooked rice [27] and dehydrated carrots [28]. A comprehensive review study devoted to application of ANNs in drying technology was made in paper [29].
Determination of the best ANN topology for estimation of colour change is usually carried out by a trial-and-error procedure, which is very time-consuming [30]. Optimising the parameters of a neural network requires a large number of different topologies to be constructed, trained and tested. Moreover, there is no general rule for selecting the values of the ANN variables, since they depend on the complexity of the system being modelled.
A genetic algorithm is a biologically inspired optimisation technique [36]. Recently, GAs have gained popularity as a robust optimisation tool for multi-modal nonlinear problems. GAs can sometimes use ANN models as their fitness function. In the food industry, a combined ANN and GA system was used to control the fruit storage process. The authors [37,38] indicated the need to apply a hybrid system for the selection of input parameters (temperature, density) and output parameters (colour, mass loss, hardness) to improve the quality of the stored fruit. They noted that the combined system of ANNs and GAs is superior to traditional computational techniques used in problems related to agriculture. An ANN and GA system was also used successfully for optimising thermal conditions for conduction-heated foods [39,40]. Recently, ANN and GA approaches in drying technology have been described in papers [41-43].
It can be concluded from the literature review that the coupling of these two methods has many benefits for finding the global optimum neural networks topology and improving the model performance [33,34].
The objective of this work is to optimise the neural network topology to predict the colour change in the rehydrated apple cubes using integrated RSM and GA methods.

Material
High-quality Ligol apples were bought from the local market. They were washed in water, cut into cubes with dimensions of 10 × 10 × 10 mm and dried on the same day. The initial moisture content of a sample amounted to ca. 85% w.b. (5.66 d.b.).

Drying equipment and experiments
The drying experiments were carried out in a dryer constructed in our laboratory. Details of the dryer and of the drying procedure can be found in [44]. The laboratory dryer was run for about 1 h, and when steady conditions had been achieved the samples were placed on a tray. The drying process lasted until the mass of the sample became constant. The drying experiments were performed at three drying air temperatures (50, 60 and 70°C) and two air flow velocities (0.5 and 2 m/s). The final moisture content of the dried apples amounted to ca. 9% w.b. (0.098 d.b.). Dry matter of the solid was determined according to AOAC standards [45]. The dried material obtained in the given conditions from three independent experiments was mixed and stored in a tightly sealed container for about one week at 20°C; after that, samples were taken for further studies. The container in which the dried material was stored was placed in a cupboard, so the dried apples were not exposed to sunlight.

Rehydration procedure
The apple cubes were immersed in distilled water at four temperatures (20, 45, 70 and 95°C) using a water bath ELP 12 (LABOPLAY, Bytom, Poland). The initial mass of each dried sample subjected to rehydration was 10 g, and the ratio of dried sample mass to medium mass at the beginning of rehydration was 1:20. The rehydration lasted from 6 h (20°C) to 2 h (95°C). After removal from the water, the sample was dried on filter paper. The medium was not stirred during the rehydration process, and its temperature was constant.

Colour determination
Colour images of fresh and rehydrated apple were acquired using a flatbed scanner (Canon CanoScan 5600F). The device was equipped with a 6-line colour CCD sensor, a fluorescent lamp and a 48-bit input/output interface (16 bits for each RGB channel). Images with a resolution of 300 dpi were acquired in the sRGB colour space and then saved in BMP format as matrices with dimensions of 2552 × 3508 pixels. During the scanning process, all tools for automatic image enhancement were disabled. Apple cubes were randomly positioned on the scanner platen. For fresh apple and each type of dehydrated cubes (various drying conditions: drying temperature, drying air velocity, and various rehydration temperatures and times) 30 images were acquired. The images were then transformed to the CIEXYZ colour space [45,46]. The nonlinear transformation of CIEXYZ to CIEL*a*b* coordinates was done relative to illuminant D50 and the 10° observer according to the CIE standard, using 94.811, 100 and 107.32 as reference whites for the X, Y and Z coordinates, respectively [47]. Chroma (C*) and hue (h*) of the CIEL*C*h* colour space were calculated according to Schanda [48]. The original digital image of raw apple cubes and the preprocessed image of apple cubes extracted from the image background are shown in Fig. 1.

Quality of rehydrated product
It was also assumed that the quality of the rehydrated product is defined by its colour change. The colour of a food product is a very important quality factor because it plays a decisive role in consumer acceptance. The colour change ($C_{ch}$) was calculated according to the formula given by [49]

$$C_{ch} = \sqrt{\left(\frac{\Delta L^*}{K_L S_L}\right)^2 + \left(\frac{\Delta C^*}{K_C S_C}\right)^2 + \left(\frac{\Delta H^*}{K_H S_H}\right)^2} \qquad (1)$$

where $S_L$, $S_C$, $S_H$ denote the weighting functions, which adjust for the internal non-uniformity of the CIEL*a*b* space and may be obtained using Eqs. (2-4). The parameters $K_L$, $K_C$, $K_H$ express the variation from the reference conditions and are equal to 1 in reference conditions [50]. The parameters $\Delta L^*$, $\Delta C^*$, $\Delta H^*$ denote the differences between the tested sample (T) and the standard (S) in terms of luminance, chroma and hue.

Neural networks
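The colour change described above is the quantity the network is trained to predict. As a minimal sketch of how it could be computed from CIEL*a*b* coordinates, the function below uses the weighted-difference form with the CIE94 weighting functions $S_L = 1$, $S_C = 1 + 0.045\,C^*$ and $S_H = 1 + 0.015\,C^*$. These weighting functions are an assumption, since Eqs. (2-4) are not reproduced in the text, and the function name `colour_change` is hypothetical.

```python
import math

def colour_change(lab_sample, lab_standard, k_l=1.0, k_c=1.0, k_h=1.0):
    """Weighted colour difference between a tested sample and a standard.

    Assumption: CIE94 weighting functions, with C* taken from the standard.
    """
    l_t, a_t, b_t = lab_sample
    l_s, a_s, b_s = lab_standard
    c_t = math.hypot(a_t, b_t)            # chroma of the tested sample
    c_s = math.hypot(a_s, b_s)            # chroma of the standard
    d_l = l_t - l_s                       # lightness difference
    d_c = c_t - c_s                       # chroma difference
    # Squared hue difference: what remains of the a*, b* distance after chroma
    d_h_sq = (a_t - a_s) ** 2 + (b_t - b_s) ** 2 - d_c ** 2
    d_h_sq = max(d_h_sq, 0.0)             # guard against rounding below zero
    s_l = 1.0
    s_c = 1.0 + 0.045 * c_s
    s_h = 1.0 + 0.015 * c_s
    return math.sqrt((d_l / (k_l * s_l)) ** 2
                     + (d_c / (k_c * s_c)) ** 2
                     + d_h_sq / (k_h * s_h) ** 2)
```

With all parametric factors left at 1 (the reference conditions), a pure lightness shift of 10 units between sample and standard gives a colour change of 10, and identical colours give 0.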

Design of ANN architecture
A multi-layer feed-forward (MLFF) backpropagation (BP) neural model was developed (Fig. 2) for predicting the colour change in apple cubes during the drying and rehydration processes. The input parameters of the model were drying air temperature, drying air velocity, temperature of distilled water and rehydration time. The network output variable was colour change. Since one output variable depended on four exogenous input variables, the output layer had one neuron and the input layer had four. Earlier works [32,33] reported that a network with one hidden layer and a hyperbolic tangent sigmoid (tansig) transfer function is commonly used for forecasting in practice [51,52]. Therefore, a single hidden layer with the tansig transfer function was considered for optimisation. The tansig function is given by Eq. (8)

$$\mathrm{tansig}(n) = \frac{2}{1 + e^{-2n}} - 1 \qquad (8)$$

where $n$ is the net input of the neuron. Moreover, a linear (purelin) transfer function was selected for the output in the simulation process.
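The architecture described above (four inputs, one tansig hidden layer, one linear output) can be sketched as a single forward pass. This is an illustration only: the weights are random placeholders, the input vector is hypothetical, and the hidden layer size of 13 is taken from the optimum reported in the abstract.

```python
import math
import random

def tansig(n):
    """Hyperbolic tangent sigmoid; numerically equal to tanh(n)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: inputs -> tansig hidden layer -> linear output neuron."""
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

random.seed(0)
n_inputs, n_hidden = 4, 13
# Placeholder weights and biases (a trained network would supply these)
w_hidden = [[random.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [random.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [random.uniform(-1, 1) for _ in range(n_hidden)]
b_out = 0.0

# Hypothetical normalised inputs: drying air temperature, drying air velocity,
# water temperature, rehydration time
x = [0.5, 0.1, 0.9, 0.3]
y = forward(x, w_hidden, b_hidden, w_out, b_out)
```

Because the output neuron is linear, the prediction is not confined to the range of the tansig function, which is why the target values only need to be rescaled, not squashed.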

Data preprocessing
To make training as efficient as possible, the data should be normalised before training. It is also helpful to analyse the network response after training has been completed. To this end, 189 cases were chosen from our experiments. The chosen cases were randomly divided into three sets: 133 samples (≈70% of cases) for training, 28 samples (≈15%) for validation and 28 samples (≈15%) for testing. The validation set was used to evaluate the performance of the network during training, while the testing set was used to estimate the predictive ability of the developed model [51]. The data were normalised between 0.1 and 0.9 in the following way [53]

$$x_{\text{normalised}} = 0.1 + 0.8\,\frac{y_{\text{actual}} - y_{\text{minimum}}}{y_{\text{maximum}} - y_{\text{minimum}}} \qquad (9)$$
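The normalisation to the 0.1-0.9 range and the random 133/28/28 split can be sketched as follows. The helper names `normalise` and `split_indices` and the fixed seed are assumptions made for illustration; the source does not describe how the random division was implemented.

```python
import random

def normalise(values, lo=0.1, hi=0.9):
    """Scale values linearly so that the minimum maps to lo and the maximum to hi."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]

def split_indices(n_cases, n_train, n_val, seed=0):
    """Shuffle case indices and cut them into training, validation and testing sets."""
    idx = list(range(n_cases))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

temps = [50.0, 60.0, 70.0]        # drying air temperatures used in the study, in deg C
scaled = normalise(temps)         # approximately [0.1, 0.5, 0.9]
train_idx, val_idx, test_idx = split_indices(189, 133, 28)
```

Mapping to 0.1-0.9 instead of 0-1 keeps the scaled values away from the saturated ends of the tansig function.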

Training methods
In MLP networks, the MSE can be minimised by various methods, including Levenberg-Marquardt (LM), gradient descent (GD) and conjugate gradient (CG). MLPs are as a rule trained using the error backpropagation (BP) algorithm, a general method for the iterative solution of weights and biases. BP uses the GD technique. This technique is very stable at a small learning rate, but its convergence is slow. Different methods aimed at speeding up BP have been applied, for instance a momentum term and a variable learning rate. In this study, the gradient descent with momentum (GDM) algorithm was chosen for training the networks. It helps to avoid local minima, speeds up learning and stabilises convergence [33,34]. Moreover, GDM allows a network to respond to the local gradient and to ignore small features in the error surface [51].
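A minimal sketch of one GDM weight update is given below. The exact update rule is not stated in the text; the form used here, in which the gradient term is scaled by (1 - mc), is an assumption chosen because it makes a momentum constant of 1 completely insensitive to the local gradient. The function name `gdm_step` and the toy error function are hypothetical.

```python
def gdm_step(weights, grads, prev_steps, lr, mc):
    """One gradient-descent-with-momentum update.

    Assumed form: step = mc * previous_step - lr * (1 - mc) * gradient,
    so mc = 0 gives plain gradient descent and mc = 1 ignores the gradient.
    """
    steps = [mc * p - lr * (1.0 - mc) * g for p, g in zip(prev_steps, grads)]
    new_weights = [w + s for w, s in zip(weights, steps)]
    return new_weights, steps

# Minimise the toy error E(w) = w^2 (gradient 2w), starting from w = 1
w, step = [1.0], [0.0]
for _ in range(200):
    w, step = gdm_step(w, [2.0 * w[0]], step, lr=0.33, mc=0.89)
```

On this toy problem the weight spirals in towards the minimum at zero; the values lr = 0.33 and mc = 0.89 are the optima reported in the abstract and are used here only as example settings.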

Training parameters
In our training process, the number of neurons in the hidden layer, the number of epochs, the learning rate and the momentum coefficient are parameters that can affect the network simulation efficiency. GDM, however, depends mainly on two training parameters: learning rate (lr) and momentum constant (mc). The first parameter determines the time needed to find the minimum in the weight space. Too high an lr increases the magnitude of the MSE oscillations. Too small an lr results in smaller steps in the weight space, so learning becomes slower and the network is less able to escape from local minima in the error surface. The mc defines the amount of momentum. An mc of 1 results in a network that is totally insensitive to the local gradient and, consequently, does not learn properly. Too high an mc causes the adaptation to diverge and gives unusable weights, whereas too small an mc leads to a long learning time [32]. The number of neurons in the hidden layer is decisive for network performance. With too few hidden neurons the ANN is unable to adapt to the modelled process, whereas with too many the system memorises errors [51]. Moreover, too many neurons do not propagate errors back efficiently [33] and therefore worsen the ability of the neural network to learn. Similar problems can be encountered when selecting the number of epochs. Too few epochs limit the ability of the network to model the process, while too many epochs can lead to overtraining of the network and to an increase in errors.
Therefore, determining the optimum values of the parameters affecting the ANN is an important task, and appropriate ranges should be chosen. The following numerical variables were chosen for ANN optimisation: number of neurons in the hidden layer, lr, mc, number of epochs and number of training runs. The response sought was the error at the best validation performance. BP uses a GD technique whose stability depends on lr; a small lr leads to a very stable GD. In the MATLAB 7.0 software, the default values of lr and mc are 0.01 and 0.9, respectively. Accordingly, lr was varied from 0.01 to 0.4 and mc from 0.1 to 0.9. Similarly, ranges were chosen for the number of neurons in the hidden layer (2-16), the number of training epochs (300-5000) and the number of training runs (3-7). The ranges of the input variables in the ANN model are shown in Table 1.

Performance evaluation
After the optimal ANN topology has been found, the next step is to measure its performance. The performance of the designed ANN was estimated on the basis of the coefficient of determination ($R^2$), mean square error (MSE) and mean absolute error (MAE) [33]. These parameters were determined using Eqs. (10-12)

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (x_{pi} - x_{di})^2}{\sum_{i=1}^{N} (x_{di} - \bar{x})^2} \qquad (10)$$

$$MSE = \frac{1}{N} \sum_{i=1}^{N} (x_{pi} - x_{di})^2 \qquad (11)$$

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| x_{pi} - x_{di} \right| \qquad (12)$$

where $x_{pi}$ is the network (predicted) output for observation $i$, $x_{di}$ is the experimental (actual) output for observation $i$, $\bar{x}$ is the average value of the experimental output, and $N$ is the number of data. MSE describes the differences between the values implied by the estimator and the estimated quantity. A value of MSE close to 0 indicates that the network can be considered satisfactory. $R^2$ describes the quality of the model fit. If $R^2 = 1$, the regression line fits the data perfectly. MAE shows how close the predictions are to the actual outcomes. A value of MAE close to 0 indicates that the model error is small.

Hybrid intelligent system
Figure 3 provides a schematic diagram of the simulation system. The proposed hybrid RSM-ANN-GA system is described briefly below. It includes the following steps:
Step 1 Collection of the data set
Step 2 RSM designs the experiment and builds a fitness function
Step 3 GA optimises ANN architecture
Step 4 ANN predicts the colour change
The algorithm proceeded with its iterations until a specified performance criterion was satisfied. More details on the applied RSM and GA are given in Sects. 5.1 and 5.2.

Response surface method
The optimum architecture of the ANN was determined on the basis of the runs designed by response surface methodology. A central composite design (CCD, Table 2) consisting of 50 sets of conditions and comprising a full replicate of the five-factor factorial design (32 points), 10 star points and 8 centre points was used. The upper and lower limits of the parameters were coded as +1 and -1, respectively. Fifty different patterns of the proposed ANNs (Table 2) were designed and trained to model the best MSE performance on the validation data set using the Design-Expert (DOE) software. Finding a suitable approximation for the true functional relationship between the response and the set of independent variables is the first stage in this methodology [54,55]. The response variable was then transformed using the natural logarithm. This brings the distribution of the response variable closer to the normal distribution and improves the fit of the model to the data. The experimental results of the CCD were fitted with a second-order polynomial equation using a multiple regression technique. Equation (13) represents the quadratic model for predicting the optimal point

$$Y = b_0 + \sum_{i=1}^{k} b_i x_i + \sum_{i=1}^{k} b_{ii} x_i^2 + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} b_{ij} x_i x_j + e_{ij} \qquad (13)$$

where $Y$ is the response (MSE), $b_0$, $b_i$, $b_{ii}$ and $b_{ij}$ are regression coefficients (intercept, linear, quadratic and interaction, respectively), $x_i$ and $x_j$ are the independent variables, $k$ is the number of independent variables, and $e_{ij}$ is the error observed in the response.
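The structure of the design described above can be sketched in a few lines: 32 factorial points, 10 star points and 8 centre points in coded units, together with the regressors of the second-order model. The axial distance `alpha = 1.0` (a face-centred design) is an assumption, since the source does not state it, and all function names are hypothetical.

```python
from itertools import combinations, product

def central_composite(k=5, alpha=1.0, n_centre=8):
    """Coded design points: 2^k factorial + 2k star points + centre replicates."""
    factorial = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    star = []
    for i in range(k):
        for level in (-alpha, alpha):
            point = [0.0] * k
            point[i] = level          # only one factor leaves the centre
            star.append(point)
    centre = [[0.0] * k for _ in range(n_centre)]
    return factorial + star + centre

def quadratic_terms(x):
    """Regressors of the second-order model: 1, x_i, x_i^2 and x_i*x_j (i < j)."""
    squares = [xi * xi for xi in x]
    cross = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return [1.0] + list(x) + squares + cross

def decode(coded, low, high):
    """Map a coded level in [-1, 1] back to the actual factor range."""
    return low + (coded + 1.0) * (high - low) / 2.0

design = central_composite()
```

For five factors the quadratic model has 21 coefficients (1 intercept, 5 linear, 5 quadratic and 10 interaction terms), so 50 runs leave ample degrees of freedom for estimating error. As a usage example, `decode(0.0, 0.01, 0.4)` gives the centre-point learning rate of 0.205.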

Genetic algorithm
GAs use the evolutionary principle of survival of the best-adapted chromosomes [56]. A group of chromosomes is called a population. Each population of chromosomes has the same size, which is referred to as the population size.
According to the researchers [32,33,57], a suitable population size is about 20-30 chromosomes. However, a population size of 50-80 has sometimes led to the best answers [58,59]. With a large population size, the GA searches the solution space more thoroughly, thereby reducing the chance that the algorithm returns a local minimum that is not a global minimum. On the other hand, a large population size causes the algorithm to run more slowly [58,60].
The main data structures in the GA toolbox are chromosomes, objective function values and fitness values. The chromosome data structure stores an entire population in a single matrix of size Nind × Lind, where Nind is the number of individuals in the population and Lind is the length of the genotypic representation of these individuals. Each row corresponds to an individual's genotype, consisting of base-n, typically binary, values

$$\mathrm{Chrom} = \begin{bmatrix} g_{1,1} & g_{1,2} & g_{1,3} & \cdots & g_{1,Lind} \\ g_{2,1} & g_{2,2} & g_{2,3} & \cdots & g_{2,Lind} \\ g_{3,1} & g_{3,2} & g_{3,3} & \cdots & g_{3,Lind} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ g_{Nind,1} & g_{Nind,2} & g_{Nind,3} & \cdots & g_{Nind,Lind} \end{bmatrix}$$

Such a data representation does not impose a structure on the chromosomes; it requires only that all chromosomes be of equal length. Thus, structured populations or populations with varying genotypic bases may be used in the GA toolbox provided that a suitable decoding function, mapping chromosomes onto phenotypes, is employed.
The decision variables (phenotypes) in the GA are obtained by applying some mapping from the chromosome representation into the decision variable space. Here, each of the strings contained in the chromosome structure decodes to a row vector of order Nvar, according to the number of dimensions in the search space and corresponding to the decision variable vector value. The decision variables are stored in a numerical matrix of size Nind × Nvar. Again, each row corresponds to a particular individual's phenotype. An example of the phenotype data structure is given below, where bin2real represents an arbitrary function, possibly from the GA Toolbox, mapping the genotypes onto the phenotypes.
Phen = bin2real(Chrom) % map genotype to phenotype

$$\mathrm{Phen} = \begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} & \cdots & x_{1,Nvar} \\ x_{2,1} & x_{2,2} & x_{2,3} & \cdots & x_{2,Nvar} \\ x_{3,1} & x_{3,2} & x_{3,3} & \cdots & x_{3,Nvar} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{Nind,1} & x_{Nind,2} & x_{Nind,3} & \cdots & x_{Nind,Nvar} \end{bmatrix}$$

The general steps of a genetic algorithm are presented in Fig. 4. The algorithm encodes a possible solution to a particular problem on a simple chromosome string and applies specified operators to the chromosomes in order to preserve critical information and produce a new population, with the aim of generating strings that map to high function values [36]. The main GA operators are selection, crossover and mutation (see Table 3).
Roulette selection was used in this study. It simulates a roulette wheel in which the area of each segment is proportional to the expectation of the corresponding individual. The GA then uses a random number to select one of the segments with a probability equal to its area.
The next main operator is crossover, which combines two chromosomes (parents) to form a new chromosome (child) for the next generation. Single-point crossover was applied in this study.
Single-point crossover chooses a random integer n between 1 and the number of variables, selects the vector entries numbered less than or equal to n from the first parent, selects the entries numbered greater than n from the second parent, and concatenates these entries to form the child. For example, if p1 and p2 are the parents, the child consists of the first n entries of p1 followed by the remaining entries of p2.
The next GA operator is mutation. This operator makes small random changes in the individuals of the population, which provide genetic diversity and enable the GA to search a broader space. Uniform mutation was applied in the simulation process. In this case, the GA selects a fraction of the vector entries of a chromosome for mutation, where each entry has a probability equal to the mutation rate of being mutated. Next, the algorithm replaces each selected entry by a random number drawn uniformly from the range of that entry.
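The three operators described above (roulette selection, single-point crossover and uniform mutation) can be sketched on real-valued chromosomes as follows. This is an illustrative sketch, not the toolbox implementation: the function names are hypothetical, the mutation rate of 0.2 is an assumed value, and the bounds are the factor ranges quoted in the section on training parameters.

```python
import random

def roulette_select(population, expectations, rng):
    """Pick one individual with probability proportional to its expectation."""
    pick = rng.uniform(0.0, sum(expectations))
    running = 0.0
    for individual, share in zip(population, expectations):
        running += share
        if pick <= running:
            return individual
    return population[-1]             # fallback for floating-point round-off

def single_point_crossover(p1, p2, rng):
    """Child = entries 1..n of p1 followed by the entries after n of p2."""
    n = rng.randint(1, len(p1))       # random integer between 1 and the number of variables
    return p1[:n] + p2[n:]

def uniform_mutation(individual, bounds, rate, rng):
    """Replace each entry, with probability `rate`, by a uniform draw from its range."""
    return [rng.uniform(lo, hi) if rng.random() < rate else gene
            for gene, (lo, hi) in zip(individual, bounds)]

# Factor ranges: hidden neurons, learning rate, momentum constant, epochs, training runs
bounds = [(2, 16), (0.01, 0.4), (0.1, 0.9), (300, 5000), (3, 7)]
rng = random.Random(1)
p1 = [2, 0.01, 0.1, 300, 3]           # parent at the lower limits
p2 = [16, 0.4, 0.9, 5000, 7]          # parent at the upper limits
child = single_point_crossover(p1, p2, rng)
mutant = uniform_mutation(child, bounds, rate=0.2, rng=rng)
```

In a full GA, integer-valued factors such as the number of neurons would additionally need rounding after mutation; that step is omitted here for brevity.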
The simulation took 58 s. The computations were carried out on a computer with an Intel Core i5-2400S processor (2.50 GHz) and 6 GB of memory, using the commercially available ANN software MATLAB 7.0 [56].

Statistical test results
The MSE results on the validation data set are given in Table 2. In this study, the cubic model (CM) was chosen to describe the correlation between the effective neural network factors and the response ln(MSE). Moreover, this model was selected because of its high value of R and non- [...]. The results of the statistical test show that the first-order effect of the number of neurons was the most significant term in the estimation of ln(MSE). It was followed by the number of training epochs and lr, respectively, whereas mc and the number of training runs had no significant effect on the response. Similar results for lr and training epochs were reported by [33,34].

Mathematical model results
Finally, the MCM was expressed in terms of coded values, with the parameters $x_1$, $x_2$, $x_3$, $x_4$ and $x_5$ defined in Table 1. This model was checked hierarchically. The above statistical estimators indicate an adequate neural model with optimal structure that can be used for prediction of colour change in rehydrated apple cubes.
Figure 5 shows the normal probability plot of the internally studentised residuals. As can be seen from the plot, a linear function approximates the results very well. Moreover, the residuals are scattered randomly (Fig. 6), suggesting that the variance of the original observations is constant for all responses. It can therefore be concluded from these plots that the empirical model is suitable for describing the relationships between the design variables described by RSM.
Figure 7 shows the response surfaces (RSs) and contour plots (CPs) obtained with the Design-Expert (DOE) software. Each graph represents a combination of two factors at a time, with all other factors held at the middle level. The effect of different values of the number of neurons and training epochs on ln(MSE) can be predicted from the RSs and CPs shown in Fig. 7a, b. The minimum value of MSE is found at 3000-4000 epochs and an lr of 0.25-0.35. Moreover, the same range of epoch number was observed in relation to the number of neurons (Fig. 7b). The CPs show that as the number of neurons increases from 2 to 16 and lr from 0.01 to 0.4, ln(MSE) decreases to -5 (see Fig. 7c).

RSM and contour plot results
The response surface plot of momentum constant and learning rate is shown in Fig. 8a. The CP shows that along with an increase in lr from 0.01 to 0.4 and mc from 0.1 to 0.5, the ln(MSE) decreases to -4.615. Figure 8b shows the contour plot, where along with an increase in training epochs from 3000 to 4500 and mc to 0.5, the ln(MSE) decreases to -4.75.

GA optimisation results
The fitness function, Eq. (14), was minimised by the GA with respect to ln(MSE) within the experimental ranges presented in Table 1.
As can be seen in Fig. 9a, the optimisation terminated when the maximum number of generations (2000) was reached. The objective function value ln(MSE) = -5.47257 was obtained for the final points presented in Fig. 9b. Table 5 shows the optimised ANN parameters. The optimum values were as follows: number of neurons = 13, training epochs = 3869, lr = 0.33, mc = 0.89 and number of training runs = 3.
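Because the GA works on the log-transformed response, the reported optimum can be converted back to the MSE scale by exponentiation. The short sketch below does this arithmetic using only two values quoted in the text; placing them side by side is an illustrative step added here, not an analysis made in the source.

```python
import math

ln_mse_predicted = -5.47257        # GA optimum of the fitted ln(MSE) model (from the text)
mse_observed = 0.0072095           # validation MSE of the trained optimal network (from the text)

mse_predicted = math.exp(ln_mse_predicted)     # about 0.0042
ln_mse_observed = math.log(mse_observed)       # about -4.93
```

On the original scale, the optimum of the fitted surface corresponds to an MSE of roughly 0.0042, the same order of magnitude as the MSE later obtained by training the network with the optimised topology.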

Errors of model results
Next, the ANN with the proposed topology was trained and tested. As can be seen from Fig. 10a, training stopped when the validation error increased after 1147 iterations. The result is reasonable because the MSE is very small. The test and validation errors have similar characteristics (Fig. 10a), and no significant overfitting occurred [58]. Finally, the MSE of the optimal ANN topology was equal to 0.0072095 (see Fig. 10a). Figure 10b shows the ANN regression plots between the outputs and the targets. The R values in each case are greater than 0.95; therefore, the fit is reasonably good for all data sets. Additionally, MAE and R for colour change in the rehydrated apple cubes were 0.0259 and 0.96475 for training, 0.0399 and 0.95243 for testing and 0.0264 and 0.95151 for validation, respectively.

Validation model results
According to the authors [32-34], the trained neural network must have high predictive ability for new data. Therefore, 40 data sets of colour change obtained from a new experimental run (drying air temperature = 55°C, drying air velocity = 0.52 m/s, rehydration temperature = 35°C and rehydration time = 35 min) were used to verify the developed model. The regression result of testing the model with new samples is shown in Fig. 11. It can be seen that the genetic algorithm was successfully applied to optimise the neural network topology. Moreover, the optimised ANN topology was efficient in predicting colour change in the rehydrated apple cubes. This system can also be used for optimising the topology of neural networks that describe other engineering problems.
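The agreement between predicted and actual values on new samples can be quantified with the performance measures named in the performance evaluation section. The sketch below assumes the standard definitions of $R^2$, MSE and MAE; the function name `performance` is hypothetical, and the numbers are made-up illustration values, not data from the study.

```python
def performance(predicted, actual):
    """Return (R2, MSE, MAE) for paired predicted and experimental outputs."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((p - d) ** 2 for p, d in zip(predicted, actual))   # residual sum of squares
    ss_tot = sum((d - mean_actual) ** 2 for d in actual)            # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mse = ss_res / n
    mae = sum(abs(p - d) for p, d in zip(predicted, actual)) / n
    return r2, mse, mae

# Hypothetical values for illustration only
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
r2, mse, mae = performance(predicted, actual)
```

For these illustration values the function returns an $R^2$ of 0.9, an MSE of 0.125 and an MAE of 0.25.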

Conclusions
The following conclusions can be drawn from the investigations conducted in this work:
1. An efficient hybrid intelligent approach was proposed to find the optimal topology of neural networks.
2. The optimal ANN topology was more precise for predicting colour change in the rehydrated apple cubes, with a low mean square error (0.0072095) and a high regression coefficient (0.96).
3. The optimum number of hidden neurons, learning rate, momentum constant, number of epochs and number of training runs were equal to 13, 0.33, 0.89, 3869 and 3, respectively.
4. The results of testing the model on new trials showed excellent agreement between the actual and predicted data, with a coefficient of determination equal to 0.97.
5. This optimisation method significantly reduces the number of experiments compared with more expensive learning methods.