Introduction

Since ancient times, honey has been used both as a food and as a medicine. It is a natural sweet substance produced by Apis mellifera bees. Honey contains more than 180 constituents, mainly the sugars fructose and glucose, together with other sugars, minerals, vitamins, proteins, free amino acids, enzymes and a wide range of minor ingredients [1, 2]. The amount of each ingredient depends on various factors, e.g., bee species, floral source, geographical origin, season and climatic conditions [3]. Many scientific reports have shown that the composition of honey can be used for its characterization [4, 5] and that there is a high degree of correlation between the chemical and electrical features of honey [6]. Honey has antibiotic, anticancer, antioxidative, antibacterial and preserving properties. According to the Community Directive 74/409/EEC (OJEC L 221, 12.8.1974), no food ingredient, including food additives, and no other addition may be made to a product sold as honey. Moreover, honey has to be stored and processed under proper conditions so that its nutritional and therapeutic value is not reduced or lost. Therefore, quality control of the honey reaching consumers is of great importance and should be carried out at every stage of processing and marketing.

At present, some researchers use electrical parameters of honey such as conductivity, impedance, permittivity or dielectric loss coefficient to assess its floral or geographical origin, quality and potential adulteration. In these approaches, the electrical features of honey are used independently or jointly with other physicochemical parameters [6–8]. Methods based on measurements of electrical features are a valuable alternative to the methods currently applied, which rely on measurements of chemical parameters. The latter are expensive and time-consuming and require specialized equipment; therefore, they are not generally accessible. To use methods based on electrical measurements effectively, knowledge of the relationships between the electrical and chemical parameters of honey is required. Because frequency has a significant influence on complex impedance, determining an appropriate measurement frequency is also essential.

Artificial neural networks (ANNs) are a group of computational models inspired by the behavior of the human brain. ANNs consist of simple processing elements (neurons) linked by weighted connections. Their main advantage is the capacity to learn solely from a data set. During the learning process, the connection weights are changed in order to minimize the difference between the values calculated by the network for given input vectors and the expected values. ANNs are widely used in a large number of fields [9–13], especially when the phenomenon investigated is not well understood and only an experimental data set describing it is available. The most common applications of ANNs are regression and classification. For regression tasks, the multilayer perceptron (MLP) is mainly used. Shafiee et al. [14] used an MLP to predict some chemical parameters of honey from color features, Marti et al. [15] to estimate stem water potential, Maulidiani et al. [16] to model the relationship between the bioactive compounds in Pegaga extract and antioxidant activity and Xi et al. [17] to model the effects of pressure, liquid/solid ratio and ethanol concentration on the total phenolic content of green tea extracts.

For classification tasks, self-organizing maps (SOM) or MLP networks are commonly used. SOM has been used for creating ambient air quality classifications [18], for classifying a plant disease from visual symptoms [19] and for classifying sediment samples according to similar chemical characteristics [20]. Cajka et al. [21] used an MLP to classify honey according to its geographical origin and Anjos et al. [22] according to its botanical origin; Boniecki et al. [23] obtained very good results in the identification of apple orchard pests using an MLP model for digital image analysis; and Silvestre and Ling [24] used new pruning methods for MLP to solve the unbalanced E. coli classification problem.

If a neural network is used only for solving a classification or regression task, it is treated as a “black box” model. However, a model obtained with an ANN is often of better quality than one obtained by other methods, so extracting additional information from the model can be very useful. One such piece of information is the relative importance of each input variable, and several methods for determining the contribution of each independent input variable have been proposed. One group of methods is called sensitivity analysis (SA). SA methods change the values of the input parameters and determine the relative importance of each variable from the resulting change in the output value. SA methods are not limited to ANN-based models. Besides SA methods, a group of ANN-specific methods has also been developed. Many reports in the literature use such methods to determine the contribution of independent variables, and in many of them the results of different methods are compared. Nourani et al. [25] used four different methods to investigate the effect of each input parameter on the output of an ANN model of the evaporation process under different climatologic regimes. A comparative analysis of methods for determining variable contributions was carried out by de Ona and Garrido [26] based on a model of customer service quality for the Granada Area Transport Consortium. Paliwal and Kumar [27] proposed a new approach to interpreting the relative importance of independent variables in neural networks based on connection weight values and compared it with the other connection weights method. The analysis was performed for data with various characteristics.

Several recent reports address the use of electrical parameters for honey quality assessment. It is well known that there is a strong relationship between the chemical and electrical features of honey. Likewise, the relationships between honey quality and its chemical parameters are well defined, and many methods based on the measurement of chemical parameters have therefore been developed. The HMF content is used as an indicator of honey freshness [28] and affects honey darkening. Dark-colored honeys have a high therapeutic potential [29]. The pH is considered an index of possible microbial contamination [30], and diastase activity is used for assessing honey freshness and for detecting overheating [31, 32]. Water activity is a crucial factor that determines the growth of microorganisms in honey [33, 34]. Other parameters that affect the electrical features of honey are, e.g., temperature and crystallization. However, the influence of chemical parameters in combination with temperature or crystallization on electrical features has not been investigated so far. Knowing which parameters affect the electrical features the most can indicate what type of quality deterioration can potentially be detected with a given electrical parameter. The use of frequency-related electrical parameters, e.g., impedance, for food quality assessment is a fairly recent innovation. These parameters can be more useful than, e.g., electrical resistance or conductivity. However, a methodology based on frequency-related electrical parameters must include the choice of a proper frequency. Therefore, besides the influence of particular chemical parameters on impedance, the impact of frequency must also be analyzed.

The aim of this research is to determine the level of influence of honey chemical parameters, namely the glucose content/fructose content ratio, water activity, HMF content, pH and diastase, as well as temperature, on complex impedance. Because impedance depends on frequency, the investigation was performed for three selected frequency values. This approach made it possible to verify how the contribution of the input variables depends on frequency.

Materials and methods

Honey samples

A total of 50 Polish honey samples harvested in 2011 were used for this study. Thirty-nine samples of nectar honeys (acacia, Robinia L.; rape, Brassica napus L.; phacelia, Phacelia Juss.; goldenrod, Solidago L.; buckwheat, Fagopyrum esculentum; heather, Calluna vulgaris (L.) Hull; willow, Salix L.; and multiflower), four samples of nectar–honeydew honeys and seven samples of honeydew honeys (coniferous and deciduous) were collected. To verify the honey types, a pollen analysis was carried out in an accredited laboratory. The method used for pollen analysis conformed to the Polish Standard (PN-88/A-77626, 1998, based on Louveaux et al. [35]). Additionally, the classification of a honey sample into the nectar–honeydew or honeydew group was based on the conductivity of a 20 % aqueous honey solution. For each sample, the following chemical parameters were measured in an accredited laboratory: glucose content/fructose content ratio (%), HMF (hydroxymethylfurfural) content (mg/kg), pH (–) and diastase (–). These parameters were tested using the methods compiled by the International Honey Commission [36]. The methods for the chromatographic measurement of sugar content and for HMF content were modified according to the conditions of the Bee Products Quality Testing Laboratory [37–39]. The chemical characteristics of each honey group are presented in Table 1.

Table 1 Chemical properties (maximum, minimum, mean and standard deviation) of honey samples

The water activity of liquid honey (–) was measured with the LabSwift-aw instrument. Afterward, the complex impedance of each sample was measured with an ATLAS 0441 HIA apparatus with an electrode installed in a climate chamber. Impedance measurements were taken at five temperatures (20, 25, 30, 35 and 40 °C) and for frequencies ranging from 10 Hz to 1 MHz. The measurements yielded a data set containing 250 vectors for each frequency (50 samples at five temperatures). A group of six vectors was excluded from the data set because of very high impedance values, probably caused by disruptions during the measurement process. For the analysis, three frequencies were chosen on the basis of the authors’ experience: 260 Hz, 11 kHz and 160 kHz.

Artificial neural networks and the model

ANNs are widely used as a tool for modeling complex, multidimensional and highly nonlinear interrelations. They are composed of artificial neurons typically organized in layers and linked by weighted connections. During the learning process, the weights of artificial neurons are adjusted in order to obtain the output signals expected for specific inputs. The theory of ANNs has been described in several papers [40–42].

An MLP with one hidden layer was chosen as the network topology in this work. Simulations were executed in the Statistica v. 10 environment. Six independent neural models were developed. Each model had six input nodes and one neuron in the output layer corresponding to the real or imaginary part of the impedance measured at a certain frequency. The input data set contained the following parameters:

  • glucose content/fructose content ratio (GF),

  • water activity of liquid honey (WA),

  • HMF content (HMF),

  • pH,

  • diastase (D),

  • temperature (T).

The structure of MLP network is presented in Fig. 1.

Fig. 1 Structure of the MLP network

Experimental data (244 data vectors for each model) were divided into learning (80 % of data vectors) and testing (20 % of data vectors) sets. All training and testing vectors were scaled into a new range of <0.1–1>. In the hidden and output layers, neurons with a sigmoidal activation function were implemented.
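The scaling and the 80/20 division described above can be illustrated with a minimal Python sketch. It is not the Statistica procedure used in this work: the linear (min-max) form of the scaling, the random shuffling, the seed and the function names `scale_to_range` and `split_indices` are assumptions made only for illustration.

```python
import random


def scale_to_range(values, low=0.1, high=1.0):
    """Linearly map a list of numbers onto [low, high] (min-max scaling)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant column carries no information; map it to the midpoint.
        return [(low + high) / 2.0 for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


def split_indices(n_vectors, train_fraction=0.8, seed=0):
    """Shuffle the indices 0..n_vectors-1 and split them into two sets."""
    indices = list(range(n_vectors))
    random.Random(seed).shuffle(indices)  # assumed: random assignment
    n_train = round(n_vectors * train_fraction)
    return indices[:n_train], indices[n_train:]


# The five measurement temperatures (degrees C) scaled into the range 0.1 to 1.
temperatures = [20, 25, 30, 35, 40]
scaled = scale_to_range(temperatures)

# 244 data vectors divided into learning and testing index sets.
train_idx, test_idx = split_indices(244)
```

With 244 vectors and rounding to the nearest integer, this sketch puts 195 vectors in the learning set and 49 in the testing set; the exact counts used in this work are not stated and may differ.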

For each model, 200 network configurations were examined by changing the number of neurons in the hidden layer and the training algorithm. Initial weight values were selected randomly. The contribution of the variables was determined for the best network configurations. Model quality was assessed on the basis of the correlation coefficient between the values calculated by the model and the expected values for the training and testing data sets.

Methods for determining the contribution of variables

In this work, three methods of extracting the contribution of independent variables have been used. The first method is from the SA family, and the other two methods can be applied only to the MLP networks and require the knowledge of the connection weights matrix.

Sensitivity analysis

Global SA, as implemented in the Statistica v. 10 environment, gives information about the relative importance of the variables by replacing each variable in turn with its mean value calculated from the training data set. The ratio of the network error obtained with the input replaced to the network error obtained with the original input values is then calculated. The higher the ratio, the more important the input parameter [43]. The error ratios can be used for calculating the percentage influence of the input parameters. A similar SA method has been employed by Hadzima-Nyarko et al. [44] in the modeling of the damage ratio coefficient and by Pastor-Bárcenas et al. [45] in surface ozone modeling.
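The mean-substitution procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Statistica implementation: the sum of squared errors as the network error, the normalization of the ratios to percentages and the stand-in `toy_model` (with invented data) are all hypothetical choices.

```python
def sum_squared_error(predict, inputs, targets):
    """Sum of squared differences between model outputs and expected values."""
    return sum((predict(x) - t) ** 2 for x, t in zip(inputs, targets))


def sensitivity_ratios(predict, inputs, targets):
    """Per input: error with that input fixed at its mean / baseline error."""
    baseline = sum_squared_error(predict, inputs, targets)
    if baseline == 0.0:
        raise ValueError("baseline error is zero; the ratio is undefined")
    n_rows, n_vars = len(inputs), len(inputs[0])
    ratios = []
    for i in range(n_vars):
        mean_i = sum(row[i] for row in inputs) / n_rows
        # Replace variable i by its mean in every data vector, keep the rest.
        replaced = [row[:i] + [mean_i] + row[i + 1:] for row in inputs]
        ratios.append(sum_squared_error(predict, replaced, targets) / baseline)
    return ratios


def percentage_influence(ratios):
    """Assumed normalization: each ratio as a share of the sum of ratios, in %."""
    total = sum(ratios)
    return [100.0 * r / total for r in ratios]


def toy_model(x):
    """Hypothetical stand-in for a trained network (not a model from this work)."""
    return 3.0 * x[0] + 0.5 * x[1]


# Invented data: two inputs, targets close to the toy model's outputs.
inputs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [0.1, 2.9, 0.6, 3.4]

ratios = sensitivity_ratios(toy_model, inputs, targets)
shares = percentage_influence(ratios)
```

In this toy case the first input, which the stand-in model weights more heavily, receives the larger error ratio, in line with the rule that a higher ratio indicates a more important input.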

Partial derivatives method (PaD method)

This method uses all training data and requires the knowledge of connection weights and the biases matrix [46]. The output signal calculated by neurons of the hidden layer with sigmoidal activation function is given by the following equations:

$$h_{k} = \frac{1}{{1 + e^{{ - o_{k} }} }}$$
(1)
$$o_{k} = \mathop \sum \limits_{i} w_{ik} \cdot x_{i} - b_{k}$$
(2)

where $h_{k}$ is the output signal of the hidden neuron k, $x_{i}$ is the input value from the input node i, $w_{ik}$ is the connection weight between input node i and hidden neuron k and $b_{k}$ is the bias of hidden neuron k.

The output signal calculated by neurons of the output layer with sigmoidal activation function is given by the following equations:

$$y_{j} = \frac{1}{{1 + e^{{ - o_{j} }} }}$$
(3)
$$o_{j} = \mathop \sum \limits_{k} w_{kj} \cdot h_{k} - b_{j}$$
(4)

where $y_{j}$ is the output signal of the output neuron j, $w_{kj}$ is the connection weight between hidden neuron k and output neuron j and $b_{j}$ is the bias of output neuron j.
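Equations (1)–(4) amount to a forward pass through a network with one hidden layer. The following Python sketch illustrates them for a single output neuron; the nested-list layout of the weights and the names `forward`, `w_in` and `w_out` are assumptions made for illustration and do not reflect how Statistica stores a network.

```python
import math


def sigmoid(z):
    """Sigmoidal activation function used in Eqs. (1) and (3)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_in, b_hidden, w_out, b_out):
    """Return (hidden outputs h_k, network output y) for one input vector x.

    w_in[i][k] is the weight from input node i to hidden neuron k,
    w_out[k] is the weight from hidden neuron k to the single output neuron.
    Biases are subtracted, following the sign convention of Eqs. (2) and (4).
    """
    hidden = []
    for k in range(len(b_hidden)):
        o_k = sum(w_in[i][k] * x[i] for i in range(len(x))) - b_hidden[k]
        hidden.append(sigmoid(o_k))
    o_j = sum(w_out[k] * hidden[k] for k in range(len(hidden))) - b_out
    return hidden, sigmoid(o_j)


# With all weights and biases equal to zero, every neuron outputs sigmoid(0).
h, y = forward([0.3, 0.7], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.0, 0.0], 0.0)
```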

The equation that defines the variation in the $y_{j}$ variable with respect to the variation in the $x_{i}$ variable is as follows:

$$\frac{{{\text{d}}y_{j} }}{{{\text{d}}x_{i} }} = \mathop \sum \limits_{k} \frac{{{\text{d}}y_{j} }}{{{\text{d}}o_{j} }} \cdot \frac{{{\text{d}}o_{j} }}{{{\text{d}}h_{k} }} \cdot \frac{{{\text{d}}h_{k} }}{{{\text{d}}o_{k} }} \cdot \frac{{{\text{d}}o_{k} }}{{{\text{d}}x_{i} }}$$
(5)

The activation functions of neurons in hidden and output layers are sigmoidal functions. The derivative of the sigmoidal function f is as follows:

$$f^{\prime} = f\left( {1 - f} \right)$$
(6)
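For completeness, Eq. (6) follows directly from the definition of the sigmoidal function used in Eqs. (1) and (3). Writing the argument of the function as z (a symbol introduced here only for illustration):

$$f\left( z \right) = \frac{1}{{1 + e^{ - z} }},\quad f^{\prime}\left( z \right) = \frac{{e^{ - z} }}{{\left( {1 + e^{ - z} } \right)^{2} }} = \frac{1}{{1 + e^{ - z} }} \cdot \frac{{e^{ - z} }}{{1 + e^{ - z} }} = f\left( z \right)\left( {1 - f\left( z \right)} \right)$$

The last step uses the identity $1 - f\left( z \right) = e^{ - z} /\left( {1 + e^{ - z} } \right)$.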

Combining (5) and (6):

$$\frac{{{\text{d}}y_{j} }}{{{\text{d}}x_{i} }} = \mathop \sum \limits_{k} y_{j} \cdot \left( {1 - y_{j} } \right) \cdot w_{kj} \cdot h_{k} \cdot \left( {1 - h_{k} } \right) \cdot w_{ik}$$
(7)

The relative contribution of each input variable on a specific output can be calculated as follows [47]:

$$L_{i} = \mathop \sum \limits_{N} \left( {\frac{{{\text{d}}y_{j}^{N} }}{{{\text{d}}x_{i}^{N} }}} \right)^{2}$$
(8)

where N is the number of training patterns.

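The percentage influence of the ith input parameter is given by the following equation (the value is a fraction of the total and is multiplied by 100 to express it in percent):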

$$Lp_{i} = \frac{{L_{i} }}{{\mathop \sum \nolimits_{i} L_{i} }}$$
(9)
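Equations (7)–(9) can be illustrated with the Python sketch below for a network with a single output neuron. The weight layout, the function names and the numerical values of the weights and input patterns are invented for illustration; they are not taken from the models developed in this work.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_in, b_hidden, w_out, b_out):
    """Forward pass of Eqs. (1)-(4); returns (hidden outputs, output)."""
    hidden = [
        sigmoid(sum(w_in[i][k] * x[i] for i in range(len(x))) - b_hidden[k])
        for k in range(len(b_hidden))
    ]
    y = sigmoid(sum(w_out[k] * hidden[k] for k in range(len(hidden))) - b_out)
    return hidden, y


def pad_derivatives(x, w_in, b_hidden, w_out, b_out):
    """Partial derivatives dy/dx_i for one pattern, following Eq. (7)."""
    hidden, y = forward(x, w_in, b_hidden, w_out, b_out)
    return [
        y * (1.0 - y) * sum(
            w_out[k] * hidden[k] * (1.0 - hidden[k]) * w_in[i][k]
            for k in range(len(hidden))
        )
        for i in range(len(x))
    ]


def pad_contributions(patterns, w_in, b_hidden, w_out, b_out):
    """Eq. (8): sum of squared derivatives; Eq. (9): share of the total."""
    totals = [0.0] * len(patterns[0])
    for x in patterns:
        for i, d in enumerate(pad_derivatives(x, w_in, b_hidden, w_out, b_out)):
            totals[i] += d * d
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


# Invented network: three inputs, two hidden neurons; the third input is
# disconnected (zero weights), so its contribution should be zero.
w_in = [[0.8, -0.4], [0.1, 0.3], [0.0, 0.0]]
b_hidden = [0.1, -0.2]
w_out = [1.2, -0.7]
b_out = 0.05
patterns = [[0.1, 0.5, 0.9], [0.4, 0.2, 0.6], [1.0, 0.7, 0.3]]

shares = pad_contributions(patterns, w_in, b_hidden, w_out, b_out)
```

The returned shares sum to one; multiplying them by 100 gives the percentage influence. A quick way to check such an implementation is to compare the analytical derivative with a finite-difference estimate of the network output.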

Connection weights method

In this method, the product of the connection weights between input nodes and neurons in the hidden layer and the connection weights between neurons in the hidden layer and neurons in the output layer is calculated. Various equations have been proposed for determining the weights’ product. The proposition by Garson [48] has been employed by some authors. The percentage influence of the input variable $x_{i}$ on the output $y_{k}$ for a network with N input nodes and L neurons in the hidden layer is represented by $Q_{ik}$, calculated as follows:

$$Q_{ik} = \frac{{\mathop \sum \nolimits_{j = 1}^{L} \left( {\frac{{w_{ij} }}{{\mathop \sum \nolimits_{r = 1}^{N} w_{rj} }} \cdot w_{jk} } \right)}}{{\mathop \sum \nolimits_{i = 1}^{N} \left( {\mathop \sum \nolimits_{j = 1}^{L} \left( {\frac{{w_{ij} }}{{\mathop \sum \nolimits_{r = 1}^{N} w_{rj} }} \cdot w_{jk} } \right)} \right)}}$$
(10)

where $w_{rj}$ is the connection weight between input node r and neuron j in the hidden layer and $w_{jk}$ is the connection weight between neuron j in the hidden layer and neuron k in the output layer. In this work, the absolute values of connection weights were used, as in the work of Gevrey et al. [49].
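A minimal Python sketch of Eq. (10) with absolute weight values, for a network with a single output neuron, is given below. The function name `garson_importance`, the weight layout and the numerical weights are assumptions chosen so that the result can be verified by hand.

```python
def garson_importance(w_in, w_out):
    """Relative importance of each input following Eq. (10), absolute weights.

    w_in[i][j] is the weight from input node i to hidden neuron j,
    w_out[j] is the weight from hidden neuron j to the single output neuron.
    """
    n_inputs, n_hidden = len(w_in), len(w_out)
    scores = []
    for i in range(n_inputs):
        total = 0.0
        for j in range(n_hidden):
            column_sum = sum(abs(w_in[r][j]) for r in range(n_inputs))
            if column_sum == 0.0:
                continue  # hidden neuron j receives no signal from any input
            total += abs(w_in[i][j]) / column_sum * abs(w_out[j])
        scores.append(total)
    grand_total = sum(scores)
    return [s / grand_total for s in scores]


# Invented weights: three inputs, two hidden neurons.
w_in = [[2.0, -1.0], [1.0, 1.0], [1.0, -2.0]]
w_out = [0.5, -1.5]
importance = garson_importance(w_in, w_out)
```

For these invented weights the unnormalized scores are 0.625, 0.5 and 0.875, so the relative importances are 0.3125, 0.25 and 0.4375. Because absolute values are used, the method reports the magnitude of each input's influence but not its direction.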

Results and discussion

Methods of extracting the relative contribution of input variables can be ineffective when the inputs are interdependent [50, 51]. Therefore, before building the ANN-based prediction models, Pearson’s correlation coefficients between the explanatory variables were calculated. The results are presented in Table 2.

Table 2 Correlation coefficients between explanatory variables

The data presented in Table 2 show that the correlation coefficients between the input parameters of the model are low. The highest correlation in absolute value is observed between pH and HMF content and equals −0.45.
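The pairwise check of the explanatory variables described above can be sketched in Python as follows. The numerical values are invented solely to make the example runnable and are not measurements from this study; the function names are likewise hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Correlation coefficient for every pair of named variables."""
    names = list(columns)
    return {(p, q): pearson_r(columns[p], columns[q]) for p in names for q in names}


# Invented values for three of the input variables (illustration only).
columns = {
    "pH": [3.8, 4.1, 4.4, 3.9, 4.6],
    "HMF": [12.0, 9.5, 6.0, 11.0, 7.5],
    "T": [20.0, 25.0, 30.0, 35.0, 40.0],
}
matrix = correlation_matrix(columns)
```

Coefficients close to zero for all pairs of different variables would support treating the inputs as independent when interpreting their contributions.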

In Table 3, the architectures of neural networks used for determination of the relative contribution of input variables as well as values of parameters used for model quality assessment are given. The model quality assessment has been based on Pearson’s correlation coefficients between output values expected and calculated by a model for training and testing data sets.

Table 3 Parameters of neural networks used for extracting the relative contribution of input variables

The data presented in Table 3 show that a high correlation coefficient exceeding 0.9 (for both the training and testing data sets) was obtained for most of the neural models. For the model describing the relationship between the real part of impedance at 160 kHz and the input parameters, the correlation coefficients are slightly lower and equal 0.87 for both the training and testing data sets. Only for the model describing the relationship between the imaginary part of impedance at 260 Hz and the input parameters is the correlation coefficient for the training data set significantly lower (0.63). However, the correlation coefficient for the testing data set is very high (0.98). The quality of the neural models obtained after the training process may be considered high enough for determining the relative contribution of input variables.

In Figs. 2, 3, 4, 5, 6 and 7, the percentage influence of each input parameter on real and imaginary part of impedance measured for selected frequencies is presented. The influence has been determined by three methods described in this work.

Fig. 2 Contribution of variables used in the ANN model for the real part of impedance measured for frequency 260 Hz

Fig. 3 Contribution of variables used in the ANN model for the real part of impedance measured for frequency 11 kHz

Fig. 4 Contribution of variables used in the ANN model for the real part of impedance measured for frequency 160 kHz

Fig. 5 Contribution of variables used in the ANN model for the imaginary part of impedance measured for frequency 260 Hz

Fig. 6 Contribution of variables used in the ANN model for the imaginary part of impedance measured for frequency 11 kHz

Fig. 7 Contribution of variables used in the ANN model for the imaginary part of impedance measured for frequency 160 kHz

The data presented in Fig. 2 show that the HMF content and pH have the highest influence on the real part of impedance measured at 260 Hz (about 30 % each). The influence of the other parameters is significantly lower. Comparable results were obtained with the three methods for determining the relative contribution of input variables used in this work.

The data presented in Figs. 3 and 4 show that the percentage influence of input variables on the real part of impedance remains generally constant as the frequency increases. Only a slight increase in the influence of the glucose content/fructose content ratio is observed. In each case, the results obtained using different methods are similar.

The high influence of HMF content and pH on the real part of impedance suggests that this electrical parameter can potentially be used for honey freshness assessment as well as for detecting changes caused by the growth of microorganisms.

The data presented in Fig. 5 show that the influence of the input parameters on the imaginary part of impedance measured at 260 Hz is similar to that found for the real part of impedance. The HMF content and pH have the highest influence on the model output. The glucose content/fructose content ratio is slightly less important. In this case, a significant difference is observed between the results calculated using the PaD method and those obtained with the connection weights and SA methods. Admittedly, all three methods indicate that the HMF content affects the imaginary part of impedance the most, but the value calculated with the PaD method is significantly higher (about 70 %) than the values calculated with the other two methods.

The influence of water activity, diastase and the glucose content/fructose content ratio on the imaginary part of impedance increases when the frequency increases. The data presented in Figs. 6 and 7 show that the results obtained using different methods are comparable.

According to the results shown in Figs. 5, 6 and 7, the imaginary part of impedance measured at low frequency can potentially be used for honey freshness assessment as well as for detecting changes caused by the growth of microorganisms. When impedance is measured for frequency values of several hundred kHz, the imaginary part can additionally be used for overheating detection.

The results presented in this work show that the percentage influence of input variables on honey impedance calculated with three different methods is comparable. According to reports published by other authors, such similarity is not the rule. Even when the results obtained with the selected methods are comparable, the percentage influence of input variables on output variables differs from method to method [25, 26, 52]. This can also be observed in the results presented in this work. The literature contains many reports in which the variable contributions calculated with different methods differ significantly. Shojaeefard et al. [47] reported that the results obtained using the PaD method, the profile method and the classical stepwise method were similar, whereas the results obtained using the connection weights method were significantly different. Gevrey et al. [49] employed several methods of testing variable contributions to study brown trout reproduction. Their results show that more than one method should be used to analyze the contribution of the inputs and that the results should be compared, because they are not always the same for each method.

Conclusions

In this research, ANNs were used to determine the contribution of independent variables. The quality of the neural models was assessed on the basis of the correlation between the expected output values and those calculated by the model. It can generally be considered high (the correlation coefficient generally exceeded 0.9), and high correlation coefficients were obtained for both training and testing data sets. Three different methods for determining the contribution of independent variables in a neural network model were employed. The results show that HMF content and pH have the highest influence on the real part of impedance and that this influence remains constant as the frequency changes. This suggests that the real part of impedance could be used for honey freshness assessment or for detecting the growth of microorganisms. For the frequency of 260 Hz, HMF content and pH affect the imaginary part of impedance the most. For higher frequencies, however, an increase in the influence of water activity, diastase and the glucose content/fructose content ratio was observed. Therefore, the imaginary part of impedance measured at higher frequencies can be a useful parameter for detecting honey overheating. Temperature has a low influence on impedance; therefore, it is not obligatory to take this parameter into account during the measurement process. The results calculated with the three methods of extracting the contribution of input variables described in this work were generally comparable. However, there were some differences in the percentage influence of each input parameter on the electrical features of honey obtained from each method, which is consistent with the results reported by other researchers.

It can be concluded that complex impedance can potentially be used for detecting the deterioration of honey quality. Investigation of other electrical parameters also seems relevant. The results should also be confirmed by tests on honey whose quality has been degraded in various ways.