Geographical classification of Spanish bottled mineral waters by means of iterative models based on linear discriminant analysis and artificial neural networks
The composition of Spanish natural mineral waters has been determined by means of inductively coupled plasma-mass spectrometry, inductively coupled plasma-atomic emission spectrometry, ion chromatography and other routine techniques. The methods were applied to samples of bottled water from springs situated in five different mountain systems: Cordillera Costero-Catalana, Macizo Galaico, Sistemas Béticos, Sistema Central and Sistema Ibérico. Pattern recognition techniques have been applied to differentiate the origin of the samples. Data were initially studied by using nonparametric multiple comparison techniques and principal component analysis to highlight data trends. Classification models based on linear discriminant analysis and multilayer perceptron artificial neural networks have been built and validated by means of a stratified jackknifing methodology. An iterative approach has been used to build an artificial neural network model based on the variables selected by linear discriminant analysis. The prediction ability of the constructed model was 94 %.
Keywords: Pattern recognition, Multivariate analysis, Multielemental analysis, Geographical characterization, Natural mineral water
Water is one of the most important compounds on Earth owing to its essential role for life. Human consumption of water has changed from ancient times to the present day. Originally, humans drank water directly from sources such as rivers, lakes or wells, but today water is adequately treated before consumption. Potabilization and chlorination of water are perhaps among the most important advances in human history and one of the main contributions of chemistry to the development of society. Sometimes these processes sacrifice the organoleptic characteristics of water in order to improve its suitability for human consumption by eliminating small particles, pathogens and chemical contaminants. Nevertheless, people appreciate products with their natural characteristics, and for this reason natural mineral waters are in high demand among consumers.
According to Directive 2009/54/EC of the European Union [1], natural mineral water is obtained from a protected underground source and directly bottled without any chemical treatment, except for the separation of unstable elements, such as sulphur, iron, manganese and arsenic compounds. These compounds are usually removed by filtration or decantation (compounds of iron or sulphur), possibly preceded by oxygenation, whereas certain natural mineral waters are treated with ozone-enriched air, provided that such treatment does not modify the composition of the water as regards its essential constituents. Each natural mineral water has its own stable mineral composition, and it must be labelled with the analytical composition, the place of origin and the name of the source. Most European producers belong to national associations. In Spain, there are approximately a hundred companies engaged in the exploitation of natural water springs, most of them belonging to the Asociación Nacional de Empresas de Aguas de Bebida Envasada (ANEABE). European associations are federated into the European Federation of Bottled Waters (EFBW). Created in 2003, EFBW is committed to protecting the unique specificities of natural waters and works to promote the sector and its products.
As mentioned above, the chemical composition of bottled water is like a fingerprint of each natural water source. This composition depends on the nature of the soil and rock formations and on the weather, so similar waters can be expected in nearby areas. For instance, Sipos et al. used sensory evaluation, electronic tongue responses and chemical composition to differentiate the geographical origin of Hungarian spring waters. The elemental profile of mineral waters is very stable, and for this reason, pattern recognition techniques based on this composition are a potential tool for authenticity and adulteration testing. Some studies have focused on the mineral profiling of bottled waters from different countries, such as Cameroon, Croatia, Hungary, Italy, Germany, Turkey, Spain and the UK, but few of them developed authentication studies. Authentication has been explored in the case of Brazilian spring waters, with prediction abilities of 94–97 %. Birke et al. studied the geographical dependency of German bottled waters according to their major and trace element composition by means of principal component analysis (PCA). Güler applied PCA and cluster analysis to characterize Turkish bottled waters according to their major components. Oyebog et al. used factor analysis and cluster analysis to find the relationship between the composition of waters, the composition of the soils near the springs and other surface enrichment phenomena. Grošelj et al. also studied the relationship between the chemical composition of bottled waters and their geological origin, but using an artificial neural network (ANN) approach.
Turning to the Iberian Peninsula, most sources of bottled natural mineral water are distributed among five mountain systems: Sistemas Béticos, Sistema Central, Sistema Ibérico, Cordillera Costero-Catalana and Macizo Galaico. Sistemas Béticos are located in the south and east of the Iberian Peninsula, reaching from western Andalusia to the region of Murcia, as well as the south of Castilla La Mancha and Valencia. Sistema Central is a mountain range separating the Tajo and Duero basins and forms the natural boundary between Castilla León to the north and Castilla La Mancha, Madrid and Extremadura to the south. The mountains of Sistema Ibérico extend from Burgos to the north of Valencia. The mountains known as Cordillera Costero-Catalana run parallel to the Catalan coast. In the case of Macizo Galaico, the mountains are distributed from the south-west to the north-east of Galicia.
Based on the cited literature [2, 11], it can be expected that samples from these regions could be differentiated according to their elemental composition. The purpose of this work is to explore whether the inorganic profile and some non-specific parameters of Spanish mineral waters from the mountain systems considered above are adequate to establish classification models. The contents of Al, As, B, Ba, Co, Cr, Cs, Fe, Li, Mn, Mo, Ni, Se, Sb, Sr, Ti, U and Zn were determined by inductively coupled plasma-mass spectrometry (ICP-MS). The contents of Si, Ca, Mg, Na and K were determined by inductively coupled plasma-atomic emission spectrometry (ICP-AES); SO₄²⁻ and Cl⁻ were determined by ion chromatography, whereas HCO₃⁻ was determined by potentiometric titration. Other parameters such as pH, electrical conductivity (EC), redox potential (E) and dry extract (DE) were also measured experimentally. Pattern recognition techniques, such as principal component analysis (PCA), stepwise linear discriminant analysis (SLDA) and artificial neural networks (ANN), have been used to obtain suitable classification models.
2 Materials and methods
2.1 Samples and study area
A total of 52 samples of commercial bottled mineral waters were purchased in markets or obtained directly from suppliers. Samples from Cordillera Costero-Catalana (N = 11), Macizo Galaico (N = 8), Sistemas Béticos (N = 10), Sistema Central (N = 8) and Sistema Ibérico (N = 15) were considered for this study. The information about samples is included in electronic supplementary material (Table S1). Prior to analysis, water samples were stored at 4 °C.
2.2 Analytical method for metals and metalloids
Major elements (Si, Na, K, Ca and Mg) were analysed by ICP-AES, whereas minor and trace elements (Al, As, Sb, Ba, Be, B, Cd, Cs, Zn, Co, Cu, Cr, Sr, Fe, Li, Mn, Mo, Ni, Ag, Pb, Se, Tl, Ti, Th, U and V) were analysed using ICP-MS. An ULTIMA 2 Instrument (Horiba Scientific, Japan) was used for ICP-AES determinations, while the ICP-MS Instrument was an X7 SERIES ICP-MS (Thermo Elemental, USA). The instrumental details and operating conditions are summarized in Tables S2 and S3 of the electronic supplementary material, respectively. ICP-AES and ICP-MS measurements were carried out at the Research General Services of the University of Seville.
Complete ICP-MS analyses were conducted according to US EPA method 200.8, with some modifications related to tuning and mass calibration. These adaptations were established from the X Series ICP-MS Getting Started Guide [13] and were restricted to the isotopes of interest. The sample matrix was reproduced in the calibration standards and QC standards, and appropriate internal standards were selected. In order to minimize metal residues, all glassware was cleaned with a 0.2 M solution of nitric acid for 24 h. All reagents, materials and samples were handled within a vertical laminar airflow cabinet (Indelab, model IDL-48 V). The cabinet contained a high-efficiency particulate air (HEPA) filter that ensured air cleanliness class 100, according to Federal Standard 209E.
The performance characteristics of the analytical method, such as trueness, precision, limit of detection (LOD), limit of quantification (LOQ) and linearity within the calibration range, were tested. The accuracy of the spectrometric determinations was established by analysing international certified reference materials (CRM). Experimental concentrations were obtained from 18 replicates over 6 days. Quality control samples were used according to the EPA protocol (Section 9.0 of EPA 200.8:1994). CRM-TMDW (Trace Metals in Drinking Water Standard) from High Purity Standards (Charleston, USA), which is certified for trace metals in drinking water, was used to determine the accuracy of the US EPA method for drinking water by ICP-MS. The trueness of the method was evaluated via determination of specific elemental concentrations in the CRM. Recoveries (90.5–105.6 %) demonstrated that the method presented optimum trueness, with values within the AOAC range [14]. Precision, expressed as the relative standard deviation (RSD) of repeatability, ranged from 0.6 to 5.1 %, also within the AOAC range for the corresponding elemental content. Limits of detection and quantification were calculated using the standard deviations obtained from the calibration curves. LOD and LOQ were obtained as the concentrations corresponding to a signal that was 3 and 10 times the standard deviation of the intercept, respectively. Limits of quantification were quite low for ICP-MS (0.1–1.5 µg L−1), enabling analyses of very low levels of metals and metalloids in drinking water. Linearity within the calibration range was calculated as 100·(1 − sb/b), where b is the slope of the calibration curve and sb is its standard deviation. The ICP-MS calibration curves were linear for all elements analysed, generally giving values higher than 95.0 %.
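The calibration figures of merit described above can be sketched in a few lines. The following is a minimal illustration, not the authors' procedure: it assumes an ordinary least-squares calibration line, takes LOD and LOQ as 3 and 10 times the standard deviation of the intercept divided by the slope, and uses made-up calibration data. The function name `calibration_stats` is hypothetical.

```python
import math

def calibration_stats(conc, signal):
    """Fit signal = a + b*conc by ordinary least squares and derive
    LOD, LOQ and linearity from the standard errors of the fit."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, signal))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    residuals = [y - (a + b * x) for x, y in zip(conc, signal)]
    s_yx = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    s_b = s_yx / math.sqrt(sxx)        # standard deviation of the slope
    s_a = s_yx * math.sqrt(sum(x * x for x in conc) / (n * sxx))  # of the intercept
    return {
        "slope": b,
        "intercept": a,
        "lod": 3 * s_a / b,            # assumption: signal of 3*s_a converted via the slope
        "loq": 10 * s_a / b,           # assumption: signal of 10*s_a converted via the slope
        "linearity": 100 * (1 - s_b / b),
    }

# Made-up calibration points (concentration in µg/L, arbitrary signal units).
conc = [0.0, 1.0, 2.0, 5.0, 10.0]
signal = [0.1, 2.0, 4.1, 9.9, 20.2]
stats = calibration_stats(conc, signal)
print(stats)
```

With this convention the LOQ is always 10/3 of the LOD, and a perfectly linear curve gives a linearity of 100 %.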
Each sample was analysed in three replicates, and the standard deviation for each element was calculated. Internal standards used in ICP-MS were Sc, In, Tb and Bi, which presented optimum accuracies (97.8–101.2 %). Standard solutions for metals and acids were from MERCK. Ultra-pure water was from WATERS-MILLIPORE (Milli-Q-grade, Model Plus).
In the case of elements also included in the CRM-TMDW certificate but determined by ICP-AES, recoveries vary from 90.1 to 102.4 % and precision from 1.9 to 6.2 %. Linearity was higher than 98 % for all these elements. LOQs varied from 0.036 to 0.18 mg L−1.
2.3 Analytical method for inorganic anions
Inorganic anions (Cl⁻, SO₄²⁻) were determined by ion chromatography (IC) with conductivity detection and chemical suppression (H2SO4). A 792 Basic IC (Metrohm, Germany) was used. A METROSEP A Supp5-250 column, protected by a METROHM precolumn module, was employed for the determination of the anions under the following conditions: mobile phase 3.2 mM Na2CO3/1.0 mM NaHCO3, flow rate 0.7 mL min−1 and injection volume 100 µL.
The accuracy and precision of the applied method were established by use of an international certified reference material (BCR, Simulated Rain Water) at two concentration levels (CRM 408, low content, and CRM 409, high content) supplied by the Institute for Reference Materials and Measurements (IRMM, Belgium). All the results obtained for the analyses of these materials present optimum recoveries (>98 %). Precision, expressed as % RSD, was 1.3 in the case of SO₄²⁻ and 2.1 in the case of Cl⁻. Limits of detection and quantification and linearity were calculated. LOD and LOQ for SO₄²⁻ were 0.04 and 0.13 mg L−1, respectively. In the case of Cl⁻, LOD and LOQ were 0.15 and 0.5 mg L−1, respectively. A linearity of 98 % was achieved for both anions.
The determination of alkalinity was performed by potentiometric titration with HCl, according to the ISO 9963-1:1994 procedure [16]. General parameters such as pH, electrical conductivity (EC) and redox potential (E) were measured according to standard methods.
2.4 Chemometric calculations
A data matrix consisting of 52 rows (samples) and 30 columns (variables) was obtained to perform the chemometric calculations. Basic statistics and the Kruskal–Wallis test were used to highlight differences between the five mountain systems considered in this study. PCA was used for an initial view of data trends. LDA and ANN were used to obtain classification models. Before these calculations, all variables were auto-scaled, i.e. the data in each column were mean-centred and divided by the standard deviation of that column. All chemometric calculations were carried out with the software package Statistica 8.0 (StatSoft, Tulsa, OK, USA).
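The auto-scaling step can be illustrated with a short sketch. The data below are made up, the function name `autoscale` is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator the software applies.

```python
import statistics

def autoscale(matrix):
    """Mean-centre each column and divide it by its standard deviation.

    `matrix` is a list of rows (samples); each row is a list of variables.
    """
    columns = list(zip(*matrix))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # sample SD: an assumption
    return [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in matrix
    ]

# Made-up rows with three variables on very different scales
# (e.g. pH, conductivity, a calcium content).
data = [
    [7.2, 210.0, 30.1],
    [8.1, 450.0, 88.7],
    [7.6, 129.0, 21.5],
    [7.9, 794.0, 60.2],
]
scaled = autoscale(data)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, every column has zero mean and unit standard deviation, so variables measured on different scales carry equal weight in PCA, LDA and ANN models.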
3 Results and discussion
3.1 Chemical composition
The contents of Al, As, B, Ba, Co, Cr, Cs, Fe, Li, Mn, Mo, Ni, Se, Sb, Sr, Ti, U, Zn, Ca, Mg, Na, K, Si, HCO₃⁻, SO₄²⁻ and Cl⁻, as well as the values of pH, EC, E and DE, for water samples from the five considered mountain systems are given in Table S4 of the electronic supplementary material. As can be seen, pH varies from 6.96 to 8.89, without apparent differences between the considered groups. The same can be observed for EC, which ranges from 129 to 794 µS cm−1. Median values of E vary from 202 mV, in the case of Sistemas Béticos, to 217 mV, in the case of Cordillera Costero-Catalana. Samples from Cordillera Costero-Catalana and Sistemas Béticos present the lowest median values for DE (174 mg L−1), whilst those from Macizo Galaico present the highest one (291 mg L−1). HCO₃⁻ is the most abundant anion, with median values ranging from 140 to 297 mg L−1, the highest contents being found in samples from Sistema Ibérico. Median values of SO₄²⁻ and Cl⁻ ranged from 10.0 to 22.2 and from 6.0 to 32.9 mg L−1, respectively. Samples from Sistema Central present the highest median concentration of Cl⁻ and the lowest of SO₄²⁻. Regarding Ca, the highest median content (88.7 mg L−1) was found in samples from Sistema Ibérico and the lowest in waters from Macizo Galaico and Sistema Central, with contents of 22.8 and 21.5 mg L−1, respectively. Samples from Sistema Ibérico also present the highest contents of Mg (23.4 mg L−1), whilst those from Macizo Galaico present the lowest one (5.4 mg L−1). In contrast, the highest median contents of Na (67.1 mg L−1), K (4.0 mg L−1) and Si (3.81 mg L−1) were found in Macizo Galaico waters. The highest median contents of Al (16.7 mg L−1), Ba (29.0 µg L−1), Zn (5.8 µg L−1), Co (0.40 µg L−1), Mn (0.56 µg L−1), Mo (1.4 µg L−1), Ti (3.3 µg L−1) and U (6.5 µg L−1) were found in samples from Cordillera Costero-Catalana.
Samples from Macizo Galaico present the highest contents of Li, Sb, B, Cs and Sr, with medians of 816, 0.27, 237, 54.7 and 161 µg L−1, respectively. Waters from Sistema Central present the highest median contents of As (1.6 µg L−1) and Se (0.65 µg L−1). Samples from Sistema Ibérico and Sistemas Béticos present the highest median values of Fe, 137 and 149 µg L−1, respectively.
Kruskal–Wallis test results
3.2 Differentiation of geographical origin
PCA was first applied in order to visualize data trends in the space of the considered variables. PCA is based on obtaining linear combinations of the original variables to produce new, uncorrelated variables called principal components (PCs). PCA can be used to reduce the dimensionality of the n-dimensional space of the original variables by computing PCs that retain as much as possible of the original variance of the data. The first principal component (PC1) expresses the largest variability of the data, and each successive PC represents as much of the residual variance as possible. Because the data matrix is auto-scaled, each observed variable contributes one unit of variance to the total variance in the data set. An eigenvalue is computed for each PC, indicating the amount of variance explained by that PC. In order to reduce dimensionality, only PCs with eigenvalues greater than 1 were retained, because these components account for a greater amount of variance than one observed variable. In this case, the first 9 PCs present eigenvalues >1, explaining 81.21 % of the total variance (Table S5 of the electronic supplementary material).
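The eigenvalue-greater-than-1 rule amounts to simple arithmetic once the eigenvalues are known. The sketch below assumes auto-scaled data, so that the eigenvalues sum to the number of variables; the six eigenvalues are made up for illustration and the function name `retained_components` is hypothetical.

```python
def retained_components(eigenvalues, threshold=1.0):
    """Return how many PCs exceed the threshold and the percentage of the
    total variance they explain (total variance = sum of all eigenvalues)."""
    total = sum(eigenvalues)
    kept = [ev for ev in sorted(eigenvalues, reverse=True) if ev > threshold]
    explained = 100.0 * sum(kept) / total
    return len(kept), explained

# Made-up eigenvalues for six auto-scaled variables (they sum to 6).
eigenvalues = [2.7, 1.5, 1.1, 0.4, 0.2, 0.1]
n_kept, explained = retained_components(eigenvalues)
print(n_kept, round(explained, 2))
```

Applied to the figures reported here, 81.21 % of the total variance of 30 auto-scaled variables corresponds to a sum of about 24.4 for the nine retained eigenvalues.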
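Table 2 Results of pattern recognition models (each entry is SENS / SPEC, in %)
LDA1: Cordillera Costero-Catalana 67 ± 26 / 96 ± 7; Macizo Galaico 56 ± 30 / 98 ± 4; Sistemas Béticos 87 ± 20 / 88 ± 9; Sistema Central 72 ± 26 / 95 ± 7; Sistema Ibérico 72 ± 18 / 88 ± 6; Overall 72 ± 10 / 92 ± 3
LDA2: Macizo Galaico 83 ± 35 / 83 ± 12; Sistemas Béticos 83 ± 26 / 90 ± 12; Sistema Central 39 ± 42 / 92 ± 13; Sistema Ibérico 61 ± 21 / 90 ± 10; Overall 67 ± 11 / 89 ± 5
ANN1: Cordillera Costero-Catalana 100 ± 0 / 97 ± 5; Macizo Galaico 72 ± 36 / 100 ± 0; Sistemas Béticos 94 ± 17 / 95 ± 7; Sistema Central 89 ± 22 / 99 ± 3; Sistema Ibérico 89 ± 13 / 96 ± 5; Overall 90 ± 9 / 97 ± 2
ANN2: Macizo Galaico 72 ± 36 / 100 ± 0; Sistemas Béticos 100 ± 0 / 95 ± 7; Sistema Central 94 ± 17 / 99 ± 4; Sistema Ibérico 100 ± 0 / 97 ± 6; Overall 93 ± 9 / 98 ± 3
Iterative model: Cordillera Costero-Catalana 100 ± 0 / 96 ± 5; Macizo Galaico 72 ± 36 / 100 ± 0; Sistemas Béticos 96 ± 11 / 100 ± 0; Sistema Central 100 ± 0 / 100 ± 0; Sistema Ibérico 97 ± 8 / 96 ± 5; Overall 94 ± 6 / 98 ± 2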
In order to improve these results, a nonlinear approach, the multilayer perceptron artificial neural network (MLP-ANN), was applied. MLP-ANNs are feed-forward networks consisting of neurons arranged in an input layer, one or more hidden layers and an output layer. Like LDA, ANN uses a training set and a test set, but a third set (the validation set) is needed to avoid overtraining. In this case, samples were divided into training (50 %), validation (25 %) and test (25 %) sets, maintaining this proportion in each class. The model was trained by back-propagation for 50 cycles by minimizing the prediction error made by the network. Learning rate and momentum were set to 0.1 and 0.3, respectively. Logistic sigmoid activation functions were used for the hidden nodes, and softmax (normalized exponential) activation functions were used for the output layer. A network with 20 inputs (one for each variable selected by LDA), 10 hidden neurons and 5 outputs was obtained. The model was cross-validated using the stratified delete-a-group jackknife (SDAGJK), and sensitivity (SENS) and specificity (SPEC) were computed for each considered class. As given in Table 2, the SENS obtained by model ANN1 for Cordillera Costero-Catalana was 100 %, with a SPEC of 97 %. The other classes presented SENS values ranging from 72 %, in the case of Macizo Galaico, to 94 %, in the case of Sistemas Béticos. The overall values were 90 and 97 % for SENS and SPEC, respectively.
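A stratified 50/25/25 division of the samples can be sketched as follows. This is only an illustration of the idea of keeping the proportion within each class: the function name `stratified_split`, the random seed and the rounding rule for small classes are assumptions, and the class sizes are those given in Section 2.1.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.50, 0.25, 0.25), seed=0):
    """Split sample indices into training, validation and test sets,
    applying the same fractions inside every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, valid, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_valid = round(fractions[1] * len(indices))
        train += indices[:n_train]
        valid += indices[n_train:n_train + n_valid]
        test += indices[n_train + n_valid:]   # remainder goes to the test set
    return train, valid, test

# Class sizes taken from Section 2.1 (52 samples in total).
sizes = {
    "Cordillera Costero-Catalana": 11,
    "Macizo Galaico": 8,
    "Sistemas Béticos": 10,
    "Sistema Central": 8,
    "Sistema Ibérico": 15,
}
labels = [name for name, n in sizes.items() for _ in range(n)]
train, valid, test = stratified_split(labels)
print(len(train), len(valid), len(test))
```

Because the classes are small, the per-class counts are rounded, so the realized proportions are only approximately 50/25/25.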
The resulting LDA model was cross-validated by means of SDAGJK using a data division of 75 % for the training set and 25 % for the test set. Table 2 (LDA2) shows an improvement in SENS for Macizo Galaico, but the results for the other three classes are worse. In order to improve these results, an ANN model was computed. The same variables selected by LDA2 were used to obtain model ANN2, with architecture 25:13:4. This model was also built by applying logistic sigmoid and softmax activation functions for the hidden and output layers, respectively. Learning rate and momentum were the same as those used for ANN1. In this case, after applying cross-validation, SENS rises to 100, 94 and 100 % for Sistemas Béticos, Sistema Central and Sistema Ibérico, respectively. In the case of Macizo Galaico, a SENS of 72 % was obtained. The overall performance of model ANN2 was 93 and 98 % for SENS and SPEC, respectively.
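Per-class SENS and SPEC values of the kind reported in Table 2 can be computed from true and predicted labels as sketched below. The sketch assumes the usual definitions (SENS: percentage of samples of a class assigned to it; SPEC: percentage of samples of other classes not assigned to it); the labels are made up and the function name `sens_spec` is hypothetical.

```python
def sens_spec(y_true, y_pred, target):
    """Return (sensitivity, specificity) in % for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    tn = sum(1 for t, p in pairs if t != target and p != target)
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)

# Made-up test-set labels for three classes A, B and C.
y_true = ["A", "A", "A", "B", "B", "C", "C", "C"]
y_pred = ["A", "A", "B", "B", "B", "C", "A", "C"]
for cls in ("A", "B", "C"):
    sens, spec = sens_spec(y_true, y_pred, cls)
    print(cls, round(sens, 1), round(spec, 1))
```

In a jackknife scheme, these values would be recomputed for each resampled test set and then summarized, which is one way to arrive at entries of the form mean ± deviation.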
Bottled natural mineral waters are food products highly appreciated by consumers, and consequently, the chemical characterization and geographical traceability of these products have gained increasing importance from an economic point of view. In this work, natural mineral waters from Spain have been chemically characterized in order to study the correlation between their composition and their production area. Samples from five different Spanish mountain systems (Sistemas Béticos, Sistema Central, Sistema Ibérico, Cordillera Costero-Catalana and Macizo Galaico) were collected and analysed. Some differences were detected by applying a simple nonparametric test and principal component analysis. Samples from Cordillera Costero-Catalana generally presented lower values of electrical conductivity and dry extract than the other origins, and slightly higher contents of Ti and Al. Samples from Sistemas Béticos and Sistema Ibérico present higher contents of Ca and Mg and lower contents of K and Na than the other three origins.
Classical nonparametric multiple comparison methods, such as the Kruskal–Wallis test, and principal component analysis do not allow a good differentiation among the considered origins. For this reason, a pattern recognition approach is necessary to solve the classification problem. Stepwise linear discriminant analysis models allow the selection of the most discriminant variables, but these models do not solve the classification problem by themselves. In this case, nonlinear models based on artificial neural networks obtain better results.
In this study, water samples from Cordillera Costero-Catalana are usually the best differentiated from the others. This can lead to a biased model that performs worse when classifying samples into the other groups. Consequently, iterative models are appropriate for obtaining the most discriminant variables for differentiating among the remaining groups. The proposed model first differentiates samples from Cordillera Costero-Catalana from the other mountain systems and then classifies the remaining groups. This model presented an average classification ability of 94 %.
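The two-stage logic of the iterative model can be sketched as follows. To keep the example self-contained, a nearest-centroid rule stands in for the LDA and ANN models of this work, so it illustrates only the control flow (first class against the rest, then the remaining classes), not the actual classifiers. The data, class abbreviations and function names are made up.

```python
import math

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest(sample, centroids):
    """Label of the centroid closest to the sample (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))

def fit_two_stage(X, y, first_class):
    # Stage 1: the best-separated class against all other samples pooled.
    stage1 = {
        first_class: centroid([x for x, lab in zip(X, y) if lab == first_class]),
        "rest": centroid([x for x, lab in zip(X, y) if lab != first_class]),
    }
    # Stage 2: a separate model built only on the remaining classes.
    remaining = sorted({lab for lab in y if lab != first_class})
    stage2 = {
        lab: centroid([x for x, l in zip(X, y) if l == lab]) for lab in remaining
    }
    return stage1, stage2

def predict_two_stage(sample, stage1, stage2, first_class):
    if nearest(sample, stage1) == first_class:
        return first_class
    return nearest(sample, stage2)

# Made-up two-variable data: "CCC" lies far from the other two groups.
X = [[10, 10], [11, 10], [10, 11], [0, 0], [1, 0], [0, 1], [0, 4], [1, 4], [0, 5]]
y = ["CCC"] * 3 + ["MG"] * 3 + ["SB"] * 3
stage1, stage2 = fit_two_stage(X, y, "CCC")
predictions = [
    predict_two_stage(s, stage1, stage2, "CCC")
    for s in ([10.5, 10.2], [0.5, 0.2], [0.2, 4.6])
]
print(predictions)
```

Separating the easiest class first lets the second-stage model choose its variables and boundaries using only the groups that are hard to tell apart.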
- 1. European Union (2009) Directive 2009/54/EC of the European Parliament and of the Council of 18 June 2009 on the exploitation and marketing of natural mineral waters. Official Journal of the European Union, L 164/45, Brussels. http://eur-lex.europa.eu/eli/dir/2009/54/oj. Accessed 9 July 2016
- 9. Gutiérrez-Reguera F, Seijo-Delgado I, Montoya-Mayor R, Ternero-Rodríguez M (2012) Caracterización fisicoquímica (parámetros generales y componentes mayoritarios) de las aguas minerales naturales envasadas de España. Afinidad 519:165–174
- 13. Thermo Electron Corporation (2004) X series ICP-MS getting started guide. Ref. no. S419MA. Thermo Electron Corporation, Winsford
- 14. AOAC (2012) Appendix F: guidelines for standard method performance requirements. In: Official methods of analysis of AOAC international, 19th edn. AOAC International, Gaithersburg
- 16. ISO (1994) ISO 9963-1:1994 Water quality. Determination of alkalinity. Part 1: determination of total and composite alkalinity. International Organization for Standardization, Geneva
- 17. ISO (1985) ISO 7888:1985 Water quality. Determination of electrical conductivity. International Organization for Standardization, Geneva
- 18. Muth JE (1999) Basic statistic and pharmaceutical statistical applications, 1st edn. Chapman and Hall/CRC, New York
- 23. Massart DL (1998) Handbook of chemometrics and qualimetrics, part B. Elsevier, Amsterdam
- 25. Kott PS (2001) The delete-a-group jackknife. J Off Stat 17:521–526