Introduction

The growth of industries such as fertilizer, paper, metal plating, battery, mining and tannery operations has increased the discharge of heavy metal ions (HMI), among the most hazardous inorganic pollutants, into water resources. The origin of these pollutants is therefore better regarded as anthropogenic than natural (Kumar et al. 2012a, b, 2014, 2015, 2016; Karkra et al. 2016), and they have been steadily degrading the quality of water resources. These toxic elements are non-biodegradable and enter the body primarily through water, followed by food and air. They are toxic even at low concentrations, and their toxicity increases as they accumulate in water (Bradl 2004); when accumulated in living organisms, they can cause serious conditions such as Alzheimer's disease, Parkinson's disease, kidney damage and hypertension. Because these diseases are pernicious, surveillance systems are required not only to detect HMI but also to support their removal. Many government agencies have introduced stringent rules and regulations, since these toxic ions are high-priority pollutants and are becoming one of the most serious environmental problems. The metal ions of major concern released by industry are copper, arsenic, nickel, cadmium, mercury, chromium and cobalt (Fu and Wang 2011). According to a report by the Indian National Science Academy (Sahni 2011), these ions are found in many areas, and 80% of the toxic pollutants in India are contributed primarily by Gujarat, Maharashtra and Andhra Pradesh, as shown in Table 1. For the scientific community, preserving natural heritage such as the river Yamuna from the adverse effects of these pollutants has become a major challenge; the Yamuna, the largest tributary of the Ganges, is now the second most polluted river in India after the Ganges (Kumar et al. 2014; Chawla et al. 2015).

Table 1 Contaminated sites in India (Sahni 2011)

Nowadays, the qualitative and quantitative study of liquids is carried out using a multi-sensor array system called an electronic tongue (e-tongue, ET). Otto and Thomas presented the first e-tongue system in 1985 (Otto and Thomas 1985). The design and operation of ET systems are inspired by the neurophysiology of the sense of taste. Such a system automatically analyzes samples of complicated composition, finds their distinguishing characteristics and performs a fast qualitative analysis. Its construction draws on several branches of science, including pattern recognition, sensor technology, chemometrics and artificial intelligence. A wide variety of chemical sensors can be employed in the design of electronic tongues: electrochemical (potentiometric, voltammetric), enzymatic (biosensors) or optical. Several analytical methods have been used for HMI determination, including atomic fluorescence spectroscopy, atomic absorption spectroscopy and inductively coupled plasma-mass spectrometry (Sanchez-Rodas et al. 2010; Larivière et al. 2012). The potentiometric method has been used to measure and monitor HMI in rivers (Mimendia et al. 2010a) and can be applied to evaluate the cross-sensitivity of any kind of potentiometric sensor for liquid media (Vlasov et al. 1997). The major drawbacks of potentiometric measurements are their temperature dependence, which causes changes in the solution, and the adsorption of solution components, which affects the nature of charge transfer; however, these effects can be minimized by controlling the temperature (Ciosek and Wroblewski 2007). Another technique, electrochemical impedance spectroscopy (EIS), exploits Faraday's law to obtain electrical measurements of the chemical process.

EIS is one of the most widely used non-selective techniques for heavy metal detection because of its advantages over other techniques: it offers better sensitivity, is easy to use and, above all, is cost-effective. EIS measures the impedance response to heavy metals by applying an AC perturbation and sweeping the frequency from 1 Hz to 100 kHz (Reece 2005). It provides physico-chemical information about contaminated samples, and the resulting multi-variate dataset contains hidden patterns and information that need to be explored. Chemometric methods, in combination with optimization algorithms, are used for data processing and information extraction from such chemical data (Reece 2005).

Over time, genetic algorithms have been used for optimal classification in many application areas (Vlasov et al. 1997; Turek et al. 2009; Mimendia et al. 2010a, b; Wilson et al. 2012). Typical applications involve samples of tea, juice (Liu et al. 2013), wine and water containing HMI. Prominent works by Bhondekar et al. (2011), Kaur et al. (2012) and Kumar et al. (2012a, b) on the optimum classification of tea used techniques such as the social impact theory-based optimizer and support vector machines. Similarly, Gutiérrez et al. (2011) used principal component analysis (PCA) and soft independent modeling by class analogy (SIMCA) for the quantification of grape varieties. Further, Jańczyk et al. (2010) used ion-selective electrodes to detect the micro-encapsulation effect of pharmaceutical ingredients. As far as the classification of water contaminants is concerned, Martínez-Máñez et al. (2005) developed an electronic tongue for the qualitative analysis of natural waters using a Fuzzy ARTMAP neural network with a success rate higher than 93%. Hong Men et al. (2005) developed an integrated electronic tongue that includes multiple light addressable potentiometric sensors and electrochemical electrodes for the detection of Fe(III), Cr(VI) and HMI.

Although evolutionary algorithms such as the genetic algorithm (GA) and particle swarm optimization (PSO) have long been used in many applications, and PSO in most cases remains one of the best optimizers, their potential for the classification of water contaminants is yet to be harnessed. The novelty of this work lies in the application of GA and PSO to multi-variate, multi-electrode, multi-frequency potable water data for the classification of heavy metal ions. The work focuses on classifying the impedance data of potable water in single electrode multi-frequency (SEMF), single frequency multi-electrode (SFME) and multi-frequency multi-electrode (MFME) configurations, with GA and PSO being used to optimize the MFME response. Principal component analysis (PCA) improves the extraction of the cluster structure (Ben-Hur and Guyon 2003) and is applied in conjunction with three cluster validation indices: the similarity index (S), the dissimilarity index (D) and the Davies–Bouldin index (DBI). The results show that, among the single electrodes, the impedance response of the silver nanoparticle (SNP) electrode gives the best discriminability without adding to the complexity of the system, whereas the SFME, MFME and GA- and PSO-optimized responses rely on multiple electrodes. Overall, PSO gave the best result, selecting the best combination of electrodes at particular frequencies for classification. The results also show that the cross-sensitivity of electrodes can be improved by selecting the optimum frequency of the optimum electrode.

Methodology

The schematic of the work carried out is depicted in Fig. 1.

Fig. 1 Schematic of the methodology

Experimental setup and data acquisition

The experimental setup was designed to obtain the impedance spectra of eight heavy metal ions prepared from reagents such as NiCl2, ZnCl2, CuCl2·2H2O, K2Cr2O7 (Spectrochem Pvt. Ltd.), CdSO4·8H2O, As2O3 and AgNO3 (Merck Pvt. Ltd.), with three samples for each ion, at 60 frequency points from 1 Hz to 100 kHz using an electrochemical workstation. All experiments were carried out at room temperature. The Au, GC, Pt and SNP electrodes were first polished with alumina slurry, sonicated in isopropanol to remove residual alumina and dried in N2 gas. The impedance spectra of the different heavy metal ions were then recorded. The recorded data form a matrix of size 24 × 60 for each electrode, where the rows correspond to the 24 samples (eight heavy metal ions, three samples each) and the columns to the sampled frequency points. Feature selection and cluster analysis of the recorded impedance spectra were done using PCA and validated using clustering indices.

Feature selection: principal component analysis

Principal component analysis is a statistical tool used for dimensionality reduction of multi-variate data. It derives new features from the input data to reduce the dimensionality while keeping the informative value of the data as intact as possible. It creates principal components (PCs), at most as many as there are input variables, ordered by the amount of variance they explain (PC1, PC2, PC3, etc.). Usually, the first two components (PC1 and PC2) capture enough of the variability to carry out the classification of the samples. In our work, we applied PCA to the impedance values of the electrodes at various frequencies to form clusters, and then analyzed and validated the clusters through the similarity index within clusters and the dissimilarity index between clusters.
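The projection onto PC1 and PC2 described above can be illustrated with a short sketch. It is not the code used in this work: it assumes a samples-by-variables matrix, standardizes it, and finds the two leading eigenvectors of the covariance matrix by power iteration with deflation, using only the Python standard library. All function names are hypothetical.

```python
import math
import random


def standardize(X):
    """Mean-center each column and scale it to unit sample standard deviation."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    # A constant column has zero deviation; divide by 1.0 instead to avoid 0/0.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)) or 1.0
            for j in range(m)]
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in X]


def covariance(Z):
    """Sample covariance matrix of column-centered data Z."""
    n, m = len(Z), len(Z[0])
    return [[sum(r[a] * r[b] for r in Z) / (n - 1) for b in range(m)]
            for a in range(m)]


def leading_eigenpairs(C, k=2, iters=500, seed=0):
    """Top-k (eigenvalue, eigenvector) pairs of a symmetric matrix C,
    found by power iteration with deflation."""
    rng = random.Random(seed)
    m = len(C)
    C = [row[:] for row in C]  # work on a copy; deflation modifies it
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(m) for b in range(m))
        pairs.append((lam, v))
        # Remove the component just found so the next pass finds the next one.
        for a in range(m):
            for b in range(m):
                C[a][b] -= lam * v[a] * v[b]
    return pairs


def pca_scores(X, k=2):
    """Return the PC scores of each sample and the explained-variance ratios."""
    Z = standardize(X)
    C = covariance(Z)
    pairs = leading_eigenpairs(C, k)
    scores = [[sum(z * c for z, c in zip(row, v)) for _, v in pairs] for row in Z]
    total_var = sum(C[a][a] for a in range(len(C)))
    explained = [lam / total_var for lam, _ in pairs]
    return scores, explained
```

Calling `pca_scores(X)` on a 24 × 5 electrode matrix would return 24 (PC1, PC2) pairs suitable for a scatter plot, together with the fraction of variance each component explains. Power iteration is chosen here only to keep the sketch dependency-free; a linear algebra library would normally be used instead.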

Clustering analysis

Clustering is an unsupervised process of grouping a set of input data into clusters on the basis of common attributes. Because it is difficult to define the acceptability of clusters, two measurements are generally made:

  • Similarity Index (compactness) (S): This index measures the homogeneity of the data in a cluster, i.e., how closely the data are packed within that cluster. Compactness is generally measured using the variance.

  • Dissimilarity Index (separation) (D): This index measures the heterogeneity between clusters, i.e., how far the clusters are from each other. The greater the distance between the clusters, the better the clustering.

To assess crisp clustering, i.e., partitioning with no overlap, one more validity index is used, the Davies–Bouldin index (DBI) (Kovács and Iváncsy 2006), which is based on the similarity and dissimilarity measures of the clusters; lower DBI values indicate more compact, better separated clusters.

Let E be the set of input data and \(C_{i}\) the cluster containing the data points \(E_{j}\), i.e., \(E_{j} \in C_{i}\). Then the similarity index \(S_{i}\) of \(C_{i}\) is measured as

$$S_{i} = \left\{ \frac{1}{N_{i}} \sum\limits_{j = 1}^{N_{i}} |E_{j} - T_{i}|^{p} \right\}^{\frac{1}{p}} ,$$
(1)

where \(N_{i}\) is the number of data points in \(C_{i}\) and \(T_{i}\) is the centroid of \(C_{i}\).

Similarly, the dissimilarity index \(D_{ij}\) between clusters \(C_{i}\) and \(C_{j}\) is measured as

$$D_{ij} = \left\{ \sum\limits_{k = 1}^{n} |T_{ki} - T_{kj}|^{p} \right\}^{\frac{1}{p}} ,$$
(2)

where \(T_{ki}\) is the kth element of the centroid \(T_{i}\) of cluster \(C_{i}\), n is the number of elements (dimensions) of a centroid and p = 2 (Euclidean distance).

The similarity \(R_{ij}\) between two clusters \(C_{i}\) and \(C_{j}\) is then calculated as

$$R_{ij} = \frac{S_{i} + S_{j}}{D_{ij}},$$
(3)

and the Davies–Bouldin index over N clusters is

$$\text{DBI} = \frac{1}{N} \sum\limits_{i = 1}^{N} R_{i}, \quad \text{where } R_{i} = \max_{j \ne i} R_{ij} .$$
(4)
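Equations (1) to (4) can be checked numerically with a minimal sketch such as the following. It assumes p = 2 (Euclidean distance), known cluster labels for every sample, and at least two clusters with distinct centroids; the function names are illustrative and are not taken from the MATLAB code used in this work.

```python
import math


def centroid(points):
    """Component-wise mean T_i of the points in one cluster."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def davies_bouldin(X, labels):
    """Return (DBI, S, T) for data rows X with one cluster label per row.

    S[c] is the similarity (compactness) index of cluster c, Eq. (1) with p = 2.
    T[c] is the centroid of cluster c.
    """
    clusters = {}
    for row, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(row)
    keys = sorted(clusters)
    T = {c: centroid(clusters[c]) for c in keys}
    # Eq. (1): root-mean-square distance of the members to their centroid.
    S = {c: math.sqrt(sum(math.dist(e, T[c]) ** 2 for e in clusters[c])
                      / len(clusters[c]))
         for c in keys}
    R = []
    for i in keys:
        # Eq. (2) is math.dist(T[i], T[j]); Eq. (3) is the ratio below.
        # Coinciding centroids would divide by zero (assumed not to occur).
        R.append(max((S[i] + S[j]) / math.dist(T[i], T[j])
                     for j in keys if j != i))
    # Eq. (4): average of the worst-case ratio of each cluster.
    return sum(R) / len(keys), S, T
```

For example, the two clusters {(0, 0), (0, 2)} and {(4, 0), (4, 2)} each have S = 1 and are separated by D = 4, so R = (1 + 1)/4 and DBI = 0.5.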

Data processing

Four electrodes (platinum, gold, glassy carbon and silver nanoparticles) were used in the experiments, and the impedance was measured from 1 Hz to 100 kHz in 60 steps in response to potable water samples containing eight dissolved heavy metal ions (three samples per metal ion). To reduce the number of impedance variables, only five frequencies were used for further analysis: 1 Hz, 100 Hz, 1 kHz, 10 kHz and 100 kHz. The data were then arranged in matrices for each electrode (GC, Au, Pt and SNP; 24 × 5, where the rows represent the heavy metal ion samples and the columns the modulus of impedance at the five frequencies) and for each frequency. Further, a matrix containing the data of all electrodes, named multi-frequency multi-electrode (MFME), was formed with size 24 × 20, where each row represents a sample and each column the impedance response of one electrode at a particular frequency. All matrices were standardized (i.e., mean centered and scaled by the standard deviation). To optimize the MFME data, they were subjected to GA and PSO. With DBI as the fitness function, GA and PSO selected different sets of electrodes: GA selected Au at 1 Hz, Au at 1 kHz, SNP at 1 Hz, Pt at 10 kHz and Pt at 100 Hz, whereas PSO selected Pt at 100 kHz, Pt at 1 kHz, Au at 100 kHz, Au at 10 kHz and SNP at 100 Hz. For all matrices, the similarity index (S), dissimilarity index (D) and Davies–Bouldin index (DBI) values were calculated and are shown in Table 2.
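The arrangement of the data can be sketched as follows. The column order (electrode by electrode, each with its five frequencies), the column names and the function names are assumptions made for illustration; the text does not state how the columns of the MFME matrix were ordered.

```python
import statistics

ELECTRODES = ["Pt", "GC", "Au", "SNP"]
FREQS_HZ = [1, 100, 1_000, 10_000, 100_000]


def build_mfme(per_electrode):
    """Join the four samples-by-5 electrode matrices column-wise.

    per_electrode maps an electrode name to a list of rows, one row per
    sample, each row holding |Z| at the five frequencies in FREQS_HZ.
    Returns the samples-by-20 matrix and a name for each column.
    """
    n_samples = len(per_electrode[ELECTRODES[0]])
    names = [f"{e}@{f}Hz" for e in ELECTRODES for f in FREQS_HZ]
    rows = [[value for e in ELECTRODES for value in per_electrode[e][i]]
            for i in range(n_samples)]
    return rows, names


def standardize(X):
    """Mean-center every column and scale it by its sample standard deviation."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    # A constant column has zero deviation; divide by 1.0 in that case.
    stds = [statistics.stdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]
```

With 24 samples per electrode, `build_mfme` yields a 24 × 20 matrix, and `standardize` gives every column zero mean and unit standard deviation so that electrodes with large impedance magnitudes do not dominate the subsequent PCA.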

Table 2 Clustering indices values

Results and discussions

To analyze the impedance values generated by the electrochemical workstation, we used PCA to classify the heavy metal ions and the genetic algorithm to check the cross-sensitivity of the electrodes. We used four electrodes, namely platinum (Pt), glassy carbon (GC), gold (Au) and silver nanoparticles (SNP). The eight heavy metal ions are arsenic, copper, zinc, nickel, cadmium, lead, cobalt and chromium. For each electrode, we had 24 impedance values corresponding to three sample sets of eight heavy metal ions. The frequencies used are 1 Hz, 100 Hz, 1 kHz, 10 kHz and 100 kHz. For each frequency, we have impedance values in the form of a matrix of size 3 × 8 for each electrode. Such a matrix corresponds to the data of the single frequency multi-electrode (SFME) configuration. The PCA scatter plots of the SFME configuration for the frequencies 1 Hz, 100 Hz, 1 kHz, 10 kHz and 100 kHz are shown in Figs. 2, 3, 4, 5 and 6, respectively.

Fig. 2 PCA plot for 1 Hz frequency

Fig. 3 PCA plot for 100 Hz frequency

Fig. 4 PCA plot for 1 kHz frequency

Fig. 5 PCA plot for 10 kHz frequency

Fig. 6 PCA plot for 100 kHz frequency

It can be seen from Fig. 2 that two heavy metal ions, Co and Cr, are clearly classified at 1 Hz, whereas the rest of the metal ions overlap and are therefore not properly classified. Figure 3 shows that all the heavy metal ions are reasonably classified at 100 Hz. In Fig. 4 (1 kHz), As, Pb, Cu, Cd and Zn form more compact clusters, whereas the Co, Ni and Cr clusters are less compact than those formed at 100 Hz. Figure 5 classifies all the heavy metal ions; the plot is quite similar to that at 1 kHz. The impedance values at 100 kHz are not as well classified, as Co and Cd overlap (Fig. 6). When the similarity and dissimilarity indices are compared across all the frequencies used, the classification at 1 kHz is the most reasonable, as it has the lowest S (similarity index) value and the highest D (dissimilarity index) value among the five frequencies in the SFME configuration.

The next configuration we considered is multi-frequency single electrode (MFSE), i.e., the single electrode multi-frequency (SEMF) configuration named in the Introduction. For each of the four electrodes, Pt, GC, Au and SNP, the five frequencies used are 1 Hz, 100 Hz, 1 kHz, 10 kHz and 100 kHz. The PCA scatter plots of the MFSE configuration for each electrode are shown in Figs. 7, 8, 9 and 10.

Fig. 7 PCA plot corresponding to the Pt electrode

Fig. 8 PCA plot corresponding to the GC electrode

Fig. 9 PCA plot corresponding to the Au electrode

Fig. 10 PCA plot corresponding to the SNP electrode

Figures 7 and 8 show the PCA scatter plots for the Pt and GC electrodes, respectively; the classification is very poor because of excessive overlapping. Figure 9 shows a better classification, although some overlapping remains. Figure 10 shows the best classification of metal ions, obtained with the SNP electrode. Of the four electrodes, SNP clearly stands out as the best electrode in the MFSE configuration for the classification of heavy metal ions, with a smaller S value (0.429) and a larger D value (2.816) than the other electrodes (Table 2).

The third configuration we have used is multi-frequency multi-electrode (MFME) whose PCA scattering is shown in Fig. 11.

Fig. 11 PCA plot corresponding to MFME

As can be seen, each heavy metal ion (Cd, Co, Zn, Ni, Cu, Cr, As and Pb) forms its own cluster. To decrease the complexity of the MFME configuration and to address the cross-sensitivity of the electrodes, i.e., to choose the optimum electrodes at their optimum frequencies, we used the GA and PSO algorithms.

The GA and PSO algorithms were tuned and run several times, and the electrodes with the best efficiency were selected using DBI as the fitness function. The electrodes selected by GA are Pt at 1 kHz, SNP at 100 kHz, Au at 1 kHz, GC at 1 Hz and GC at 1 kHz. A 24 × 5 matrix was formed from the impedance values of these electrodes at the frequencies selected by GA, and its PCA scatter plot is shown in Fig. 12. GA successfully classified the ions, and the clusters formed were more compact than in both the MFME and SNP results.
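The idea of selecting five electrode and frequency columns with DBI as the fitness function can be sketched with a small genetic algorithm. This is an illustration under stated assumptions, not the tuned GA used in this work: the encoding (a set of five column indices), the truncation selection, the crossover that samples from the parents' combined columns, the mutation rate and the population settings are all hypothetical choices.

```python
import math
import random


def dbi(X, labels):
    """Davies-Bouldin index with Euclidean distances (lower is better)."""
    groups = {}
    for row, lab in zip(X, labels):
        groups.setdefault(lab, []).append(row)
    cents = {g: [sum(c) / len(rows) for c in zip(*rows)]
             for g, rows in groups.items()}
    scat = {g: math.sqrt(sum(math.dist(r, cents[g]) ** 2 for r in rows) / len(rows))
            for g, rows in groups.items()}
    worst = []
    for i in groups:
        ratios = []
        for j in groups:
            if j != i:
                d = math.dist(cents[i], cents[j])
                ratios.append((scat[i] + scat[j]) / d if d > 0 else math.inf)
        worst.append(max(ratios))
    return sum(worst) / len(worst)


def fitness(X, labels, cols):
    """DBI of the data restricted to the chosen columns."""
    return dbi([[row[c] for c in cols] for row in X], labels)


def ga_select(X, labels, k=5, pop_size=30, generations=40, mut_rate=0.3, seed=0):
    """Search for k columns of X that minimize the DBI of the labeled clusters."""
    rng = random.Random(seed)
    n_cols = len(X[0])
    pop = [tuple(sorted(rng.sample(range(n_cols), k))) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        pop.sort(key=lambda cols: fitness(X, labels, cols))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Crossover: draw k columns from the parents' combined columns.
            child = set(rng.sample(sorted(set(a) | set(b)), k))
            if rng.random() < mut_rate:
                # Mutation: swap one chosen column for a random unused one.
                child.discard(rng.choice(sorted(child)))
                child.add(rng.choice([c for c in range(n_cols) if c not in child]))
            children.append(tuple(sorted(child)))
        pop = elite + children
    best = min(pop, key=lambda cols: fitness(X, labels, cols))
    return best, fitness(X, labels, best)
```

Applied to a standardized 24 × 20 MFME matrix with one ion label per sample, `ga_select` returns five column indices and their DBI. Because the search is stochastic, different seeds or settings can return different subsets.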

Fig. 12 PCA plot corresponding to MFME–GA

Further, we repeated the exercise for the electrodes selected by PSO (Pt at 100 kHz, Pt at 1 kHz, Au at 100 kHz, Au at 10 kHz and SNP at 100 Hz). The clusters of nickel and chromium ions, which were not compact in the GA scatter plot, are more compact in the MFME–PSO PCA scatter plot shown in Fig. 13.
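A particle swarm variant of the same selection can be sketched as follows. It is again only an illustration: the decoding of a continuous particle position into a subset (the five columns with the largest position values), the inertia and acceleration coefficients, and the swarm size are assumptions, since the PSO settings are not given in the text.

```python
import math
import random


def dbi(X, labels):
    """Davies-Bouldin index with Euclidean distances (lower is better)."""
    groups = {}
    for row, lab in zip(X, labels):
        groups.setdefault(lab, []).append(row)
    cents = {g: [sum(c) / len(rows) for c in zip(*rows)]
             for g, rows in groups.items()}
    scat = {g: math.sqrt(sum(math.dist(r, cents[g]) ** 2 for r in rows) / len(rows))
            for g, rows in groups.items()}
    total = 0.0
    for i in groups:
        total += max((scat[i] + scat[j]) / math.dist(cents[i], cents[j])
                     if cents[i] != cents[j] else math.inf
                     for j in groups if j != i)
    return total / len(groups)


def fitness(X, labels, cols):
    """DBI of the data restricted to the chosen columns."""
    return dbi([[row[c] for c in cols] for row in X], labels)


def pso_select(X, labels, k=5, n_particles=20, iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for k columns of X that minimize DBI with a basic PSO."""
    rng = random.Random(seed)
    dim = len(X[0])

    def decode(position):
        # The k largest coordinates name the selected columns.
        ranked = sorted(range(dim), key=lambda c: -position[c])
        return tuple(sorted(ranked[:k]))

    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, labels, decode(p)) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Inertia plus pulls toward the personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(X, labels, decode(pos[i]))
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest), gbest_fit
```

As with the GA, the returned subset depends on the random seed and the swarm settings, so repeated runs are needed before comparing the two optimizers.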

Fig. 13 PCA plot corresponding to MFME–PSO

Further, the matrices containing the electrodes selected by each optimization technique were passed to a code written in MATLAB to calculate the cluster indices. MFME–PSO produced the best result of all the configurations: both the similarity index (S) and the Davies–Bouldin index (DBI) decreased, which indicates a better classification of the ions.

Conclusions

In this work, SNP was found to give better classification of HMI than Pt, GC and Au in the SEMF configuration. Overall, the PSO-optimized response gives better clustering index values; its system complexity is higher, as multiple electrodes are used, but the clusters formed are more compact and more clearly distinguishable than those obtained with SNP. The PSO-optimized multi-frequency multi-electrode system could therefore be used to discriminate heavy metal ions in potable water. It should also be kept in mind that the cross-sensitivity of electrodes could be enhanced further, which opens a window for more qualitative and quantitative analysis of liquids.