1 Introduction

Iron constitutes only 0.008 percent of human body mass (about 6 g for a 75 kg adult male); even with the low concentration in the body, it plays a vital role in human immunity [1]. Therefore, an adequate supply of iron is essential for keeping the human immune system effective in fighting against diseases [2]. People with iron deficiency are unable to produce sufficient hemoglobin to meet up the body’s oxygen transport needs [3]. One of the diseases associated with iron deficiency is anemia which occurs due to insufficient red blood cells (RBCs) circulating in the blood or as a result of low hemoglobin content of these RBCs. Moreover, the body stores the excess iron in the tissues; however, overloading of this heavy metal results in a condition known as hemochromatosis which severely damages the body tissues and their functions [4]. The ethylenediamine tetra acetic acid (EDTA) titration method is the traditional method used for determining iron in samples by assuming the endpoint of complex formation [5]. Other commonly used analytical methods include the use of spectroscopic and chromatographic techniques such as atomic absorption spectrometry and ion chromatography. These methods are used for the detection of heavy metals, but have disadvantages where they require sophisticated instrumentation, experienced personnel, and high analytical costs [6]. Electrochemical techniques have achieved significant attention in most and recent research applications in all aspects of industrial and scientific site [7]. The electrochemical measurements generated a lot of interest from recent studies due to the low cost, less processing time, environment-friendly, low detection limit and high selectivity [8]. Voltammetric methods are the most widely used electrochemical techniques adopted by many scientists in several research and applications. Moreover, the methods of voltammetry consist of different techniques such as cyclic voltammetry (CV) and differential pulse voltammetry (DPV) which vary in their properties, characteristics, specificity, and applications. However, the most widely adopted methods are those with higher sensitivity [9, 10]. Voltammetric techniques are more sensitive to redox currents (oxidation–reduction currents) or so-called faradic currents, which are measured in order to determine the chemicals of concern’s concentrations, which generates when they are reduced or oxidized. In modern electrochemical techniques, extensive modifications have increased sensitivity [11]. The running of numerous machine learning techniques in classifying pulse voltammetric methods data sets has also been evaluated, containing cyclic voltammetry (CV), differential pulse voltammetry (DPV) and square wave voltammetry (SQWV) [12, 13]. For example, CV has been used with sensors to evaluate some biological compounds [14]. Machine learning is used to carry out the data analysis obtained by the electrochemical technique used to simply, precisely and rapidly classify harmful chemicals and heavy metals. Additionally, some other useful properties of the algorithms model assessed in the study were highlighted. Voltammetry is a powerful tool, and recently, it is being used with machine learning in classification purposes [15].

Working on the pulse electrochemical techniques indicates one of the previous works that we present in previous years in 2019 and 2020. The analyte used is potassium ferrocyanide (K\(_{4}\)Fe (CN)\(_{6}\)) as a model solution for the electrochemical techniques. The main differences between our previous studies and this work indicate test size iteration, different machine learning models used for the classification of the electrochemical data obtained, and the higher accuracy achieved, moreover, the type of electrochemical method chosen to detect the chemical of concern. In 2019, different five concentrations of potassium ferricyanide (K\(_{3}\)Fe(CN)\(_{6}\)) were prepared and detected by the cyclic voltammetry (CV) method. Gradient boosting algorithm was chosen to classify the data obtained by giving 60% training and 40% testing. The accuracy achieved was up to 76.6% [16]. In 2020, K\(_{4}\)Fe (CN)\(_{6}\) was detected by two electrochemical methods DPV and SQWV. The data obtained were classified by gradient boosting algorithm and the accuracy achieved was 75% and 60%, respectively, for both voltammetric techniques [17].

It is noteworthy to state that the novelty of this current study can be presented in a dissimilar approach. First, the detection of the inorganic compound by analyzing the forward and backward scan of a SQWV data was done which is the first time to be presented in the published technical research literature due to the complexity. Machine learning is used effectively in various applications in the literature for classification and regression [18,19,20]. So secondly, the classification of the electrochemical data by using the six different machine learning models which weren’t used in our previous research was also done. Finally, the accuracies of data classification output were obtained with high accuracy and higher sensitivity in comparison with our previous studies. In this study, an attempt was made to test several algorithm models, involving the lately advanced technology in machine learning [21, 22]. Backpropagation Neural Networks (BPNN), Gaussian Naive Bayes (GaussianNB), Logistic Regression (LR), K-Nearest Neighbors Algorithm (k-NN), K-Means clustering, and Random Forest are the methods used to classify the electrochemical database obtained. Some of these methods were used as a preliminary work before. It has been seen that these methods were effective. When research about applications of machine learning was done, some other effective methods were also found and added to this research for comparison [23, 24].

DPV and SQWV were selected to detect iron II using the model compound potassium ferrocyanide K\(_{4}\)Fe(CN)\(_{6}\) solution, each method characterized by its own identity and uniqueness, while voltammograms generated from the techniques are integrated as input for quantification of species using electrochemical cells relies on sensors that play a vital role in the projected observation. By detecting the analyte of interest in the recorded voltammogram (current vs. potential), it is possible to determine the concentration of the analyte in the sample. However, SQWV is avoided in many studies due to its complexity compared to other pulse electrochemical techniques [25]. It is also important to realize that voltammetric analysis for real samples generally gives us complicated voltammograms due to the existence of other electrochemically active components. Using machine learning approach with voltammetric data applied in this study, will be a promising method which can be used in the analysis of complex voltammograms. Nevertheless, in this study, the forward and backward scans of SQWV were used for the detection of Fe\(^{+2}\) in various concentrations, which improve the sensitivity over the traditional SQWV [26, 27]. Afterward, all data obtained were examined and classified by the suggested machine learning algorithm models in which no work has been applied before in analytical studies validated with ML for data classification. However, machine learning was introduced in this study due to its high accuracy and its contributions in terms of creativity and innovation.

2 Materials and methods

2.1 Chemicals and instrument

All the chemicals and instrument used were reported in the previous study [28]. All chemicals used were purchased from Sigma-Aldrich (Istanbul, Turkey) and contained phosphate-buffered saline (PBS) solution, potassium nitrate (KNO\(_{3}\)) and potassium ferrocyanide (K\(_{4}\)Fe (CN)\(_{6}\)). Deionized (DI) water (18.2 M\(\Omega\).cm), treated with pure lab Ultra Analytic (ELGA Lab Water, UK), was used for all the aqueous solutions preparation. Pencil graphite electrodes PGEs were used in the form of pencil tips tom-bow 0.5/HB which were picked up from local stationeries.

AUTO LAB Potentiostat/Galvanostat PGSTAT101 with NOVA 2.1.2 Software (Pine Instrument Company, USA) was used to carry out the DPV and SQWV measurements. The electrochemical pulse measurements for both techniques were carried out via an electrochemical cell consisting of three main electrodes; the first electrode called the counter electrode (CE) or auxiliary electrode was a wire made of platinum. The second electrode indicates the reference electrode (RE) which was made of Ag/AgCl-KCl 3 M concentration. The final one is called working electrode (WE). For each ferrocyanide concentration, the solutions were stirred before measurement and not through the measurement detection to restructure initial conditions. Moreover, all the detection procedures were detected under room temperature 25\(^{\circ }\)C. Both pulse voltammetric techniques, DPV and SQWV, were carried out using the same parameters of start potential from 0V, end potential from 1V, step height of 0.004 mV, pulse height of 0.02 and frequency of 2Hz.

2.2 Electrochemical data

Throughout the study, unless otherwise mentioned, analytical-grade reagents were used. All electrochemical data used here are based on our previously published results [28]. The preparation of five different concentrations of the analyte which is potassium ferrocyanide (K\(_{4}\)Fe (CN)\(_{6}\)) was carried out by preparing the stock solution of it first with a concentration of 10 mmole/L and dissolved in a potassium nitrate of 1 M concentration to get a total volume of 250 mL (with distilled water as a solvent); then, for each concentration, a proper amount was taken from the stock solution prepared and diluted to the mark of the volumetric flask using deionized water. The final five different concentrations of potassium ferrocyanide (K\(_{4}\)Fe(CN)\(_{6}\)) fit a total volume of 25 mL of 2,4,6,8 and 10 mmol/L. Each potassium ferrocyanide concentration was carried in both pulse voltammetric measurements, respectively, starting from the lowest concentration until the highest concentration. Twenty pencil graphite were used for every single concentration in each pulse detection measurement.

The data used here in this research is related to the DPV and SQWV measurement for potassium ferrocyanide. The DPV and SWV scans were used to get the peak position, peak height, peak area, base start, base end, peak width (1/2), peak (1/2), and peak total data. These data have been chosen because they contain features and connections that can distinguish voltammograms, which have formed 5 different classes. Furthermore, the forward and backward scans of SQWV also were included and compared in this study, and by using new machine learning algorithms to classify the electrochemical data obtained, modern data and accuracy were obtained. For each detection, and for each step is illustrated in Fig. 1.

Fig. 1
figure 1

The proposed electro-chemo-metric flowchart system: a stock solution of potassium ferrocyanide; b the 5 standard solutions; c pulse voltammetric techniques used; d voltammograms results from analysis; e converted data used; f calibration graphs of iron II by voltammetric methods; g six machine learning algorithms used for classification

3 Machine learning

3.1 Dataset

Five different input datasets SQWV, forward scan of SQWV, backward scan of SQWV, net of SQWV, and data obtained from DPV were taken and normalized to obtain five different scenarios for the training process. To examine if machines identify the forms of data fed into the models in terms of better accuracy and sensitivity, machines must be trained using training datasets [12]. Figure 2 shows the methodology used in this research. To ascertain if the machine classified the dataset efficiently and effectively, it must be tested using testing datasets. Splitting data into 2 datasets, as supported by the literature, is an essential method for the evaluation of models [29]. Based on recent studies, higher percentage of datasets are used as the training data, while a smaller percentage of dataset are used as a testing data [30]. In this study, we made a system that iterates over different test sizes ranging from 0.2 to 0.8 with an increment of 0.1 to find the best ratio for each algorithm. This method is used to compare the results of different machine learning algorithms on different train test ratios. The dataset consists of 100 samples in total which has 20 samples for each 5 classes. Train test ratios in this research were iterated from 0.2 to 0.8 because out of this range, enough samples for each class were not included in train or test datasets.

Fig. 2
figure 2

Projected machine learning methodology for continuous examining of PGE: from the left to the right, the chemical is detected by the electrochemical sensor PGE, the SQWV and DPV data extracted to be given to ML models to classify the chemical concentrations used

3.2 Model training

Six machine learning models were examined on each iron II concentration dataset. K\(_{4}\)Fe(CN)\(_{6}\) was used and detected by DPV and SQWV. Forward square wave voltammetry (FSQWV), backward square wave voltammetry (BSQWV), and Net square wave voltammetry (NSQWV) scans of SQWV data were also used and classified in this work, looking forward to higher accuracy. Backpropagation Neural Networks, Gaussian Naive Bayes, Logistic Regression, K-Nearest Neighbors Algorithm, K-Means clustering, and Random Forest were used for data classification as mentioned before.

Backpropagation is a machine learning algorithm that computes gradient of the loss function according to the weights of the network, and it is used for training feedforward neural networks. The term backpropagation indicates backward propagation of errors [13]. This method updates weights to minimize the loss. Backpropagation Neural Network used in this study is a multilayer network which has multiple connected dense layers, and a single dense layer as output and its results are then rounded to the closest class when obtaining the final decision.

Gaussian Naive Bayes is a type of Naïve Bayes Classifiers which are based on the Bayes theorem. Bayes theorem is used to calculate the conditional probability, and it is used in machine learning for classification problems. For Gaussian Naive Bayes, it is assumed that each class is distributed as in Gaussian distribution.

Logistic regression is a model used in statistics to find the probability of classes, which is also very useful in machine learning. For binary classification, outputs are labeled as 0 and 1 where the outputs are classified according to a threshold value. This method can also be used for multiclass problems when there are three or more output classes. In this case, a probability value is assigned to each class which is between 0 and 1 and their sum is equal to 1. The class with the highest probability value is the output class. This method is called multinomial logistic regression.

K-Nearest Neighbors are a machine learning algorithm that is based on the similarity of the data, which means this algorithm assumes that similar things are close to each other and they can be classified according to the distances. This algorithm stores the data and when a new data point is given the K-Nearest Neighbors Algorithm classifies the data in the most suitable class according to the closeness or similarity. In this research, k value was used as 5 and the distance measure was used as Euclidean distance.

K-Means clustering is an unsupervised machine learning algorithm that makes clusters with the data according to the given number of k, which is the desired number of centroids, in other words, classes. Algorithm starts with random centroids which are the beginning positions for each cluster. Then, calculations are done in iterations to optimize the positions of the centroids. Although it is an unsupervised method, K-means algorithm was used to obtain 5 different clusters of data, and then, it was compared to the real class each data belongs to.

Random forest is an ensemble learning method which is based on constructing multiple decision trees during training. For classification problems like in our case, the output class is decided according to the class selected by most of the trees. Usage of random forest helps preventing over fitting that can happen when using a single decision tree. In this research, there were 100 decision trees.

By applying an approach of training and testing of datasets, the effect of data divergences would be decreased and also improved the characteristic models employed. After the termination step of choosing the best algorithm models, moving to the next step of testing the model by formulating the classifications counter to the test dataset; for this reason, the testing dataset is previously enclosed with known values based on the feature which is needed to be classified, hence it is an easy way to determine the model estimate correctly. The training test sets were made randomly and iterated several times at different train test ratios between 80–20 and 20–80% to examine the maximum obtained accuracy for each scenario on different algorithms. All the machine learning models were trained on Windows operating system with Spyder IDE by using the Keras and Scikit libraries with an Nvidia Geforce RTX2060 GPU.

4 Results, analysis & discussion

For the DPV and SQWV methods, 100 voltammograms were obtained from the redox reaction of potassium ferrocyanide which previously has been reported [28]. The oxidation peak of ferrocyanide into ferricyanide obviously can also be observed at \(\sim\) 0.25 V due to the electron transfer. Moreover, this pulse technique provides much more detailed information about the analyte of interest, forward scan and backward scans were measured and detected for the five different concentrations of potassium ferrocyanide. This helps in deeply analyzing the analyte by providing more information and hence increasing the sensitivity of the method used. The response single increased due to the increase of the concentration applied, the current was enhanced due to the electron transfer of the reduction of K\(_{4}\)Fe(CN)\(_{6}\) reaction.

The experiments were conducted under the same standard conditions to verify the accuracy of the process, which is further confirmed by the graphs of peak current values (Ipa) against K\(_{4}\)Fe(CN)\(_{6}\) concentration, where reported before [28]. It is noted that the concentration range examined in these experiments was between 2 and 10 mM.

The result obtained from SQWV performed better compared to DPV due to its sensitivity and low availability of background current [31]. By considering the linear regression equation, it is possible to calculate the limit of detection (LOD) and the limit of quantitation (LOQ) using the formulas below:

$$\begin{aligned} {\rm {LOD}}= &\,{} 3.33x{\rm {SD/m}} \end{aligned}$$
$$\begin{aligned} {\rm {LOQ}}= &\,{} 10x{\rm {SD/m}} \end{aligned}$$

SD indicates the standard deviation for the intercept of the five measurements, and m is the slope obtained from the calibration curves. The LOD values recorded from the analysis for detection of potassium ferrocyanide detection using both techniques are presented in Table 1 based on calibration curves reported before [28]. Both LOD and LOQ are given in mM due to the potassium ferrocyanide solutions that were prepared in the mM unit. The values obtained showed the difference in sensitivity between both pulse electrochemical techniques.

Table 1 Figures of merit for the voltammetric methods of iron II

When machine learning is considered, results are obtained by using six different machine learning algorithms on five different dataset scenarios at different test sizes between 0.2 and 0.8 with an increment of 0.1. For square wave voltammetry (FSQWV) scenario, Logistic Regression had its best accuracy both at test sizes of 0.2 and 0.5 which was 80%, while Backpropagation Neural Network had the best accuracy at only 0.5 test size with 82%. Gaussian Naive Bayes and K-Nearest Neighbors Algorithm had their best accuracy 85% at 0.2 test size. The accuracy of K-Means was lower when compared to other algorithms. The maximum accuracy obtained by the K-Means algorithm was 42.5% at 0.8 test size. And finally, Random Forest which had higher results than other algorithms at all test sizes, obtained 95% accuracy at 0.2 and 0.4 test sizes as shown in Figs. 3 and 4 (Table 2).

Fig. 3
figure 3

Forward square wave voltammetry (FSQWV) Random Forest confusion matrix at 0.2 test size

Fig. 4
figure 4

Forward square wave voltammetry (FSQWV) Random Forest confusion matrix at 0.4 test size

Table 2 Accuracy Results for Forward square wave voltammetry (FSQWV) data

The second scenario was backward square wave voltammetry (BSQWV), where Gaussian Naive Bayes, K-Nearest Neighbors Algorithm, and Logistic Regression algorithms had their best accuracies at 0.2 test size with 90, 90, and 95%, respectively. Backpropagation Neural Network obtained 82.5% accuracy at 0.4 test size, while K-Means had 50% accuracy at 0.7 test size. Again Random Forest algorithm obtained better accuracies at all test sizes when compared to other algorithms. 100% accuracy was obtained at 0.2 test size (Fig. 5, Table 3).

Fig. 5
figure 5

Backward Square Wave voltammetry (BSQWV) Random Forest confusion matrix at 0.2 test size

Table 3 Accuracy Results for Backward square wave voltammetry (BSQWV) data

For net square wave voltammetry (NSQWV) scenario, results were different from the first two. Logistic Regression had 90% of accuracy at 0.2 test size which is also the best result of the scenario which is shown in Fig. 6. At 0.2 test size, Gaussian Naive Bayes and K-Nearest Neighbors Algorithm also had their best results with 85%. Random Forest algorithm, which was the best network for previous scenarios, had 85% of accuracy at both 0.2 and 0.4 test sizes. K-Means algorithm had 50% accuracy at 0.5 test size and for Backpropagation Neural Network highest result was obtained at 0.3 test size with 83.3%. When we look at the algorithm performances for each specific test size, at 0.2 Logistic Regression was the best algorithm as mentioned in the previous paragraph. At 0.3 Backpropagation Neural Network was the network with the highest accuracy which was also mentioned. From test size 0.4–0.8, Random Forest had the highest results with accuracies of 85, 78, 80, 72.9, and 73.8%, respectively. But at the test size 0.5, Gaussian Naive Bayes and K-Nearest Neighbors Algorithm algorithms also shared the result of 78% (Table 4).

Fig. 6
figure 6

Net square wave voltammetry (NSQWV) Random Forest confusion matrix at 0.2 test size

Table 4 Accuracy Results for Net square wave voltammetry (NSQWV) data

For all square wave voltammetry (SQWV) scenario, results were similar to the first two scenarios where the Random Forest algorithm outperformed all other algorithms. All algorithms except K-Means had their highest accuracies at 0.2 test size. K-Means had 46.7% accuracy at 0.3 test size which was much lower than the other algorithms. Backpropagation Neural Network had the second-lowest result with 85%, while the remaining results were 90% and higher. K-Nearest Neighbors Algorithm and Gaussian Naive Bayes had 90% and Logistic Regression passed them with 95%. The final algorithm Random Forest had the highest results at every test size and obtained an accuracy of 100% (Fig. 7, Table 5).

Fig. 7
figure 7

All square wave voltammetry (SQWV) Random Forest confusion matrix at 0.2 test size

Table 5 Accuracy results for all square wave voltammetry (SQWV) data

For the last scenario differential pulse voltammetry (DPV), most of the networks obtained their best results at 0.2 test size. Logistic Regression had 70% accuracy, while Gaussian Naive Bayes, K-Nearest Neighbors Algorithm, and Random Forest all had 85% accuracy at this test size. Backpropagation Neural Network had its best result at 0.3 test size with 70% of accuracy, and the last algorithm K-Means again had lower results according to the other algorithms with the highest accuracy of 56% at 0.5 test size. When we look at each test size, we see that Gaussian Naive Bayes, K-Nearest Neighbors Algorithm, and Random Forest had the better results for 0.2 test size which is also the highest accuracy 85% obtained within this scenario. Their results are shown in Figs. 8, 9, and 10. Random Forest also had better results at the test sizes 0.3, 0.6, and 0.7 with accuracies 80%, 75%, and 67.1%, respectively. Gaussian Naive Bayes and K-Nearest Neighbors Algorithm both had better results for 0.4 and 0.5 test sizes with 75% and 70% accuracies, respectively. And finally, for 0.8 test size Backpropagation Neural Network was the network to have the highest accuracy with 60% (Table 6).

Fig. 8
figure 8

Differential pulse voltammetry (DPV) Gaussian Naive Bayes confusion matrix at 0.2 test size

Fig. 9
figure 9

Differential pulse voltammetry (DPV) K-Nearest Neighbors Algorithm confusion matrix at 0.2 test size

Fig. 10
figure 10

Differential pulse voltammetry (DPV) Random Forest confusion matrix at 0.2 test size

Table 6 Accuracy Results for Differential pulse voltammetry (DPV) data

For comparison and precision verification, several methods were used. To develop upon the algorithm in data classification applied in N. A. Shama et al. [28]. We examined six different algorithm models for the data classifications; Backpropagation Neural Networks, Gaussian Naive Bayes, Logistic Regression, K-Nearest Neighbors Algorithm, K-Means clustering, and Random Forest. Comparable algorithms, type, chemical method, and percent accuracies were obtained, suggesting that both techniques have similar analytical efficiency, as shown in Table 7. However, the key value of the proposed voltammetric methods with machine learning at a much lower cost is strategic over the former. The concentrations of iron II are statistically in agreement with the two samples tested using both techniques. For the determination of iron II in potassium ferrocyanide, the method’s efficacy is also contrasted with other methods reported in the literature, taking into account factors such as time of study, sensitivity, linearity, and repeatability. Quickness, simplicity, and cost-effectiveness were the key benefits of this strategy over others. Study time was nearly short, including the type of algorithm for classification of the chemical data obtained in several studies as shown in Table 3. Moreover, accuracies can be shown obviously different in each article. Regardless of the chemical method used in each study, which indicates the methods that took a longer duration for the analysis. Sensitivity was comparable to that given by the detection of conductivity [33], LAD, CART, and SVM were used as algorithms to classify the chemical data and 81.6% accuracy was achieved. On the other hand, higher accuracy was obtained in the second study of about 94.95%. By using the GBA for classification, 76.6% was obtained, and by improving through using other electrochemical techniques in cooperation with other algorithms, the accuracy increased up to 100% in this study, the total data are shown in more details in Table 8. The repeatability of the method proposed was as good as the other methods offered.

Table 7 The total accuracy results in each electrochemical data
Table 8 Comparison of machine learning with other methods for chemical data classification

This research functioned the potentiostat, which has been previously illustrated by N.A. Shama et al. [28], the objective is to develop mechanical distinction. The essential side and expected following stage in getting this purpose are to improve a universal DPV and SQWV dataset classifier. At this point, we look to develop the electrochemical detection algorithms projected in the previous study [12] and improve a self-determining public library model which can be used to accurately identify on-site electrochemical analytes by using SQWV and DPV data set, looking forward getting more programmed classification method. Estimate the performance of recognized and innovative machine learning and deep learning algorithms; contains the benchmark model algorithm used for SQWV and DPV experiment classification in the former report. We practiced the period cycle’s structure of SQWV and DPV data, inwards where the potential is adapted during measurement at a cycle of systematic intermissions current. The electrocatalytic activity of potassium ferrocyanide solution was measured against iron II using a pencil graphite electrode (PGE). Because of the hybridized carbon of pencil graphite, which shows good adsorption, conductivity, high sensitivity, lower background current, ease of preparation, and properties of surface modification, pencil graphite electrodes have attracted widespread attention among different carbon-based electrodes. Besides, the whole reaction takes place on the PG working electrode surface where the ferrous ions Fe\(^{2+}\) are oxidized into Fe\(^{3+}\) according to equation (3) shown below. As an oxidant, the electrode works, and the current of oxidation rises to a peak. Both pulse voltammetric techniques, DPV and SQWV, were carried out using the same parameters based on start potential of 0V, end potential of 1V, the step height of 0.004 mV, and modulation amplitude of 0.025V. However, the SQWV technique also used a frequency of 2Hz. Each potassium ferrocyanide concentration of 2, 4, 6, 8, 10 mM was analyzed in both voltammetric measurements, respectively, starting from the lowest concentration to the highest one.

$$\begin{aligned} {\rm {Fe(CN)}}_{6}^{-4} \xrightarrow {}{\rm {Fe(CN)}}_{6}^{-3}+-e \end{aligned}$$

The electrochemical data obtained from the two techniques used have been identified and classified by the machine learning models suggested. Furthermore, for practical usefulness and for main scientific concerns, the BP algorithm model can be compared with notes of DPV and SQWV data. For instance, researchers can rapidly classify the overlapping peaks when existing in a voltammogram in a mixed other heavy metals analyte such as Hg, Cd, and Al. By the data collection of each voltammogram peak, peak height, peak area, peak potential, etc., the classified decision can be obtained for each different concentration prepared and used for heavy metal in a way as to provide the machine learning models used all these information to be able to classify with high accuracy [35]. Even though training of maximum machine learning models (such as the models in this study) may need many calculations, strongly and high-cost hardware, the mathematical calculations concentration of the model made by the area demands will be greatly decreased. For the usage in the field, we improved a systematic model application to request the singular analyte model. Further than feasible applied efficiency, classification of machine learning models can be contrasted with scientist’s comments of the electrochemical pulse voltammetry methods. For instance, the oxidation peak matching to 0.25 of the heavy metal sample was easily detected and readable by the researcher to Fe\(^{+2}\). Nevertheless, some oxidation peaks 0.25 V vs. Ag/AgCl were not simply identified by the models scan, especially for the DPV method, whereas the SQWV peaks were more readable and identified due to the more additional information details given from this electrochemical technique to the algorithm model. Nevertheless, by using a trained model, the scheme achieves readings of DPV and SQWV scans on inexpensive processors and computers, which can be attached with our cheap convenient potentiostat voltage. In addition, the most presenting models have been applied in the intranet applications and hence can be easily applied in current electrochemical experiments.

5 Conclusion

This study shows the successful use of different machine learning models to classify the electrochemical data obtained from DPV and SQWV data. The applications of both pulse electrochemical techniques DVP and SQWV for the oxidation of iron II in potassium ferrocyanide model solution using carbon graphite electrodes were successfully studied using phosphate buffer, pH 7.0 solution. The diversity of those voltammetric techniques makes it favorable in overcoming many drawbacks of other analytical methods. The voltammograms obtained were affected by the concentrations of potassium ferrocyanide solutions, as shown in the occurrence of iron II, which oxidized to ferric ions. By involving machine learning in the study, six different dataset scenarios were classified with the popular machine learning algorithms Back Propagation Neural Networks, K-Nearest Neighbors, Logistic Regression, Gaussian Naive Bayes, K-Means clustering, and Random Forest. Used datasets used are all SQWV, forward scan of SQWV, backward scan of SQWV, net of SQWV, and DPV. With our test size iterating system all of the used algorithms were iterated on every dataset individually and their highest results were examined. When we looked at the results in general, we saw that mostly 0.2 test size was the most effective ratio which means the dataset was split as 80% for training and 20% for testing. Results for K-Means were significantly lower than the other algorithms with a maximum of 56% with the dataset obtained by DPV. For other algorithms, the results were over 70% and higher results were obtained from the backward scan of SQWV data and all SQWV data. Random Forest algorithm outperformed all other algorithms and had 100% accuracy both on the backward scan of SQWV data and all SQWV data.

The amount of data will be increased so that deep learning models with convolutional neural networks will be generalized and tested, and cross-validation tests as well as an ablation study will be performed as future work.