Introduction

Inland water bodies harbor many unicellular and multicellular organisms whose composition varies with changing climatic conditions (Olano et al. 2020). Phytoplankton, or single-celled algae, form the basis of an aquatic ecosystem and directly or indirectly alter the living conditions for all biotic and abiotic components of their surroundings (Abreu et al. 2020). Under favorable conditions, dense aggregations of freshwater algae such as diatoms, green algae, and blue-green algae lead to the formation of “algal blooms” (Young et al. 2020). Large quantities of this proliferating algal growth can produce toxins and lower oxygen levels, creating serious problems for shore-based businesses (e.g., hotels, cafés) and the fishing industry (Klemas 2012; Anderson et al. 2015). Besides being unsightly, these harmful algal blooms (HABs) can harm human health and aquatic ecosystems: swallowing or swimming in contaminated water, eating poisoned fish or shellfish, or inhaling airborne droplets of contaminated water can all expose people to HAB toxins. While harmful algal blooms are not a new phenomenon (Spanish explorers observed blooms along Florida's coast in the 1500s), freshwater harmful algal blooms have risen markedly in past decades and are now a global environmental problem (Hallegraeff et al. 2021). These abrupt outbreaks need to be characterized and contained in their early phase for sustainability (Zhou et al. 2020). Algal tracking and classification are central steps in algal bloom management (Le Bourg et al. 2014). Conventional identification methods (such as FlowCam microscopes and cytometers) are tedious and labor-intensive; improved strategies for reliable algal identification and classification are therefore urgently needed (Barteneva and Vorobjev 2016; Dashkova et al. 2017).
To address this issue, deep learning, an artificial intelligence (AI)-based technique, offers a promising approach to rapid algal detection (Franco et al. 2019).

Recently, various studies have applied AI-based methods, using a range of neural networks, to characterize these bloom-forming algae, but their accuracy and reliability remain uncertain. Promdaen et al. (2014) presented an automated recognition system that uses texture and shape features to classify 12 algal genera by sequential minimal optimization (SMO), and reported a classification accuracy of 97.22%. Li et al. (2017) presented a promising and efficient approach, based on a Mueller matrix imaging system and a deep neural network, for classifying algae with similar morphology, shape, and external features. Only a few studies, however, have used convolutional neural networks (CNNs) to classify algal images in the context of algal blooms.

Medina et al. (2018) applied a CNN to detect algae in submerged pipelines where algae and sand had been deposited on the surface. Using a pre-trained deep residual convolutional neural network, Deglint et al. (2019) proposed a system for classifying six algal genera and achieved an accuracy of 96%, while Ruiz-Santaquiteria et al. (2020) attained an average sensitivity of 95%, with 57% precision and 60% specificity, on a diatom dataset comprising 126 images. The present research aims to identify the best convolutional neural network model among four frequently used models (MobileNet V-2, VGG-16, AlexNet, and ResNeXt-50) to accelerate the identification and classification (with high accuracy) of 15 phycotoxin-producing marine algal genera: Amphidinium, Chatonella, Cochlodinium, Gymnodinium, Karenia, Lyngbya, Ostreopsis, Prymnesium, Pseudo-nitzschia, Tolypothrix, Gambierdiscus, Coolia, Protoceratium, Karlodinium, and Dinophysis.

Material and methods

CNNs have achieved strong performance on image classification. The current research applied four widely used deep CNNs (MobileNet version 2, VGG16, AlexNet, and ResNeXt 50) and examined their capacity when applied to a dataset of algal images. A comparative analysis of model performance is given for 15 bloom-forming algal genera. Usually, a CNN consists of multiple convolutional blocks followed by a fully connected layer. A convolutional layer convolves the output of the preceding layer with a set of filters (kernels) to extract the features that are important for classification.
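
As a minimal sketch of the convolution operation described above (not the authors' implementation), the NumPy snippet below slides a single 3 × 3 kernel over a small grayscale image with stride 1 and no padding. The function name `conv2d_valid`, the toy image, and the vertical-edge kernel are illustrative assumptions.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Cross-correlate a 2-D image with a 2-D kernel (stride 1, no padding).

    Deep learning libraries call this operation "convolution" even though
    the kernel is not flipped; the same convention is used here.
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1), dtype=float)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Element-wise product of the kernel with the current window, summed.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


# Hypothetical 5x5 image: dark left part (0), bright right part (1).
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)
# Hypothetical kernel that responds where intensity rises from left to right.
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)
print(feature_map)
```

The feature map is smaller than the input (3 × 3 here) and is largest where the kernel overlaps the dark-to-bright edge, which is the sense in which a filter "extracts" a feature.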

Studied models

The research focuses on advanced CNNs (MobileNet version 2, VGG16, AlexNet, and ResNeXt 50), each of which has numerous hidden layers between the input and output layers, as shown in Fig. 1.

Fig. 1 Architecture for classification of bloom-forming algal images. A AlexNet, B VGG16, C MobileNetV2, and D Modified ResNeXt-50

MobileNetV2

MobileNet is a streamlined architecture that uses depthwise separable convolutions to make deep convolutional neural networks lightweight, offering an efficient model for mobile and embedded vision applications. A depthwise separable convolution consists of two core layers: a depthwise convolution and a pointwise convolution (Sae-Lim et al. 2019).
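
To illustrate why this factorization makes a network lightweight, the sketch below counts the weights of a standard convolution and of its depthwise separable counterpart. The layer sizes (3 × 3 kernel, 32 input channels, 64 output channels) and the function names are hypothetical and are not taken from the models used in this study; bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Hypothetical layer sizes chosen only for illustration.
k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(sep / std, 3))
```

For these sizes the separable layer needs 2,336 weights instead of 18,432, a ratio of 1/c_out + 1/k², i.e. roughly one eighth of the parameters.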

VGG 16

VGG 16 is a deep CNN built by researchers at the University of Oxford's Visual Geometry Group and Google DeepMind who studied the relation between the depth and the performance of a CNN (Simonyan and Zisserman 2015; Zhang et al. 2016). The VGG-16 architecture resembles a regular ANN: it comprises an input layer, a sequence of convolutional layers, and an output layer.

AlexNet

AlexNet is one of the most powerful CNN architectures commonly used to solve image classification problems (Krizhevsky et al. 2017). In the pooling layer, the feature maps are reduced along their spatial dimensions; this mechanism is known as downsampling (decimation). For each image, the fully connected (FC) layer calculates the class scores and provides the prediction: a probability score is calculated for each class, and the class with the highest probability is chosen as the predicted class.
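
The two steps described above can be sketched as follows, assuming non-overlapping 2 × 2 max pooling (the actual window size and stride of each pooling layer are not stated here) and a hypothetical vector of class probabilities. The function name `max_pool_2x2` and all numbers are illustrative.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """Downsample a 2-D feature map by taking the maximum over 2x2 blocks."""
    h, w = feature_map.shape
    assert h % 2 == 0 and w % 2 == 0, "this sketch assumes even height and width"
    # Split rows and columns into pairs, then reduce over each 2x2 block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))


# Hypothetical 4x4 feature map.
fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 1],
                 [0, 0, 5, 6],
                 [1, 2, 7, 8]], dtype=float)
pooled = max_pool_2x2(fmap)
print(pooled)

# Hypothetical class probabilities for three classes; the predicted class
# is the index with the highest probability.
probs = np.array([0.1, 0.7, 0.2])
predicted = int(np.argmax(probs))
print(predicted)
```

Pooling halves each spatial dimension (4 × 4 to 2 × 2 here) while keeping the strongest response in each block.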

Modified ResNeXt

ResNet (short for Residual Network) is a classic neural network used as a foundation for many computer vision tasks. ResNeXt, which extends ResNet, was originally developed by Xie et al. (2017); in the present research, the model was modified following Pant et al. (2020), who applied it to Pediastrum classification, and Yadav et al. (2020), who applied it to the classification of multiple algae (Xie et al. 2017; Pant et al. 2020; Yadav et al. 2020).

Activation functions

The choice of activation function was the first design consideration. Activation functions are used in neural networks to introduce nonlinearity. Typical choices include the logistic function, tanh, and arctan. However, these functions tend to suffer from the vanishing gradient problem in deep models, since their gradient is appreciably large only when the input lies within a small range around 0.

To resolve this problem, the rectified linear unit (ReLU) activation function was used. ReLU can be defined as

$${\text{ReLU}}\left(z\right)=\begin{cases}z, & \text{if } z > 0\\ 0, & \text{if } z \le 0\end{cases}$$
(1)

The ReLU layer applies an element-wise activation function. By applying f(k) = max(0, k), this layer introduces nonlinearity into the system and replaces all negative activations with 0. Another important activation function used here is softmax, a generalization of the sigmoid function to multiple classes. It takes a vector of real numbers as input and converts it into a probability distribution over the classes, so that the input can be assigned to the most probable label. The formula is as follows:

$$ \Phi (\vec{X})_{i} = \frac{e^{X_{i}}}{\sum\nolimits_{k = 1}^{J} e^{X_{k}}} $$
(2)

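where Φ is the softmax function, \(\vec{X}\) is the input vector, \(e^{X_{i}}\) is the standard exponential function applied to the i-th element of the input vector, J is the number of classes in the multi-class classifier, and \(e^{X_{k}}\) is the standard exponential function applied to the k-th element in the normalizing sum.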
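
Equations 1 and 2 can be checked numerically with a short NumPy sketch. The score vector is hypothetical, and subtracting the maximum before exponentiating is a common numerical-stability step that leaves the result of Eq. 2 unchanged; it is an addition made here, not something stated in the text.

```python
import numpy as np


def relu(z):
    """Eq. 1: keep positive values, replace negative values with 0."""
    return np.maximum(0.0, z)


def softmax(x):
    """Eq. 2: exponentiate each score and normalize so the outputs sum to 1."""
    shifted = x - np.max(x)  # stability shift; cancels out in the ratio
    exps = np.exp(shifted)
    return exps / np.sum(exps)


# Hypothetical raw class scores for a three-class problem.
scores = np.array([2.0, -1.0, 0.5])
activated = relu(scores)
probs = softmax(scores)
print(activated)
print(probs, probs.sum())
```

The softmax output is a probability distribution: every entry lies between 0 and 1, the entries sum to 1, and the largest score receives the largest probability.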

Optimizers

Root-mean-square propagation (RMSprop) and adaptive moment estimation (Adam) were used as optimizers for the models tested in this research: RMSprop for MobileNet version 2 and VGG16, and Adam for AlexNet and ResNeXt. RMSprop keeps a running average of the squared gradient for each weight and divides the gradient by the square root of this mean square. In Eq. 3, π denotes the parameter, β the learning rate, δ the decay term, ε a small constant added for numerical stability, and g_t the gradient at time t.

$${\pi }_{t+1}= {\pi }_{t}-\frac{\beta }{\sqrt{\left(1-\delta \right){g}_{t-1}^{2}+\delta {g}_{t}+\varepsilon }} \cdot {g}_{t}$$
(3)

Adam (adaptive moment estimation) is one of the most commonly used optimizers in deep learning (DL).

$${\omega }_{t+1}= {\omega }_{t}-\frac{\theta }{\sqrt{\widehat{{v}_{t}}+\mu }} \widehat{{m}_{t}}$$
(4)

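In the equation, ω represents the parameters, θ is the learning rate, \(\widehat{m}_{t}\) and \(\widehat{v}_{t}\) are the bias-corrected estimates of the first moment (mean) and second moment (uncentered variance) of the gradient, and μ is a small constant added for numerical stability. The update rule for Adam can be summarized informally as follows: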

$$ {\text{Weights}} = {\text{Weights}}{-}\left( {{\text{Momentum}}\;{\text{and}}\;{\text{Variance}}\;{\text{combined}}} \right) $$
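
The sketch below writes one update step of each optimizer in its widely used textbook form, which may differ in notation and in the placement of the stability constant from Eqs. 3 and 4. The function names, the default hyperparameters, and the toy objective f(w) = w² are assumptions made for illustration and are not the training configuration of this study.

```python
import numpy as np


def rmsprop_step(param, grad, mean_sq, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSprop update: scale the gradient by a running RMS of past gradients."""
    mean_sq = decay * mean_sq + (1.0 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(mean_sq) + eps)
    return param, mean_sq


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1.0 - beta1) * grad        # first moment (momentum)
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1.0 - beta1 ** t)              # bias correction, t starts at 1
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem (hypothetical): minimize f(w) = w^2, whose gradient is 2w.
w_rms, s = 1.0, 0.0
w_adam, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    w_rms, s = rmsprop_step(w_rms, 2.0 * w_rms, s, lr=0.01)
    w_adam, m, v = adam_step(w_adam, 2.0 * w_adam, m, v, t, lr=0.01)
print(w_rms, w_adam)
```

Both optimizers move the parameter toward the minimum at 0; because each divides by a gradient-magnitude estimate, the step size depends mainly on the learning rate rather than on the raw gradient scale.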

Results

Deep learning-based algorithms for microscopic image analysis of a wide range of microorganisms, including viruses, bacteria, fungi, microscopic algae, and parasites, have been developed to address the challenges of human-operated microscopy (Grimes et al. 2014). These algorithms use pixel patterns as the primary feature for image analysis and can therefore be readily applied to biological image analysis (Reguant et al. 2021).

Dataset

The present research targets 15 genera of bloom-forming algae, with a dataset of 450 algal images as input data. These images were gathered from various open-access web repositories (CRIS database, galerie.sinicearasy.cz) and from earlier phycological studies, as described in Yadav et al. (2020).

Data augmentation

Data augmentation increased the number of images to 90,000, following Pant et al. (2020) and Yadav et al. (2020). The augmentation was performed so that each class received an equal number of images. These class-balanced data were then split into two groups for training and testing of the models: 80% (72,000) and 20% (18,000) of the images, respectively.
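
The arithmetic of this step, together with one simple family of geometric augmentations, can be sketched as follows. The specific transforms used in the study are not listed here, so the flips and 90° rotations in `geometric_variants` are an assumed example only; the class count, image total, and 80/20 split are taken from the text.

```python
import numpy as np


def geometric_variants(image):
    """Return 8 flip/rotation variants of a square image (illustrative only)."""
    variants = []
    for k in range(4):
        rotated = np.rot90(image, k)          # rotate by k * 90 degrees
        variants.append(rotated)
        variants.append(np.fliplr(rotated))   # mirrored copy of each rotation
    return variants


n_classes = 15
n_original = 450
n_augmented = 90_000

per_class = n_augmented // n_classes              # images per class after balancing
factor = n_augmented // n_original                # average images per original image
n_train = n_augmented * 80 // 100                 # 80% for training
n_test = n_augmented - n_train                    # remaining 20% for testing
print(per_class, factor, n_train, n_test)

# Hypothetical 4x4 "image" used only to show that the variants differ.
toy = np.arange(16).reshape(4, 4)
variants = geometric_variants(toy)
print(len(variants))
```

With equal class sizes, each genus contributes 6,000 images (4,800 for training and 1,200 for testing), and the augmented set is on average 200 times larger than the original collection.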

Training and validation

The morphological traits of these algal genera were used to classify them. Input images were resized to 200 × 200 pixels, and a batch size of 32 and an initial learning rate of 3e-3 were set before training and testing. Class labels were assigned to both the training and the testing images. The Adam optimizer was used for compiling the proposed model, and categorical cross-entropy was used to calculate the training and testing losses. The accuracies and losses for training and validation are shown for the respective models (MobileNetV-2, VGG16, AlexNet, and modified ResNeXt) in Fig. 2. To assess the efficiency of the models, accuracy (acc), precision (pre), recall (re), and F1-score were calculated from the confusion matrix as follows:

$$ {\text{Accuracy}}\;{\text{(ACC)}} = \frac{{{\text{True}}\;{\text{positive}}\;{\text{(TP)}} + {\text{True}}\;{\text{negative}}\;{\text{(TN)}}}}{{{\text{True}}\;{\text{positive}}\;{\text{(TP)}} + {\text{True}}\;{\text{negative}}\;{\text{(TN)}} + {\text{False}}\;{\text{positive}}\;{\text{(FP)}} + {\text{False}}\;{\text{negative}}\;{\text{(FN)}}}} $$
(5)
$$ {\text{Precision}} = \frac{{{\text{True}}\;{\text{positive}}}}{{{\text{True}}\;{\text{positive}} + {\text{False}}\;{\text{positive}}}} $$
(6)
$$ {\text{Recall}} = \frac{{{\text{True}}\;{\text{positive}}}}{{{\text{True}}\;{\text{positive}} + {\text{False}}\;{\text{negative}}}} $$
(7)
$$ {\text{F}}1-{\text{score}} = 2 * \frac{{{\text{Pre}} * {\text{Re}}}}{{{\text{Pre}} + {\text{Re}}}} $$
(8)
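
Equations 5 to 8 translate directly into code. The sketch below evaluates them for a single class treated one-vs-rest; the counts are hypothetical and do not come from the experiments reported here, and returning 0 when a denominator is zero is a convention chosen for this sketch.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from the four outcome counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts for one class (one-vs-rest).
acc, pre, rec, f1 = classification_metrics(tp=90, tn=880, fp=10, fn=20)
print(round(acc, 3), round(pre, 3), round(rec, 3), round(f1, 3))
```

In this example accuracy is high (0.97) even though recall is noticeably lower (about 0.82), which is why precision, recall, and F1-score are reported alongside accuracy.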

Fig. 2 Graphical representation for training and validation loss for studied CNN models. MobileNetV-2, VGG16, AlexNet, and modified ResNeXt: validation accuracy (A, C, E, and G) and validation loss (B, D, F, and H)

The error matrix

The outputs of the proposed algorithms were assessed by plotting confusion matrices. The predicted class instances are represented by the rows of the matrix, whereas the actual class instances are represented by the columns. The many off-diagonal entries in the confusion matrix for MobileNetV2 show that the transfer learning-based MobileNetV2 fails to discriminate between algae with similar morphologies (Fig. 3A). Recognition errors decreased markedly as accuracy rose from 40% for MobileNetV2 to 96% for VGG16 and 98% for AlexNet (Fig. 3B and C). In the confusion matrix of the modified ResNeXt-50 (99% accuracy), only 17 images of Pseudo-nitzschia were incorrectly recognized as Lyngbya (Fig. 3D); this could be owing to shape similarities (length, unbranched filaments) between Pseudo-nitzschia and Lyngbya.
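
A confusion matrix with this orientation (rows for predicted classes, columns for actual classes) can be built in a few lines. The labels and predictions below are hypothetical toy data chosen to mimic a Pseudo-nitzschia image being read as Lyngbya; they are not results from this study.

```python
import numpy as np


def confusion_matrix(actual, predicted, n_classes):
    """Rows index the predicted class, columns index the actual class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[p, a] += 1
    return cm


classes = ["Lyngbya", "Pseudo-nitzschia", "Karenia"]
# Hypothetical ground-truth and predicted class indices for eight images.
actual = [0, 0, 1, 1, 1, 2, 2, 1]
predicted = [0, 0, 1, 0, 1, 2, 2, 1]

cm = confusion_matrix(actual, predicted, len(classes))
accuracy = np.trace(cm) / cm.sum()   # correct predictions lie on the diagonal

# Locate the largest off-diagonal entry, i.e. the most frequent confusion.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
pred_idx, act_idx = np.unravel_index(np.argmax(off_diag), off_diag.shape)
print(cm)
print(accuracy, classes[act_idx], "read as", classes[pred_idx])
```

Reading the off-diagonal entries in this way shows which pairs of genera a model confuses, in addition to the overall accuracy given by the diagonal.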

Fig. 3 Confusion matrices for CNN models: A MobileNetV-2, B VGG-16, C AlexNet, and D Modified ResNeXt

F1-score and receiver operating characteristic (ROC) curve

In the proposed work, model performance is evaluated using F1-score, precision, and recall. Precision-recall can be a useful indicator of prediction quality when classes are very unbalanced. Precision measures the fraction of predicted positives that are true positives, while recall measures the fraction of truly relevant results that are returned. The F1-score, also known as the “F-measure”, reflects the balance between precision and recall and is calculated as 2*(precision*recall)/(precision + recall). According to the results, precision, recall, and F1-score increase across the CNN models from MobileNetV-2 to VGG16, AlexNet, and modified ResNeXt, in that order (Fig. 4). For ResNeXt, precision rose to 0.99, and the best possible F1-score of 1 (i.e., perfect precision and recall) was achieved for all classes except Lyngbya and Pseudo-nitzschia, which may again result from the morphological resemblance between these two genera.

Fig. 4 The ROC curves of the proposed models and class-wise ROC curve area for studied CNN models. MobileNetV-2 (A and B), VGG16 (C and D), AlexNet (E and F), and modified ResNeXt (G and H)

Discussion

Toxins produced by HABs can be harmful to fish and other aquatic creatures. These toxins move up the food chain after being ingested by small fish and shellfish, affecting larger animals such as sea lions, turtles, dolphins, birds, and manatees. The actual health risks that these toxins pose to the general public, pets, livestock, and wildlife through water resources used for recreation and drinking are not yet known. Nevertheless, owing to various natural and anthropogenic factors, global trends in the prevalence, toxicity, and risk of harmful algal blooms are commonly assumed to be rising. Rapid classification of HAB-forming algae is urgently needed because the health effects of HAB toxins can range from minor to severe and, in some cases, lethal, depending on the level of exposure and the type of algal toxin involved.

Morphological studies using electron microscopes have revealed differences in traits such as the flagellar apparatus, cell division mechanism, and organelle structure and function, all of which are significant in algal classification. Standard microbiological techniques focused on isolation and identification, as well as molecular techniques, are needed to characterize the microalgal community. Li et al. (2017) used convolutional neural networks (CNNs) with a Mueller matrix imaging system to classify morphologically similar algae and achieved 97% accuracy. With the advancement of artificial intelligence, a deep CNN applied to microscopic images of algae could substantially aid in assessing water quality and become a major solution for image classification (Wang et al. 2020). The performance of automated models is strongly affected by the similar morphological appearance of various bloom-forming algae (Zhang et al. 2021). Model accuracy is compromised when algae share morphological features, and detailed analysis is required to resolve such misclassification.

In this research, the performance of four deep learning models (MobileNet V-2, VGG-16, AlexNet, and ResNeXt-50) with different geometric augmentations was compared. An effort was made to identify a technique that can speed up identification without compromising accuracy for 15 phycotoxin-producing algal genera: Amphidinium, Chatonella, Cochlodinium, Gymnodinium, Karenia, Lyngbya, Ostreopsis, Prymnesium, Pseudo-nitzschia, Tolypothrix, Gambierdiscus, Coolia, Protoceratium, Karlodinium, and Dinophysis. For this purpose, a dataset of 450 algal images was taken as input data; these images were collected from various open-access web repositories (CRIS database, galerie.sinicearasy.cz) and other publicly available datasets, as suggested by Yadav et al. (2020). Augmentation increased the number of images to 90,000 (Han et al. 2021). These images were then divided into two groups, 80% (72,000) for training and 20% (18,000) for validation of the deep learning models (Elgendi et al. 2021).

It is feasible to develop convolutional neural networks from scratch and train them successively on different datasets to achieve optimum efficiency (von Chamier et al. 2021). Because this approach takes a substantial amount of time and effort, transfer learning based on MobileNet version-2 was applied first in this research. However, the confusion matrix reveals that MobileNet version-2 is unable to distinguish between algae with similar morphology and appearance. A confusion matrix summarizes the performance of a classification algorithm and makes it simple to see which classes a model predicts correctly and where it makes errors. Our findings support the use of confusion matrix analysis in validation because it is robust to the data distribution and type of relationship, provides a rigorous evaluation of validity, and yields additional information on the type and sources of errors.

In the confusion matrix for MobileNet version-2, many errors were observed in differentiating algae such as Amphidinium, Coolia, Ostreopsis, Protoceratium, and Gymnodinium (Fig. 3A). These confusions may arise from morphological similarities between Amphidinium and Ostreopsis, which resulted in the misclassification of 576 images. Amphidinium cells are dorso-ventrally flattened, athecate dinoflagellates with a minute epicone, whereas Ostreopsis cells are oval to teardrop-shaped, with thecal plates bearing scattered pores with an internal sieve-like structure. The Coolia and Ostreopsis classifications, in turn, showed 613 errors. The spherical, anteroposteriorly compressed cell shape of Coolia, with its rounded hypotheca, could be the primary reason for this similarity. MobileNet version-2 showed similar confusions for other algae as well, and its results, with a low F1-score and an accuracy of 40.68%, were not satisfactory. These substandard results with transfer learning led this research to apply other CNNs (VGG-16, AlexNet, and modified ResNeXt50), for which precision, recall, and F1-score increased in that order (Table 1). Although most of the errors reported for MobileNet version-2 were resolved with VGG-16, its confusion matrix (Fig. 3B) shows that, aside from some minor discrepancies, 115 images of Chatonella were incorrectly classified as Ostreopsis (owing to their similar external morphology). These errors lowered VGG-16's accuracy to 96% and its F1-score to 0.96 (Table 1). The confusion matrices for AlexNet and ResNeXt50 (Fig. 3C and D) show misclassification only for Tolypothrix and Pseudo-nitzschia, respectively, which may be due to their morphological affinity to other algae.

Table 1 The F1-score, precision, and recall values for the studied models

AlexNet and ResNeXt50 reached 98 and 99% accuracy, with high F1-scores of 0.97 and 1, respectively (Table 1). These results are similar to those obtained by Pant et al. (2020) and Yadav et al. (2020), who reported classification accuracies of 98.45 and 99.97%, respectively, for the same modified ResNeXt50 deep learning model. Among these well-established models, the modified ResNeXt50 (with 99% accuracy) surpassed existing algal classification models, paving the way for future research such as the development of artificial intelligence-based detectors and other identification tools.

Deep learning models have significant implications in algal classification due to their ability to accurately classify different species of algae based on their morphological and genetic features, aid in their identification, and aid in conservation efforts (Li et al. 2017). These learning models are capable of analyzing large datasets of images faster than humans, automating the classification process, and allowing researchers to process more data in less time (Wang et al. 2020). Furthermore, deep learning models are scalable, easily adjustable to the size of the dataset and the complexity of the classification task.

However, there are limitations to the use of deep learning in algal classification. These include dataset bias, reliance on data quality, difficulties in interpretability, and the risk of overfitting to training data (Elgendi et al. 2021). These limitations should be taken into consideration when interpreting the results of deep learning models in algal classification (Zhang et al. 2021). Future research should focus on developing more transparent and interpretable models that can handle biases and low-quality data.

Conclusion

Applications of deep learning to image recognition and classification of bloom-forming algae, a major cause of water pollution, will be at the forefront of innovative products, technologies, and ideas that can improve our environment. For the classification of 15 bloom-forming algae, the MobileNet V-2, Visual Geometry Group-16 (VGG-16), AlexNet, and ResNeXt-50 models were tested with the goal of identifying the best-suited convolutional neural network (CNN) model for effective monitoring of bloom-forming algae. VGG-16 and AlexNet may appear to be good at first glance, with 96 and 98% accuracy, respectively, but another well-known algal-classifier model, ResNeXt-50, outperformed both with 99% classification accuracy. The results indicate that the ResNeXt-50 architecture applied to algal image collections can reliably distinguish both large and small particles and is resilient across a range of imaging conditions and datasets. The current study is a step in HAB science toward new ideas and approaches for rapid identification and, where possible, toward the unforeseen yet promising directions that research may take across the diverse field of environmental studies.