Towards Phytoplankton Parasite Detection Using Autoencoders

Phytoplankton parasites are largely understudied microbial components with a potentially significant ecological impact on phytoplankton bloom dynamics. To better understand their impact, we need improved detection methods to integrate phytoplankton parasite interactions in monitoring aquatic ecosystems. Automated imaging devices usually produce large amounts of phytoplankton image data, while the occurrence of anomalous phytoplankton data is rare. Thus, we propose an unsupervised anomaly detection system based on the similarity of the original and autoencoder-reconstructed samples. With this approach, we were able to reach an overall F1 score of 0.75 in nine phytoplankton species, which could be further improved by species-specific fine-tuning. The proposed unsupervised approach was further compared with the supervised Faster R-CNN based object detector. With this supervised approach and the model trained on plankton species and anomalies, we were able to reach the highest F1 score of 0.86. However, the unsupervised approach is expected to be more universal as it can also detect unknown anomalies and it does not require any annotated anomalous data, which may not always be available in sufficient quantities. Although other studies have dealt with plankton anomaly detection in terms of non-plankton particles or air bubble detection, our paper is, to the best of our knowledge, the first one that focuses on automated anomaly detection considering putative phytoplankton parasites or infections.


Introduction
Phytoplankton are key players in aquatic systems, mediating biogeochemical cycles and forming the base of the food webs [1]. Phytoplankton population dynamics result from the interplay between resource availability and mortality losses [2]. While some loss mechanisms, such as grazing, are well known, the contribution of other loss mechanisms such as parasitism remains poorly considered and largely understudied in many aquatic systems. Phytoplankton are susceptible to a wide variety of parasites such as viruses, bacteria, protists and fungi. They can cause mortality of certain phytoplankton species, thereby altering phytoplankton bloom dynamics and changing the cycling and flow of energy and matter in aquatic ecosystems [3], [4], [5].
Zoosporic or nanoflagellate parasites infecting phytoplankton encompass a highly diverse functional group of eukaryotic protist and fungal species [6]. They have in common the production of free-living motile stages as their infective propagules, which attach to a phytoplankton host cell and further develop either inside (endobiotic) or outside (epibiotic) the host cell, using the host resources for their growth and reproduction. Due to their inconspicuous nature they are difficult to identify, and what cannot be identified typically tends to get overlooked or discarded. Consequently, although the presence and potential importance of these phytoplankton parasites is increasingly recognized, quantitative data from nature are extremely scarce.
An additional challenge is capturing rapid infection dynamics on a relevant temporal (e.g., days) and spatial scale. Obtaining quantitative information on parasite infections by traditional methods is labour-intensive and time-consuming, limiting the spatial and/or temporal coverage of studies investigating phytoplankton-parasite interactions [7].
Recent technological advances in imaging instruments have made it possible to collect large volumes of plankton image data to study plankton populations, opening new research possibilities [8]. The high-frequency sampling enabled by imaging instruments can help to better understand phytoplankton dynamics and their potential interaction with parasites [7]. However, while methods for automatic recognition of phytoplankton classes have been widely developed, methods for automatic recognition of parasite infections are still underdeveloped for phytoplankton. This is likely associated with the difficulty of obtaining a sufficient amount of image data of plankton parasites, which requires screening a huge amount of raw image data and calls for automated solutions for the task.
The scarcity of plankton parasite images is the major challenge for developing deep learning-based computer vision methods for detecting them. While object detection methods such as Faster R-CNN [9] and YOLO [10] have been shown to achieve high accuracy on various detection tasks, including parasite detection (see, e.g., [11]), they struggle when the amount of training data is limited. Therefore, a more promising approach is to formulate parasite detection as an anomaly detection task. Here the idea is to train the model with images of healthy plankton and to detect images that deviate from the data the models were trained on. Due to the availability of large amounts of plankton image data without parasites for training and relatively small intra-class variation among healthy samples, the images that deviate notably from the training data can be expected to contain potential parasites.
In this work, phytoplankton parasite detection is considered. The problem is formulated as an anomaly detection problem and solved using an autoencoder. The proposed method consists of a vector-quantized variational autoencoder (VQVAE) [12] that encodes the input image into a compressed latent representation and uses it to reconstruct the original image. The rationale is that when the autoencoder is trained only on images of healthy phytoplankton, it fails to reconstruct the parasites, which allows them to be detected from the difference image (see Fig. 1). The proposed method further employs the HardNet [13] feature extractor and Local Outlier Factor [14] for classifying between healthy plankton and plankton with parasites.
In the experimental part of the work, an extensive set of different backbone convolutional neural networks (CNNs), autoencoder architectures, feature extractors, and classifiers are systematically evaluated on challenging phytoplankton image data to find the best combination and to demonstrate the performance of the proposed method. Furthermore, we compare the autoencoder-based anomaly detection method to a Faster R-CNN based object detector. The results show that the proposed method obtains comparable accuracy to the state-of-the-art Faster R-CNN object detector while requiring no images with parasites for training. This makes the autoencoder-based method a promising approach for wider utilization in plankton image analysis where collecting large training data of plankton with parasites is infeasible.

Related work
Anomaly detection is a data classification technique where a detector models the representation of samples within a specification (OK) and distinguishes all samples that differ from the specification as anomalous (NOK). This problem can be challenging because of potentially high diversity among the NOK samples, imbalance between the number of samples in OK and NOK, and irregularity of the NOK class. A comprehensive overview describing the anomaly detection problems, techniques, and categorization is presented in [15]. First introduced on image data in [16], autoencoder (AE) models are widely used in computer vision. The authors of [17] were the first to exploit the poorer generalisation ability of AE models on out-of-training data to detect anomalies in both synthetic and real-world telemetry data, using a fully connected AE trained only on data without anomalies. Based on the results, such an AE can be used to detect previously unseen anomalous samples. This concept was further enhanced and used also on image data, for example, in [18] and [19]. A comprehensive overview of the AE techniques can be found in [20].
Plankton anomaly detection has been studied in the context of open-set recognition, i.e., image classification in the presence of previously unseen classes (plankton species). In [21], the authors presented an unsupervised approach to classify a plankton sample and to detect potential significant differences (i.e., anomalies) with respect to the detected class. Image features are extracted using classical computer vision methods and they consist of geometrical, moment-based, and other traditional features.
In [22], a CNN trained on the OK samples and on artificial NOK samples, derived from the OK data by common data augmentation techniques such as blurring and noise addition, is used as the feature extractor. An anomaly score is then computed from those features and it is used together with the trained feature extractor to distinguish between the plankton samples and anomalies. Here air bubbles and non-plankton water particles are considered as anomalies.
In [23], the authors used a parallel network of custom statistical classifiers called TailDeTect (TDT) to discover previously unseen plankton species. Each of the TDT classifiers is trained on one particular species, and a sample is considered as unknown if none of the classifiers is able to detect it. Unknown samples are collected and validated by experts. The feature extraction and the concept are based on [21].
In [24], open-set plankton recognition was addressed using a similarity learning approach. Metric learning with angular margin loss was applied to obtain image embedding vectors that model the similarity between images. The anomalies (images from previously unseen classes) were detected by setting a threshold value for the similarity.
Faster R-CNN [9] is a popular deep learning algorithm that has been successfully applied to various domains and tasks, including object detection and anomaly detection. Anomaly detection using Faster R-CNN involves training the model on abnormal images to learn the features of abnormal instances. Then during classification, the model is used to detect abnormal samples that deviate from the expected one. For instance, in industrial manufacturing, abnormal behavior can include machine malfunctions, while in medical diagnosis, it can be unusual patterns in medical images.
For example, in [25], an improved Faster R-CNN was used to detect defects in steel plates. The algorithm was trained on a dataset of abnormal regions on steel plate images and was able to accurately detect anomalies such as cracks and holes in test images. Similarly, in [26], a subtle modification of Faster R-CNN was considered to detect anomalies in CT images of lungs.
Object detection methods have also been successfully used for parasite detection, for example, in [11]. In this case, a YOLOv5 object detector was used to detect a parasitic mite on the honey bee's body. An overview of other object detection techniques and commonly used datasets can be found, for example, in [27].
In plankton research, Faster R-CNN is widely adopted for segmentation and object detection. In [28], several object detection approaches including Faster R-CNN were utilized to evaluate a synthetically augmented dataset. Similar work was presented in [29], where a plankton dataset from a darkfield microscope was compiled and then tested with various object detection methods, including YOLOv3 [30], R-CNN [31], and SSD [32].

Proposed methods for phytoplankton anomaly detection
To detect phytoplankton samples with anomalies, we primarily study an unsupervised autoencoder-based approach, followed by different feature extractors and one-class classifiers. To compare the results of our proposed method with a state-of-the-art approach, we utilize supervised object detection based on the Faster R-CNN [9].

Autoencoder-based approach
The proposed method to detect anomalous plankton samples is built on top of the framework available in [33]. This implementation allows testing various combinations of AE cores, convolutional layers, feature extractors, and one-class classifiers. For the approach, we combined five AE cores, six convolutional encoders and decoders, six feature extractors, and four classifiers (720 combinations in total). The processing pipeline is shown in Fig. 2 and it is described in more detail in the sections below. In the approach, anomaly detection is based on the comparison between the original and autoencoder-reconstructed data, followed by feature extraction and one-class classification.
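The size of the evaluated search space can be checked with a short enumeration. The following is only an illustrative sketch: the component names are the labels used in this paper, while the function name `enumerate_pipelines` and the plain-string representation are assumptions made for the example and are not part of the framework in [33].

```python
from itertools import product

# Component labels as used in the text; the grid below is illustrative only.
AE_CORES = ["BAE1", "BAE2", "VAE1", "VAE2", "VQVAE1"]
CONV_PAIRS = ["ConvM1", "ConvM2", "ConvM3", "ConvM4", "ConvM5", "ConvM6"]
FEATURE_EXTRACTORS = ["ErrorMetrics", "SIFT",
                      "HardNet1", "HardNet2", "HardNet3", "HardNet4"]
CLASSIFIERS = ["RC", "OC-SVM", "IF", "LOF"]


def enumerate_pipelines():
    """Return every (core, conv pair, feature extractor, classifier) tuple."""
    return list(product(AE_CORES, CONV_PAIRS, FEATURE_EXTRACTORS, CLASSIFIERS))


pipelines = enumerate_pipelines()
print(len(pipelines))  # 5 * 6 * 6 * 4 = 720
```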

Autoencoder architectures and convolutional layers
As the first step of anomaly detection, we use AE models trained only on the OK data to reconstruct unknown input samples of both OK and NOK classes. Because of the non-optimal generalisation ability of the AE models and training only on the OK class of data, we hypothesize that data from the NOK class will be reconstructed worse than the data from the OK class, as described in [17].
To better understand the effect of the AE architecture's core and the complexity of the convolutional encoding and decoding layers, we decided to build our implementation so that the core of the model could be combined with the selected convolutional pairs of the encoders and the decoders. This allows us to analyze contributions of the selected architecture and the convolutional layers separately.
For the AE cores, we evaluated five different options. As the first ones, we used implementations of the basic convolutional AE [34] as the BAE1 core, the convolutional variational AE [35] as the VAE1 core and the vector-quantized AE [12] as the VQVAE1 core. Besides those cores, we tried to further reduce the features extracted by an encoder by inserting fully-connected layers into the basic convolutional AE as the BAE2 core [36] and into the variational AE as the VAE2 core. The described modifications are shown in Fig. 3. We expect that the basic convolutional AE is going to be surpassed by both the variational and the vector-quantized cores, because its non-probabilistic encoding space allows more of the anomaly residuum to be encoded. The quality of reconstructed images should be better in the case of the basic and vector-quantized cores than in the case of the variational ones, which typically produce blurry outputs. When training on different classes, the best results are expected from the vector-quantized core, which is supposed to create separable clusters for each class in the encoded space.
Besides the AE cores described above, we consider six pairs of convolutional encoding and decoding layer architectures, whose structure is described in the complementary Table A1 for encoders and Table A2 for decoders. Each convolutional layer or block described in those tables is complemented with a batch normalization layer. The activation function was set to Leaky ReLU for the ConvM1 architecture and to ReLU for the rest.
The tested convolutional layers range from the most complex ConvM2, suggested for anomaly detection in [19], and ConvM1 architectures, where we expect the ability to reconstruct fine features and details, to the simpler architectures ConvM5, ConvM4 and ConvM3. With the simpler architectures, we expect that the fine features and smaller image structures are going to be suppressed and that they might perform better on shape or structure anomalies. The last architecture, ConvM6, is asymmetrical as suggested in [37]: it uses the more complex encoder of the ConvM5 architecture and the simpler decoder of the ConvM4 architecture. With this architecture, we expect that the anomalies which would be propagated to the encoded space will be further suppressed by the decoder reconstruction.
In the optimal case, anomalous areas of the original image are removed during the image reconstruction as shown in Fig. 1. A difference image between the original sample and the reconstructed sample is computed and used in the feature extraction.
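The difference image can be illustrated with a minimal NumPy sketch. It assumes grayscale images of identical shape and uses the absolute per-pixel difference; whether a signed or absolute difference is used in the actual framework is not stated here, so this choice, as well as the function name `difference_image`, is an assumption.

```python
import numpy as np


def difference_image(original, reconstructed):
    """Absolute per-pixel difference between an image and its reconstruction.

    Assumption: both inputs are arrays of the same shape; the absolute
    difference is one plausible choice, not necessarily the framework's.
    """
    original = np.asarray(original, dtype=np.float32)
    reconstructed = np.asarray(reconstructed, dtype=np.float32)
    if original.shape != reconstructed.shape:
        raise ValueError("original and reconstruction must have the same shape")
    return np.abs(original - reconstructed)


# Toy example: a single bright "anomalous" pixel that the model fails to
# reconstruct remains visible in the difference image.
original = np.zeros((4, 4), dtype=np.float32)
original[1, 2] = 1.0
reconstructed = np.zeros((4, 4), dtype=np.float32)
diff = difference_image(original, reconstructed)
```

In this toy case the only non-zero entry of `diff` is the pixel that was not reconstructed, which is the signal the subsequent feature extractors operate on.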

Feature extraction
In the second step, the framework applies feature extractors to analyze the reconstructions. The features are based on the comparison between the original and reconstructed data (Error metrics, HardNet3 and HardNet4), or on the difference image (SIFT feature extraction, HardNet1 and HardNet2).
The first feature extraction approach (Error metrics) creates a low-dimensional feature vector for each image by computing selected error metrics between the original and reconstructed image. The L2 and SSIM metrics applied in [36] are complemented with the Average hash and Mean squared error metrics.
The second feature extraction method (SIFT feature extraction) uses scale and metrics properties of the image keypoints found by the SIFT method and it is a direct re-implementation of the method presented in [38]. This method uses difference images between the original and reconstructed data.
The last four feature extraction methods (HardNet1, HardNet2, HardNet3 and HardNet4) are all based on the batch similarity metric presented in [13]. HardNet1 is the simplest one, where each sample is described by the HardNet (HN) feature vector of the original image resized to 32x32, as required by the original HN implementation. Since such resizing might not be optimal for small anomalies, HardNet2 splits the image of the original size into blocks of 32x32 and computes the HN feature vector for each such block. The resulting feature vector consists of the norms over those vectors. HardNet3 splits the original and reconstructed images into 32x32 blocks as in the HardNet2 method, but the resulting feature vector is computed as the cosine similarity between the HN feature vectors of the corresponding blocks of the original and decoded images. HardNet4 uses the same technique, but the cosine similarity is supplemented by the logarithm, which is supposed to emphasise smaller differences of the HardNet3 feature vector.
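A simplified version of such an error-metric feature vector can be sketched with NumPy alone. The sketch covers the L2 norm, the mean squared error and a basic average-hash distance; SSIM is omitted because it would normally be taken from an image-processing library. The hash size of 8, the Hamming distance between hashes and all function names are assumptions made for illustration and do not reproduce the implementation in [36].

```python
import numpy as np


def average_hash(image, hash_size=8):
    """Boolean average hash: block-mean downsampling, thresholded at the mean."""
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    # Crop so that both sides divide evenly into hash_size blocks.
    h_crop, w_crop = h - h % hash_size, w - w % hash_size
    blocks = image[:h_crop, :w_crop].reshape(
        hash_size, h_crop // hash_size, hash_size, w_crop // hash_size)
    small = blocks.mean(axis=(1, 3))
    return (small > small.mean()).ravel()


def error_metric_features(original, reconstructed):
    """Return [L2 norm, mean squared error, average-hash Hamming distance]."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    residual = original - reconstructed
    l2 = float(np.sqrt(np.sum(residual ** 2)))
    mse = float(np.mean(residual ** 2))
    hash_distance = float(np.count_nonzero(
        average_hash(original) != average_hash(reconstructed)))
    return np.array([l2, mse, hash_distance])
```

A perfectly reconstructed image yields an all-zero feature vector, so larger values indicate a poorer reconstruction.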
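The block-wise comparison used by HardNet3 can be sketched as follows. Since the pretrained HardNet descriptor is not reproduced here, the sketch uses a clearly labelled placeholder descriptor (zero-mean flattened pixels) in its place; only the splitting into 32x32 blocks and the cosine similarity between corresponding blocks follow the description above. All function names are hypothetical, and the logarithmic variant (HardNet4) is not shown.

```python
import numpy as np

BLOCK = 32  # block size required by the HardNet descriptor


def split_into_blocks(image, block=BLOCK):
    """Split a 2D image into non-overlapping block x block tiles (row-major)."""
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image sides must be multiples of the block size")
    tiles = image.reshape(h // block, block, w // block, block).swapaxes(1, 2)
    return tiles.reshape(-1, block, block)


def placeholder_descriptor(tile):
    """Stand-in for the HardNet descriptor (NOT HardNet): zero-mean pixels."""
    return tile.ravel() - tile.mean()


def blockwise_cosine_similarity(original, reconstructed,
                                descriptor=placeholder_descriptor, eps=1e-12):
    """One cosine similarity per pair of corresponding blocks."""
    sims = []
    for a, b in zip(split_into_blocks(original),
                    split_into_blocks(reconstructed)):
        da, db = descriptor(a), descriptor(b)
        denom = np.linalg.norm(da) * np.linalg.norm(db) + eps
        sims.append(float(da @ db / denom))
    return np.array(sims)
```

Blocks that are reconstructed faithfully give a similarity close to 1, whereas a block containing an unreconstructed anomaly gives a clearly lower value. In a real pipeline, `descriptor` would be replaced by the HardNet network.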
A 2D visualisation of the resulting feature space obtained by the ConvM5-BAE2 autoencoder over the Aphanizomenon plankton species using the HardNet2 feature extractor is shown in Fig. 4. The OK samples form an elliptical cluster and most of the NOK samples are separated from that cluster.

One-class classification
For the classification part, we used the following one-class classifiers:
• Robust covariance (RC) [39]: The RC classifier assumes the same distribution for all OK samples and fits an elliptic envelope to the central data points. The anomaly score is computed using the distribution estimates and the Mahalanobis distance.
• One-class support vector machine (OC-SVM)
• Isolation forest (IF)
• Local Outlier Factor (LOF) [14]
The fraction of anomalous samples for the OC-SVM, IF and LOF classifiers was set to 1%, the minimum value of common implementations, since we should assume that even some OK samples might differ from the majority. All classifiers are fit on a dataset containing only OK samples.
Input features for the one-class classification are normalized using robust scaling, which normalizes the median and the interquartile range as suggested in [42]. This normalization should be more robust to outliers than simple normalization approaches such as min-max normalization or standardization.
In order to select the optimal decision threshold for anomaly detection, we use the Equal error rate (EER) over the ROC curve of the classifier, as shown in Fig. 5. All classifiers are fit only on the OK data and the ROC curve was obtained from the test dataset.
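Robust scaling can be sketched in a few lines of NumPy. The sketch assumes that the scaling parameters are estimated on the OK training features only and then applied unchanged to the test features; the function names and the handling of a zero interquartile range are assumptions for illustration.

```python
import numpy as np


def fit_robust_scaler(train_features):
    """Estimate per-feature median and interquartile range (IQR)."""
    train_features = np.asarray(train_features, dtype=np.float64)
    median = np.median(train_features, axis=0)
    q1, q3 = np.percentile(train_features, [25, 75], axis=0)
    iqr = q3 - q1
    iqr[iqr == 0] = 1.0  # assumption: leave constant features unscaled
    return median, iqr


def robust_scale(features, median, iqr):
    """Centre by the median and divide by the IQR."""
    return (np.asarray(features, dtype=np.float64) - median) / iqr


# A single extreme value barely influences the estimated parameters.
train = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
median, iqr = fit_robust_scaler(train)
scaled = robust_scale(train, median, iqr)
```

In contrast to min-max normalization, the outlier in this toy example does not compress the remaining samples into a narrow range.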
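The selection of an EER-based threshold can be sketched in plain Python. The sketch assumes that each sample has a scalar anomaly score where higher means more anomalous, and that the candidate thresholds are the observed scores; both are assumptions for illustration, and the function name is hypothetical.

```python
def equal_error_threshold(scores, is_anomalous):
    """Return (threshold, approximate EER) where the two error rates are closest.

    Assumptions: higher score = more anomalous; a sample is flagged as NOK
    when its score is >= threshold; candidate thresholds are observed scores.
    """
    ok = [s for s, a in zip(scores, is_anomalous) if not a]
    nok = [s for s, a in zip(scores, is_anomalous) if a]
    if not ok or not nok:
        raise ValueError("both OK and NOK samples are required")
    best = None
    for t in sorted(set(scores)):
        ok_error = sum(s >= t for s in ok) / len(ok)    # OK flagged as NOK
        nok_error = sum(s < t for s in nok) / len(nok)  # NOK passed as OK
        gap = abs(ok_error - nok_error)
        if best is None or gap < best[0]:
            best = (gap, t, ok_error, nok_error)
    _, threshold, ok_error, nok_error = best
    return threshold, (ok_error + nok_error) / 2


scores = [0.1, 0.2, 0.3, 0.7, 0.4, 0.6, 0.8, 0.9]
labels = [False, False, False, False, True, True, True, True]
threshold, eer = equal_error_threshold(scores, labels)
```

In this toy example one OK sample and one NOK sample fall on the wrong side of the selected threshold, so both error rates are equal.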

Object detection based approach
The Faster R-CNN [9] algorithm is composed of three main components: a base feature extractor network, a region proposal network (RPN) for extracting the regions of interest, and a detector that uses the region proposals and respective feature maps to classify the detected objects as shown in Fig. 6. The first component is the feature extractor responsible for generating feature maps from the input image. Usually, this module is a CNN such as VGG-16 or ResNet-50.
The RPN is a fully convolutional network that takes the feature maps from the previous step and returns a set of region proposals that guide the detector on where to find the objects in the image. Then the proposals and corresponding feature maps from the CNN are utilized to yield candidate objects with bounding boxes and fixed-length feature vectors using the ROI Pooling layer. Finally, those outputs are passed to the R-CNN network. The R-CNN network uses the proposed feature maps to classify each bounding box as an object or background and to predict final class scores with the bounding boxes.
For our object detection experiments, we used the Faster R-CNN implementation available from [43] based on the ResNet-50 backbone presented in [44]. In order to employ an anomaly detection task in the Faster R-CNN baseline, the architecture is supplemented by a one-class classification module based on the predicted object labels as shown in Fig. 7.
Since anomalies such as parasites are relatively small compared to the image size, it is important to consider the anchor generator, which is a part of the region proposal network. Anchors define regions of an image, usually of different aspect ratios and sizes, that are used as references to detect objects. The anchor generator creates a set of anchors for each location in a feature map; then, for each region of interest, the model predicts which anchor box best encloses the object. The choice of an anchor generator mostly depends on the type of detection task. For example, if we want to detect small objects, then a smaller anchor size should be used; on the other hand, if the task is to detect objects of various sizes, then a range of anchor sizes should be defined [9]. Additionally, the aspect ratios of the anchors should match the aspect ratios of the objects in the image.
As suggested in [11], three separate object detectors are considered, each trained on a different ground truth: 1) plankton and anomalies, 2) plankton (clean) and anomalous plankton, and 3) anomalies only (see Fig. 8). In the first column, we can see that the model detects a plankton sample in both cases and an anomaly in the top row. The second column shows a detection of a plankton sample with an anomaly in the top row and a detection of a clean sample in the bottom row, and finally, the third column shows a detection of an anomaly in the top row only.
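One way such a label-based decision could work is sketched below. The sketch assumes that the detector returns a list of (label, confidence score) pairs per image and that an image is considered NOK if at least one detection with an anomaly-type label exceeds a score threshold; the label strings, the threshold of 0.5 and the function name are hypothetical and are not taken from the implementation in [43].

```python
def image_is_anomalous(detections, anomaly_labels, score_threshold=0.5):
    """Image-level OK/NOK decision from per-object detections.

    detections: iterable of (label, score) pairs (assumed detector output).
    anomaly_labels: set of labels that mark an anomaly (hypothetical names).
    Returns True (NOK) if any sufficiently confident anomaly is detected.
    """
    return any(label in anomaly_labels and score >= score_threshold
               for label, score in detections)


# Hypothetical detector outputs for two images.
clean_image = [("Plankton", 0.97)]
infected_image = [("Plankton", 0.95), ("Anomaly", 0.81)]

print(image_is_anomalous(clean_image, {"Anomaly"}))     # False
print(image_is_anomalous(infected_image, {"Anomaly"}))  # True
```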
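How a set of anchor sizes and aspect ratios translates into anchor box shapes can be sketched as follows. The sketch assumes the common area-preserving convention in which the aspect ratio is height divided by width; the example sizes and ratios are illustrative values only, and the function name is hypothetical.

```python
import math


def anchor_shapes(sizes, aspect_ratios):
    """Return (width, height) for every size / aspect-ratio combination.

    Assumption: ratio = height / width and each anchor keeps the area
    size * size, so height = size * sqrt(ratio), width = size / sqrt(ratio).
    """
    shapes = []
    for size in sizes:
        for ratio in aspect_ratios:
            root = math.sqrt(ratio)
            shapes.append((size / root, size * root))
    return shapes


# Illustrative values: small anchors suited to small objects.
for width, height in anchor_shapes(sizes=[4, 8, 16], aspect_ratios=[0.5, 1.0, 2.0]):
    print(f"{width:6.2f} x {height:6.2f}")
```

Smaller sizes produce anchors suited to small objects such as parasites, while the aspect ratios control how elongated the reference boxes are.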

Experiments
In this chapter, we describe the used datasets, the evaluation metrics, and the results of the autoencoder based experiments and the object detection based experiments.

Phytoplankton anomaly dataset
Natural Baltic Sea phytoplankton communities are continuously imaged with an Imaging FlowCytobot (IFCB) [45] deployed at Utö Atmospheric and Marine Research Station (59°46.84'N, 21°22.13'E). The IFCB is connected to the station flow-through system, which receives water pumped from a ∼5 m deep inlet located 250 m offshore, representative of the sub-surface layer. At Utö, the IFCB takes a 5 ml sample nearly every 20 minutes and is set to trigger based on the detection of chlorophyll a, targeting phytoplankton cells rather than non-living particles. The research station and IFCB deployment at Utö are described in detail in [46] and [47].
The phytoplankton data from the Utö IFCB can currently be classified in near real-time into 50 different classes as described by [48]. Putative parasite infection images were manually annotated by experts using another Utö dataset collected between February and August 2021, using phytoplankton data from nine classes. These classes were selected based on their importance during the spring or summer blooms in the Baltic Sea.
In our experiments, we use a phytoplankton anomaly dataset derived from the annotated images used to train the classifier described in the previous paragraph, with the OK samples from the dataset published in [48] and the NOK samples from the unpublished 2021 Utö data. It contains over 6200 manually annotated and expert-validated samples across nine plankton species with known anomalies, as shown in Table 1. Non-anomalous and anomalous samples of each class are shown in Fig. 9. As an annotation tool, we used the free version of Label Studio available at [49]. The annotated dataset is available online at [50] in both COCO and YOLO format.

Dataset annotations
When annotating the dataset, we needed to use three kinds of labels to achieve a separate species set, as described below:
• Label Anomaly marks the parasite or other kinds of anomalies on the plankton sample.
• Label PlanktonSpecies Anomaly marks the plankton species with the attached parasite.
• Label PlanktonSpecies Clean marks the plankton species with or without the parasite.
The last two labels could overlap, but where possible, the PlanktonSpecies Clean label does not cover the part of the sample with the parasite. When it is necessary to distinguish between OK and NOK samples, the PlanktonSpecies Clean label should be removed if it overlaps with the PlanktonSpecies Anomaly one.
An example of the annotation over a Dolichospermum plankton species sample is shown in Fig. 10. The red color marks a plankton anomaly; in this case, the darker green marks the clean sample and the lighter green marks the sample with an anomaly.
In order to help the AE model learn more robust features, we added salt-and-pepper noise to the image samples used during the training, with the clean sample used as the label, as suggested in [51]. Besides this noise augmentation, we also use random flipping, contrast, saturation, brightness, inversion, and hue augmentation.
Because the HardNet-based feature extractors work correctly only with image sizes that are multiples of 32, all samples were resized with respect to the major aspect ratio of each class (1:4 for five classes, 1:1 for three classes and 1:2 for one class), as can be seen in Fig. 9. For the experiment over all classes, we chose the aspect ratio of 1:2 as a compromise.
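The salt-and-pepper augmentation with a clean target can be sketched with NumPy. The sketch assumes images scaled to the range [0, 1] and a noise amount of 5%; the amount, the function name and the way the training pair is formed are assumptions for illustration and are not the settings used in the experiments.

```python
import numpy as np


def add_salt_and_pepper(image, amount=0.05, rng=None):
    """Set roughly `amount` of the pixels to 0 (pepper) or 1 (salt).

    Assumption: pixel values lie in [0, 1]; the input array is not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=np.float32)
    noisy = image.copy()
    mask = rng.random(image.shape)
    noisy[mask < amount / 2] = 0.0        # pepper
    noisy[mask > 1 - amount / 2] = 1.0    # salt
    return noisy


# Denoising setup: the noisy image is the model input,
# the clean image is the reconstruction target.
clean = np.full((64, 64), 0.5, dtype=np.float32)
noisy = add_salt_and_pepper(clean, amount=0.1, rng=np.random.default_rng(0))
training_pair = (noisy, clean)
```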

Derived dataset for object detection based experiment
For the object detection experiment, the model is trained in a supervised manner. The split ratios were set to 70%, 10% and 20% for the training, validation, and test subsets, respectively. The training and validation sets do not include clean samples, while the test set contains a balanced number of anomalous and clean images.
Furthermore, we applied the following augmentation techniques: horizontal and vertical flip with a probability of 30%, and random brightness, contrast, and saturation adjustment with a probability of 10%.

Performance metrics
To compare the results of the autoencoder and object detection experiments, we need to evaluate the predictions of the models with respect to the ground truth labels. To do so, we define true positive (TP) and true negative (TN) predictions, where the model correctly classifies OK and NOK samples, respectively, together with false positive (FP) and false negative (FN) predictions, where the model misclassifies NOK samples as OK in the FP case and OK samples as NOK in the FN case.
For the comparison between the different variations of autoencoders and object detection methods, the Precision, Recall and F1 score metrics are used. The metrics are defined as follows:

\[ \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \]

Similarly to precision and recall, we can also define specificity as follows:

\[ \mathrm{Specificity} = \frac{TN}{TN + FP} \]

In the autoencoder experiment, we complemented the metrics with the area under curve (AUC) score. This parameter is defined as the area under the Receiver Operating Characteristic (ROC) curve, for which an example is shown in Fig. 5. This curve is obtained by changing the decision threshold of a binary classifier by a defined step and by plotting the resulting specificity on the x-axis and recall on the y-axis for each threshold step. Each point of the ROC curve then corresponds to one threshold setting.
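The standard definitions of these metrics can be written as a small helper operating on the four confusion counts. The function name and the convention of returning 0 for an undefined ratio are assumptions made for the example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, specificity and F1 score from confusion counts.

    Assumption: a ratio with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1}


# Illustrative counts only (not results from the experiments).
metrics = classification_metrics(tp=80, tn=70, fp=20, fn=30)
```

For these illustrative counts, the precision is 0.8 and the F1 score equals 2·TP / (2·TP + FP + FN) = 160/210.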

Autoencoder-based approach
Due to the high number of combinations in our framework (five autoencoder types, six pairs of convolutional layers, six feature extractors and four classifiers), we decided to split the experimental results into two parts. The first part describes the optimal combination of model, feature extractor and classifier trained on all datasets together with its selection criteria, whereas the second part describes the optimal results achieved per plankton species. The optimal combination of the autoencoder model, convolutional layers, feature extractor and one-class classifier was determined based on the maximal achieved F1 score. The whole implementation is built using the TensorFlow 2 platform [52] and the Scikit-learn library [53].

Optimal model for anomaly detection
To select an optimal anomaly detection model, we analyzed the results of all model combinations over the experiment with all plankton species. The best results were achieved with the ConvM2-VQVAE1 autoencoder, HardNet1 as the feature extractor and the Local Outlier Factor one-class classifier. Those results are shown in Table 2. To further demonstrate that the selected anomaly detection model performs best, we evaluated the F1 score over 1) model combinations with a fixed feature extractor and classifier (see Table B3), 2) feature extractor combinations with a fixed model and classifier (see Table B6) and 3) classifier combinations with a fixed model and feature extractor (see Table B7). The highest F1 score was 0.75 consistently over all described combinations.

Optimal anomaly detection model per plankton species
Results of the optimal anomaly detection models per plankton species are shown in Table B8. The detection results are approximately 10% better than when using one anomaly detection model trained on all datasets, which could be particularly important for the Centrales and Chaetoceros species with the lowest values of the performance metrics. This nevertheless comes at the cost of a separately trained anomaly detection model for each plankton species.

Object detection based approach
Results of the Plankton vs Anomalies, Plankton vs Anomalous Plankton and Anomalies only experiments over all samples are shown in Table 3. The Plankton vs Anomalies experiment contains both large bounding boxes of plankton annotations and small bounding boxes of anomalies. Therefore, the anchor generator was set up as follows. The anchor sizes were 16, 32, 64, 128, 256 and 512. In the Plankton vs Anomalous Plankton experiment, only the large bounding boxes were used and the sizes were 64, 128, 256, 512, and 1024. For Anomalies only, the sizes were 4, 8, 16, 32, 64, and 128. The scales and aspect ratios of the anchors were the same for each experiment: 0.5, 1.0, 1.5, 2.0, 3.0.
The highest F1 score was achieved with the Plankton vs Anomalies experiment and the lowest one with the Anomalies only experiment. We were able to reach a high F1 score also in the Plankton vs Anomalous Plankton experiment, but the resulting plankton labels are often misleading in this case. For the object detection approach, one major issue is that the model is incapable of distinguishing between plankton parts and anomalies, as shown in Fig. 12. This major drawback of the Faster R-CNN object detection method originates from the architecture itself and could not be solved by anchor modifications or any other parameter tuning.
To have a better comparison with Table B8, we also performed the Anomalies only experiment trained on species-specific data, whose results are shown in Table 4. In this experiment, the approach performed worse than the species-specific autoencoder experiment and also worse than the universal model within the same experiment on average. We also provide supplementary material in Table C10, Table C9 and Table C11, which show the results of the Plankton vs Anomalies, Plankton vs Anomalous Plankton and Anomalies only experiments with respect to the individual plankton species.

Discussion
Learning to detect parasites from phytoplankton images is a challenging problem due to large variation in appearance of the plankton cells and parasites, small size of parasites combined with the limited spatial resolution of images, as well as, scarcity of training data caused by the relative rarity of the plankton cells with parasites.Even for an expert it is often impossible to confirm the parasitic nature of all the attached (non-host) structures from the images with certainty.For example, spherical structures that are attached to the host cell and have a different appearance to the phytoplankton cell are typically parasites, but can be also loosely attached free living (i.e., non-parasitic) cells or phytoplankton-cell derived organelles expelled from the cell due to stress.This makes it infeasible to collect annotated training and test data just on phytoplankton parasites.Therefore, we formulated the problem as anomaly detection, where the goal is to detect the phytoplankton cells that deviate from "healthy" cells.The method can be seen as an anomaly detector of putative parasite infections allowing to screen large volumes of plankton image data to obtain a subset of interesting images for further analysis.
For the study, a large dataset of phytoplankton cell images with and without anomalies was collected. Unlike in most anomaly detection studies, the collected dataset contains a relatively large number of images with putative parasites (NOK samples). This made it possible to also train supervised object detectors for the task, and to compare unsupervised anomaly detection methods with object-detector-based methods. We evaluated two approaches to detect whether an image contains anomalies or not: 1) an autoencoder-based anomaly detection approach and 2) a Faster R-CNN-based object detector for anomalies.
For the autoencoder-based approach, a full pipeline consisting of a CNN-based autoencoder architecture, feature extraction from the reconstruction and difference image, and a one-class classifier was proposed. Various methods for each part of the pipeline were considered and extensively evaluated. The best overall accuracy (F1 score) was obtained using the combination of the vector-quantized variational autoencoder (VQVAE) [12] architecture with the CNN backbone by [19], the HardNet feature extractor [13] and the Local Outlier Factor classifier [14]. The F1 score for the best combination varied between species from 0.6 (Centrales) to 0.83 (Peridiniella Single), with an overall F1 score of 0.75 over all species. The ablation study (Tables B4-B7) demonstrated the superiority of the proposed combination over the alternatives. The accuracy can be further improved by optimizing the method for each phytoplankton species separately, as can be seen from Table B8 with F1 scores varying from 0.73 to 0.94. While fine-tuning class-specific models reduces the generalizability of the method, these are promising results for studying parasites on individual plankton species.
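Since all comparisons in this paper are reported as F1 scores, a brief reminder of how the score follows from image-level counts may be helpful. The sketch below uses the standard definition (harmonic mean of precision and recall). Treating NOK (anomalous) images as the positive class is an assumption, and the counts in the example are purely illustrative, not values from the experiments.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall) = 2TP / (2TP + FP + FN).

    tp: anomalous (NOK) images correctly flagged
    fp: healthy (OK) images wrongly flagged as anomalous
    fn: anomalous (NOK) images that were missed
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the formula.
    print(f1_score(tp=75, fp=25, fn=25))  # 150 / 200 = 0.75
```

Because true negatives do not enter the formula, the score is not inflated by a large number of correctly classified healthy cells.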
The limited amount of training data is a notable challenge for the supervised object detectors. To properly learn the large variation in the appearance of anomalies, the training stage would require a sufficient number of example images for each class. The difficulty of the detection task is further emphasized by the fact that the anomalies are typically attached to the host cell and are often very small compared to the plankton cell. These challenges are apparent in the Faster R-CNN results (Table 3), where the accuracies are not as high as those commonly seen in object detection problems. Three configurations of the R-CNN-based method were evaluated: 1) detection of anomalies only, 2) detection of both anomalies and plankton cells, and 3) detection of healthy plankton cells and plankton cells with anomalies as separate classes. Based on the results, it is evident that learning what normal plankton cells look like is beneficial for the R-CNN. The model trained only on anomalies often detects parts of the phytoplankton itself as anomalies. It was further noticed that, when trained on healthy plankton cells and plankton cells with anomalies as separate classes, the detector often fails to correctly detect the bounding boxes and classify them in the presence of anomalies, but still produces correct image-level classification results (OK vs. NOK). This raises questions about the generalizability of the method.
The Faster R-CNN-based method (model trained on both anomalies and plankton cells) achieved higher accuracy (F1 score: 0.86) than the autoencoder-based method (F1 score: 0.75). This is understandable, as the autoencoder-based method did not have access to images with anomalies during the training stage. The Faster R-CNN-based method is more suitable when enough annotated training data is available. This, however, is not typically the case in plankton anomaly and parasite detection for the reasons discussed above. The autoencoder-based approach has some notable advantages over the supervised object detectors: 1) no training data with anomalies is needed, 2) no annotated bounding boxes are needed, and 3) the method also works with previously unseen anomalies. These advantages, together with the comparable accuracy to the R-CNN-based method, make the proposed autoencoder method a more promising approach for plankton anomaly detection on new datasets.
Being able to screen large volumes of plankton image data for anomalies has the potential to noticeably reduce manual work and allows more extensive research on parasitic infections. Ecologically speaking, separating cells with anomalies is interesting and can lead to new research avenues in the future. Anomaly detections can give a valuable first hint of putative parasite infections (or physiologically stressed phytoplankton). However, further method development is needed to make it possible to distinguish between the different types of anomalies and relate them with more certainty to parasites. An interesting future direction would be to apply clustering methods to the anomalies to identify different anomaly types. The classification of anomaly types could be validated by a wider community effort with expertise on different parasite groups, epibionts and symbionts. In combination with classifying different types of anomalies, different datasets could also be collected from known parasites, epibionts and host-derived anomalies from culture systems. Detected parasites and their presence could be further qualitatively confirmed in parallel by additional methods such as microscopy or eDNA-based approaches.

Conclusion
This paper presents an unsupervised anomaly detection approach to detect anomalies in nine phytoplankton classes. Although there exist studies on plankton anomaly detection in the context of open-set recognition, i.e. detecting previously unseen plankton classes and non-plankton particles, our paper is, to the best of our knowledge, the first one focusing on the detection of small anomalies such as potential phytoplankton parasites or infections in a known set of plankton classes.
We propose an anomaly detection pipeline consisting of a vector-quantized variational autoencoder (VQVAE) in combination with the HardNet feature extractor and the Local Outlier Factor classifier. With this pipeline, we achieved an average F1 score of 0.75 for all nine analyzed phytoplankton species. We also suggest that the achieved anomaly detection results could be further improved by optimizing the components of the pipeline for each species separately.
The results achieved with this approach were compared to a supervised Faster R-CNN object detector evaluated in three configurations: 1) detection of anomalies only, 2) detection of anomalies and plankton cells, and 3) detection of plankton cells with and without anomalies. The best results were achieved with the second configuration, with an average F1 score of 0.86. Although this score is higher than the one achieved by the suggested autoencoder, our approach is more universal because it can also detect previously unseen anomalies and it needs no anomalous samples for its training. These benefits make the proposed autoencoder approach more promising for plankton research where annotated anomaly datasets are not available or not feasible to collect. We have made our code and the dataset used publicly available as a part of this paper.

Fig. 1 Original (a), encoded space (b), reconstruction (c) and the difference image (d) of an anomalous sample of the Centrales plankton species.

Fig. 4 Example feature space of the Aphanizomenon plankton species.
• One-class SVM (OC-SVM) [40]: The OC-SVM classifier utilizes the support vector machine (SVM) and a non-linear kernel to create a separating hyperplane of the training data from the origin of the feature space. Samples on the other side of this hyperplane are considered as anomalies.
• Isolation Forest (IF) [41]: The IF classifier uses random feature selection and splitting for isolating observed samples. The anomaly score is based on the total number of splits. Anomalies are supposed to have a smaller number of splits as it should be easier to separate them.
• Local Outlier Factor (LOF) [14]: The LOF classifier is based on the local density deviation of the observed point with respect to its k-nearest neighbours. Density of the anomalies should be lower in comparison with the OK samples, which are considered to create denser clusters.
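As an illustration of the density-based idea behind the last classifier in the list above, the following sketch computes Local Outlier Factor scores of query points with respect to a set of training (OK) points. It is a simplified pure-Python version that uses exactly k neighbours and Euclidean distance and assumes there are no heavily duplicated training points; it is not the implementation used in the experiments, and the function names and toy points are hypothetical.

```python
import math


def _knn(point, data, k, skip=None):
    """Return the k nearest (distance, index) pairs of `point` within `data`."""
    dists = [(math.dist(point, other), j)
             for j, other in enumerate(data) if j != skip]
    return sorted(dists)[:k]


def lof_scores(train, queries, k=3):
    """Local Outlier Factor of each query with respect to the training points.

    Scores near 1 mean a density similar to the neighbours; scores well
    above 1 mean the query lies in a sparser region (a likely anomaly).
    """
    # Neighbourhoods and k-distances of the training points among themselves.
    train_nn = [_knn(p, train, k, skip=i) for i, p in enumerate(train)]
    k_dist = [nn[-1][0] for nn in train_nn]

    def lrd(neighbours):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], d) for d, j in neighbours]
        return len(reach) / sum(reach)

    train_lrd = [lrd(nn) for nn in train_nn]

    scores = []
    for q in queries:
        nn = _knn(q, train, k)
        mean_neighbour_lrd = sum(train_lrd[j] for _, j in nn) / len(nn)
        scores.append(mean_neighbour_lrd / lrd(nn))
    return scores


if __name__ == "__main__":
    # Toy 2-D "feature vectors" of OK samples forming one dense cluster.
    ok_features = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    inside, far_away = lof_scores(ok_features, [(0.5, 0.4), (5, 5)])
    print(f"inside cluster: {inside:.2f}, far away: {far_away:.2f}")
```

In this toy example, the query inside the cluster receives a score close to 1, while the distant query receives a much larger score and would be flagged as an anomaly once a threshold is chosen.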

Fig. 5 Illustration of equal-error-rate (EER) threshold selection criterion on the ROC curve.
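The EER criterion named in this caption can be sketched as follows: sweep candidate thresholds over the anomaly scores and keep the one where the false-positive rate and the false-negative rate are closest to each other. The sketch assumes that a higher score means a more anomalous sample and that scores of labelled OK and NOK validation samples are available; the function name and the example scores are hypothetical.

```python
def eer_threshold(ok_scores, nok_scores):
    """Pick the threshold where false-positive and false-negative rates are closest.

    Assumption: higher score = more anomalous, and a sample is flagged
    as NOK when its score is >= the threshold.
    """
    best = None
    for t in sorted(set(ok_scores) | set(nok_scores)):
        # OK samples wrongly flagged as anomalous.
        fpr = sum(s >= t for s in ok_scores) / len(ok_scores)
        # NOK samples that slip below the threshold.
        fnr = sum(s < t for s in nok_scores) / len(nok_scores)
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, t, fpr, fnr)
    _, threshold, fpr, fnr = best
    return threshold, fpr, fnr


if __name__ == "__main__":
    # Hypothetical anomaly scores for illustration only.
    ok = [0.1, 0.2, 0.3, 0.4, 0.6]
    nok = [0.35, 0.5, 0.7, 0.8, 0.9]
    threshold, fpr, fnr = eer_threshold(ok, nok)
    print(threshold, fpr, fnr)  # 0.5 0.2 0.2
```

On the ROC curve, this corresponds to the point where the curve crosses the diagonal running from (0, 1) to (1, 0), since there the false-positive rate equals one minus the true-positive rate.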

Fig. 8 Object detection tasks using the Faster R-CNN approach: (a) Plankton vs Anomalies; (b) Plankton vs Anomalous Plankton; (c) Anomalies only. The NOK samples are shown in the top row and the OK samples in the bottom row.

Fig. 9 Anomalous (left column, or upper row) and non-anomalous samples (right column, or lower row) from all classes of the dataset.

Fig. 10 Example of the annotation bounding boxes.

Fig. 12 Plankton species without any anomalies recognized as an anomaly by Faster R-CNN. The model is confused by a chain of cells and cannot distinguish between plankton parts and anomalies.

Table 1
Species-specific statistics of the Plankton anomaly dataset.

Table 2
Species-specific results with the optimal combination over all plankton species based on the highest F1 score (ConvM2-VQVAE1, HardNet1, Local Outlier Factor).

Table 3
Faster R-CNN detection results.

Table 4
Species-specific dataset Faster R-CNN detection results.

Table A1
Overview of the tested convolutional encoder models

Table A2
Overview of the tested convolutional decoder models

Table B4
Results over all plankton species for fixed convolutional layer architecture (ConvM2), feature extractor (HardNet1) and classifier (Local Outlier Factor).

Table C11
Faster R-CNN detection results for Anomalies experiment.