1 Introduction

Intestinal parasitic diseases are among the most widespread infectious diseases, affecting millions of people globally; they are particularly prevalent in underdeveloped regions where individuals live in unsanitary conditions. The World Health Organization reported that around 1.5 billion people were afflicted with soil-transmitted helminth infections in 2020 [1]. Human intestinal parasites, which cause conditions such as diarrhea, malnutrition, and anemia that particularly impact children and impede their growth, can be categorized into three groups: helminths, protozoa, and ectoparasites [2]. These infections also affect physical and mental development, job performance, and education, potentially influencing the quality of the future population and the country’s long-term growth [3]. The physical similarity between parasite species and the presence of impurities in samples make it difficult to manually distinguish different types of parasite eggs under a microscope [4, 5]. As a result, significant training is required to develop skilled experts who can perform diagnoses. This manual evaluation is both labor-intensive and time-consuming, taking an experienced technician an average of 30 min to analyze a single sample [6]. The development of an automated faecal examination for diagnosing parasitic diseases is therefore essential to overcome the limitations of traditional diagnostic methods. Further, although most infected people exhibit no or only mild symptoms, parasitic infections acquired during pregnancy may result in severe nerve damage and, in some cases, infant mortality [7]. Leishmaniasis is a neglected tropical disease transmitted by female phlebotomine sandflies that affects over 700,000 people annually [8]. Moreover, trichomonad parasites found in the intestines and oral cavity cause the human disease trichomoniasis [9].

Machine learning methods have been used in several studies to analyze microscopic images containing parasite eggs/cysts; Support Vector Machines (SVM) [10, 11] and Artificial Neural Networks (ANN) [12, 13] are examples of such systems. Prior attempts at automating the detection and estimation of intestinal parasites [14, 15] relied on complex pipelines combining image processing and machine learning classification. These methods generally depend on extracting handcrafted features from a set of measurements, especially intensity, dimension, and surface texture, so considerable work is necessary during the feature extraction stage to fine-tune the features. Despite these efforts, none of these methods have achieved widespread acceptance due to generalizability issues as well as difficulties in replication, comparison, and extension. Over the last decade, deep learning-based algorithms have improved as a result of advances in computing performance and the availability of image datasets [16]. Deep learning has been shown to be extremely effective at solving a wide range of problems across disciplines, including text recognition, computer-aided diagnosis, face identification, and drug development [17, 18].

We adopted the Faster-RCNN detector [19] as the foundation for our research, as it has exhibited favourable precision and speed on images compared to other deep models. Medical image analysis, however, presents distinct obstacles. Supervised deep learning requires large training datasets, which can be difficult to obtain for medical images due to their high acquisition costs and the labour-intensive nature of manual annotation. To overcome these limitations, we propose expanding the baseline training dataset through data augmentation. While many data augmentation approaches rely on image transformations such as rotations and translations [20], we adopt a different, CycleGAN-based approach [21]: an unsupervised system capable of generating images from annotated source images of a different modality. Our findings show that combining CycleGAN and Faster-RCNN provides an efficient and effective method for augmenting datasets and recognizing intestinal parasites in microscopy images.

Our work encapsulates several major contributions, summarized below:

  • A fully automated proposal for dealing with low-quality intestinal parasite images captured with portable devices in clinical practice.

  • An oversampling strategy that does not require a paired dataset, effectively capturing domain variability and improving dataset representativeness.

  • A robust methodology for detecting parasites in data-scarce contexts, significantly improving on existing state-of-the-art methods.

  • Extensive experimentation validating our methodology, demonstrating its suitability, robustness, and uniqueness in augmenting intestinal parasite images with CycleGAN architectures and detecting them with Faster-RCNN.

The rest of the present manuscript is organized as follows: Sect. 2, “Related Work,” reviews prior approaches relevant to our work. Section 3, “Methodology,” provides details of the proposed strategy, the experimental setup, and the specific parameters of each experiment. Section 4, “Evaluation Metrics,” presents the metrics used to validate this work. Section 5, “Results and Discussion,” reports the findings and a detailed analysis obtained after validating the proposed method. Section 6 compares the framework against state-of-the-art object detection methods. Finally, Sect. 7, “Conclusions and Future Work,” succinctly summarizes our contributions and noteworthy aspects, emphasizing the importance of our findings validated through extensive experimentation.

2 Related work

2.1 Object detection

Various architectural designs that perform well in object detection tasks have inspired the development of deep convolutional neural networks, which modern methodologies use for detection, classification, and segmentation tasks in medical images. Here, we present an examination of current methodologies used in the field of microorganism analysis to detect parasite eggs/cysts in microscopy images. Waithe et al. [22] evaluated how well state-of-the-art neural network designs detect luminous cells in microscope images. von Chamier et al. [23] introduced ZeroCostDL4Mic, which allows researchers with no coding expertise to train and apply key deep learning networks to perform tasks including segmentation, object detection, and denoising. Kumar et al. [18] proposed an efficient and effective framework for intestinal parasite egg detection using YOLOv5, which achieved a mean average precision of approximately 97% for detection. Deep learning-based detection methods are broadly classified into two approaches: two-stage and one-stage methods. In the former, models are trained separately for two distinct tasks: proposing regions of interest, and classifying and localizing objects. The Region-Based Convolutional Neural Network (R-CNN) family is among the best in this area [13, 24]. These approaches make use of modules for feature extraction, classification, and regression, with region proposal handled by a distinct convolutional network in [4]. In the field of medical image analysis, regression forests were historically among the most effective statistical detection methods [2]; as observed in [25, 26], these methodologies have been deployed in a cascaded fashion, going from a global to a local context. One reported AI platform enables non-programmers to use AI for microscope image processing; in that evaluation, a ResNeXt-50 (32×4d) model outperformed the others with 96.83% accuracy and an F1-score of 96.82%, while MobileNet-V2 struck a balance between 95.72% accuracy and computational cost. Deep learning methods, meanwhile, are fast gaining popularity in this domain: Faster RCNN has been used to recognize objects in parasite images [27], while Fast RCNN has been used to detect parasite eggs in medical images [28]. Our proposed framework applies deep learning in two steps. The first step enhances images before they enter the object detection model, using a Cycle Generative Adversarial Network (CycleGAN) trained to convert low-resolution images into high-resolution ones. Object detection is then performed with a Faster-RCNN model using ResNet50 as its backbone.

2.2 Data augmentation

Research groups have explored the use of CycleGAN, an unsupervised technique for synthesizing unpaired images from one domain to another [29]. CycleGAN has been a frequently used method for creating synthetic image datasets. Its primary strength is its ability to handle unpaired data, which is extremely useful in our situation, since acquiring images of multiple modalities for the same subject under identical conditions is usually not possible. CycleGAN has been used in prior studies, including [30], which used it to produce chest X-ray images for pneumonia detection, and [31], which used it to generate lung MRI images from CT images for lung tumour segmentation. CycleGAN produces target-modality images from labeled source images, and the source labels are then translated to the target domain. Additionally, several proposals have explored synthetic image creation for over-sampling the original sample collection; these techniques, as demonstrated by Bouteldja et al. [32, 33] and Motamed et al. [34], make use of distinct GAN frameworks in similar contexts.

3 Methodology

The proposed approach is divided into two stages, data augmentation and object detection, as shown in Fig. 1. The first stage performs synthetic image synthesis with the CycleGAN algorithm; Sect. 3.2 provides more information about this stage. The second stage detects intestinal parasites in microscopy images with a customized Faster-RCNN algorithm; Sect. 3.3 explores the workings of this module. A high-level sketch of the pipeline is given below.
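The following is a minimal, illustrative sketch of the two-stage pipeline, not the authors' released code; `nn.Identity()` stands in for the trained CycleGAN generator, and the untrained detector is instantiated only to show the data flow.

```python
import torch
import torch.nn as nn
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Stage 1: image enhancement. nn.Identity() is a stand-in for the trained
# CycleGAN generator of Sect. 3.2 (the real generator maps a low-quality
# image to an enhanced image of the same shape).
enhancer = nn.Identity()

# Stage 2: detection. Faster-RCNN with a ResNet50 backbone (Sect. 3.3);
# 6 classes = 5 parasite egg types + background.
detector = fasterrcnn_resnet50_fpn(weights=None, num_classes=6)
detector.eval()

image = torch.rand(3, 416, 416)          # one 416x416 RGB image (Sect. 3.1)
with torch.no_grad():
    enhanced = enhancer(image)
    predictions = detector([enhanced])   # list of dicts: boxes, labels, scores
print(predictions[0]["boxes"].shape)
```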

Fig. 1

Experimental setup of the proposed framework

3.1 Datasets

To evaluate the proposed framework, we obtained the intestinal parasite image dataset from Chulalongkorn University in Thailand. The total number of parasite images and their dimensions are shown in Table 1. The collection contains images obtained with different devices under different environmental conditions. The dataset includes 2,509 images in 5 classes: 500 images of Ascaris lumbricoides (AL), 500 of Hookworm (HW), 509 of Fasciolopsis buski (FB), 500 of Taenia spp. (TS), and 500 of Hymenolepis nana (HN). These images, shown in Fig. 2, display distinct characteristics, with some showing clear definition and others showing blurriness or variations in lighting conditions. Furthermore, the resolution, color saturation, and contrast differ depending on the microscope used, and the amount of debris in the background also varies significantly among the images. We therefore propose a framework that transforms the input data so that the architecture suffers no reduction in performance or model generalization. To train the CycleGAN model, images from [35] were utilized. We standardized the image size to 416 × 416 for compatibility with the Faster-RCNN algorithm, and divided the dataset into training, validation, and testing sets with proportions of 70%, 20%, and 10%, respectively. A short sketch of this preparation is given below.
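A minimal sketch of the resizing and 70/20/10 split described above; the random seed and the integer placeholders for (image, annotation) pairs are assumptions made only for illustration.

```python
import random
from torchvision import transforms

# All images are standardized to 416 x 416 before entering the detector.
to_model_input = transforms.Compose([
    transforms.Resize((416, 416)),
    transforms.ToTensor(),
])

# 70/20/10 split over the 2,509 dataset entries.
random.seed(42)                    # assumed seed, for reproducibility
samples = list(range(2509))        # placeholders for (image, annotation) pairs
random.shuffle(samples)

n = len(samples)
train = samples[: int(0.7 * n)]
val = samples[int(0.7 * n): int(0.9 * n)]
test = samples[int(0.9 * n):]
print(len(train), len(val), len(test))   # roughly 70% / 20% / 10%
```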

Table 1 The dataset includes different species of parasite eggs in varying sizes and resolutions
Fig. 2

View of parasitic cyst/eggs under microscope

3.2 Model architectures and training details

3.2.1 Augmentation methods

In this experiment, we implemented the deep neural network-based CycleGAN algorithm to generate synthetic intestinal parasite images. The cyclic nature of this algorithm lies in its reverse transformation, i.e., the architecture is capable of converting generated images back into the original images. CycleGAN architectures are widely employed in medical image analysis for image-to-image generation due to their robustness, flexibility, and encouraging results on related problems. The CycleGAN model has two generators, each paired with a discriminator. The key concept in CycleGAN is the cycle consistency loss used to optimize the framework. It operates as follows: the output of the first generator serves as the input to the second generator, and the resulting image should match the first generator's original input. Symmetrically, the output of the second generator can be fed to the first generator, and the result should match the second generator's original input, as shown in Fig. 3.

Fig. 3

A representation of the CycleGAN architecture, which was modified for this work's research studies to detect intestinal parasite

CycleGAN operates on unpaired image sets: it is given a set of images in domain X and another set of images in domain Y. The goal is to learn a mapping G: X → Y such that the distribution of translated images G(X) closely approaches the distribution of images in domain Y, making generated images indistinguishable from the original dataset. Like regular Generative Adversarial Networks, it uses an adversarial loss for the mapping function; Eq. (1) describes this loss and its related discriminator, Dy.

$${L}_{GAN}\left(G,{D}_{y},X,Y\right)= {E}_{y\sim Pdata(y)}\left[log{D}_{y}\left(y\right)\right]+{E}_{x\sim Pdata(x)}\left[{\text{log}}(1-{D}_{y}\left(G(x)\right))\right]$$
(1)

In this context, the generator G aims to produce images similar to those in domain Y, while the discriminator Dy aims to separate images generated by G from genuine images y as effectively as possible. G attempts to minimize this objective when its parameters are updated, whereas Dy attempts to maximize it when the discriminator parameters are updated. However, if the network's capacity is high enough, it may map the same collection of input images to any arbitrary arrangement of images in the destination domain; the adversarial loss alone does not guarantee that each input x is meaningfully matched to its output. Many different mappings G can induce the same distribution over y, rendering the loss insufficient on its own. To overcome this problem, CycleGAN combines the original and inverse mappings and employs a cycle consistency loss to enforce a meaningful link in both directions.

This study’s CycleGAN model incorporates two mapping functions, G: X → Y and F: Y → X, together with the adversarial discriminators Dy and Dx. CycleGAN introduces two cycle consistency losses to further regularize the mapping process: the forward cycle loss ensures that when an image travels from one domain to the other and back, it recovers its initial state, as represented by x → G(x) → F(G(x)) ≈ x. Similarly, the backward cycle loss ensures that y → F(y) → G(F(y)) ≈ y. The overall CycleGAN loss is composed of several components, including the discriminator loss for X → Y, as indicated in Eq. 2.

$${L}_{GAN}\left(G,{D}_{y},X,Y\right)= {E}_{y\sim Pdata(y)}\left[log{D}_{y}\left(y\right)\right]+{E}_{x\sim Pdata(x)}\left[{\text{log}}(1-{D}_{y}\left(G(x)\right))\right]$$
(2)

The discriminator loss for Y → X is indicated in Eq. 3

$${L}_{GAN}\left(F,{D}_{X},Y,X\right)= {E}_{x\sim Pdata(x)}\left[log{D}_{X}\left(x\right)\right]+{E}_{y\sim Pdata(y)}\left[{\text{log}}(1-{D}_{X}(F\left(y\right)))\right]$$
(3)

The cycle consistency loss for the two generators is indicated in Eq. 4

$${L}_{cyc}\left(G,F\right)= {E}_{x\sim Pdata(x)}\left[||F(G\left(x\right))-x{||}_{1}\right]+{E}_{y\sim Pdata(y)}\left[||G(F\left(y\right))-y{||}_{1}\right]$$
(4)

The final CycleGAN loss is given by Eq. 5

$$L\left(G,F,{D}_{X},{D}_{Y}\right)={L}_{GAN}\left(G,{D}_{Y},X,Y\right)+{L}_{GAN}\left(F,{D}_{X},Y,X\right)+\lambda {L}_{cyc}\left(G,F\right)$$
(5)

The goal is to solve:

$${G}^{*},{F}^{*}={{\text{arg}}\,{\text{min}}}_{G,F}\,{{\text{max}}}_{{D}_{X},{D}_{Y}}L\left(G,F,{D}_{X},{D}_{Y}\right)$$
(6)

Regarding the training configuration, the following parameters were applied for the CycleGAN setup. CycleGAN was trained for 200 epochs, with a fixed learning rate of 0.0002 for the first 100 epochs and linear decay to zero over the remaining epochs. The training process utilizes the Adam optimizer (Kingma & Ba, 2014) with decay rates β1 = 0.5 and β2 = 0.999. Loss weights are set as follows: λA = 10.0, λB = 10.0, and λidt = 0.5. Table 2 outlines the hyper-parameter settings used to train the CycleGAN model. Hyper-parameter tuning is a crucial, iterative process that requires several rounds of experimentation; it is essential to balance exploring new configurations against refining promising ones. In this regard, we observed that CycleGAN converges quickly, which mitigates the risk of mode collapse, a common concern in GANs, and that the Adam optimizer requires less hyper-parameter tuning than SGD. A sketch of this training configuration is given below.
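A minimal sketch of the objective and optimizer settings reported above, assuming generators `G`, `F` and discriminators `D_X`, `D_Y` are defined elsewhere; it follows the log-likelihood form of Eqs. (2)–(4) and is not the authors' released code.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # adversarial terms of Eqs. (2) and (3)
l1 = nn.L1Loss()               # cycle-consistency terms of Eq. (4)
lam_A, lam_B, lam_idt = 10.0, 10.0, 0.5   # loss weights from Table 2

def generator_loss(G, F, D_X, D_Y, real_x, real_y):
    fake_y, fake_x = G(real_x), F(real_y)
    # Generators try to make the discriminators label fakes as real.
    adv = bce(D_Y(fake_y), torch.ones_like(D_Y(fake_y))) \
        + bce(D_X(fake_x), torch.ones_like(D_X(fake_x)))
    # Forward (x -> G(x) -> F(G(x)) ~ x) and backward cycles, Eq. (4).
    cyc = lam_A * l1(F(fake_y), real_x) + lam_B * l1(G(fake_x), real_y)
    # Identity regularizer weighted by lambda_idt.
    idt = lam_idt * (l1(G(real_y), real_y) + l1(F(real_x), real_x))
    return adv + cyc + idt

# Adam (beta1=0.5, beta2=0.999); lr fixed at 2e-4 for 100 epochs, then
# decayed linearly to zero over the remaining 100 epochs.
params = [torch.zeros(1, requires_grad=True)]   # stand-in for G/F parameters
opt = torch.optim.Adam(params, lr=2e-4, betas=(0.5, 0.999))
decay = lambda epoch: 1.0 if epoch < 100 else 1.0 - (epoch - 100) / 100.0
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=decay)
```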

Table 2 Hyper-parameters setting for CycleGAN model

3.3 Detection module

As mentioned in the section above, Faster RCNN is our detector since it broadly shows a good balance between speed and accuracy. Faster RCNN operates in two stages: a Region Proposal Network (RPN) first generates candidate object regions from shared convolutional features, and a detection head then predicts, for each proposal, refined bounding-box coordinates, a confidence score indicating the presence of an object, and a class probability. We use Faster RCNN with ResNet50 as the backbone in our research, as shown in Fig. 4. We use scale-dependent box priors, learned from the training set, to improve prediction accuracy, and the network incorporates cross-layer connections between each pair of prediction layers, except for the output layer. The dataset is randomly partitioned into three subsets, with 70% of the samples allocated for training, 20% for validation, and 10% for testing. The trained model is initialized with weights from a model previously trained on the ImageNet dataset [36]. These weights were optimized over 200 epochs using the optimizer of [37] with a mini-batch size of 4, a first-order momentum of 0.9, and a constant learning rate (α) of 0.01. Table 3 lists the fine-tuned hyper-parameters of the Faster-RCNN model; they can be used to reproduce our results with ease and provide a valuable reference point for training a network on other datasets with sample sizes comparable to ours. A minimal fine-tuning sketch is given below.
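A minimal training-step sketch, not the authors' code: an ImageNet-pretrained ResNet50 backbone, 5 egg classes plus background, and SGD with momentum 0.9, learning rate 0.01, and a mini-batch of 4, matching the settings reported above; the batch and annotations are dummy tensors for illustration.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# ImageNet-pretrained backbone, randomly initialized detection heads.
model = fasterrcnn_resnet50_fpn(weights=None, weights_backbone="DEFAULT",
                                num_classes=6)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

model.train()
images = [torch.rand(3, 416, 416) for _ in range(4)]       # dummy mini-batch
targets = [{"boxes": torch.tensor([[30.0, 30.0, 120.0, 120.0]]),
            "labels": torch.tensor([1])} for _ in images]  # dummy annotations

loss_dict = model(images, targets)      # RPN + ROI-head losses
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```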

Fig. 4

Faster-RCNN architecture containing ResNet50 backbone

Table 3 Hyper-parameters setting for Faster R-CNN

4 Evaluation metrics

Different detection performance metrics were considered to offer comprehensive insight into the evaluation of the proposed methodology: accuracy, F1-score, recall, precision, and mIoU. These metrics are derived from True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN), as given in Eqs. 7–10. The framework demonstrates improvements across most of these metrics.

Accuracy = \(\frac{TP+TN}{TP + TN + FP + FN}\) (7)

Recall = \(\frac{TP}{TP + FN}\) (8)

Precision = \(\frac{TP}{TP + FP}\) (9)

F1-Score = \(2*\frac{Precision*Recall}{Precision+Recall}\) (10)
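A small sketch that computes Eqs. (7)–(10) directly from the confusion counts; the example counts are illustrative only, not results from this study.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int):
    """Accuracy, recall, precision, and F1-score per Eqs. (7)-(10)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f1

# Illustrative counts only:
print(detection_metrics(tp=95, tn=0, fp=3, fn=5))
```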

5 Results and discussion

The primary evaluation criteria, precision, recall, F1-score, and mean Intersection over Union (mIoU), were used to assess how well the proposed framework performs for every class in the intestinal parasite dataset. In this context, we again leverage the concepts of true positives, true negatives, false positives, and false negatives. The mIoU score was calculated by averaging the IoU values of all egg types after evaluating and assigning an IoU score to each type. Moreover, we computed precision and recall to derive the F1-score for each egg type at IoU ≥ 0.5. The mAP@[0.5:0.95] represents the mean Average Precision (mAP) averaged across IoU thresholds from 0.5 to 0.95 in steps of 0.05; the sketch below makes this averaging explicit. The following scenarios were used to test the effectiveness of the proposed framework. First, using the original dataset, that is, without any prior image enhancement, we trained the Faster-RCNN and conducted tests on the original image domain; the results are shown in Table 4. Next, we evaluated the proposed model on a dataset enhanced with standard augmentation methods, to investigate whether this pre-processing could improve parasite egg/cyst detection. In addition, we applied a number of processes and settings to modify the test input data in order to reproduce a wide range of variability, and tested the model with this changed dataset.
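A short sketch of the IoU computation and of how mAP@[0.5:0.95] averages over thresholds; `average_precision_at` is a hypothetical evaluator named only to make the averaging explicit.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 = 0.1428...

# mAP@[0.5:0.95]: average AP over IoU thresholds 0.50, 0.55, ..., 0.95.
thresholds = np.arange(0.5, 1.0, 0.05)
# map_50_95 = np.mean([average_precision_at(t) for t in thresholds])
```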

Table 4 Precision, recall, F1 Score and mIoU for settings original training domain/testing domain (no augmentation) assessed in this study for the classes Ascaris lumbricoides (AL), Hookworm (HW), Fasciolopsis buski (FB), Taenia spp. (TS), and Hymenolepis nana (HN)

These modified test images were then passed to the trained detector network without retraining.

Finally, we found that detection performance on the original dataset was significantly improved by the synthetic dataset generated with the CycleGAN augmentation model. The transformations CycleGAN introduces to the images include changes in brightness, rotation, color vibrance, contrast, motion blur, and saturation. The Faster-RCNN architecture was trained on these images. Table 5 shows the averaged metrics obtained on the validation dataset: the Faster-RCNN trained on CycleGAN-enhanced images achieved the highest precision, recall, and F1-score. The accuracy and loss curves of the proposed framework are shown in Figs. 5 and 6, respectively. Figure 7 shows images fed into the proposed framework under different scenarios; it illustrates that the framework sometimes had difficulty making accurate predictions on the original dataset, an issue effectively addressed by the enhanced images processed through the CycleGAN model.

Table 5 Precision, recall, F1 Score and mIoU for settings enhanced dataset (CycleGAN augmentation) assessed in this study for the classes Ascaris lumbricoides (AL), Hookworm (HW), Fasciolopsis buski (FB), Taenia spp. (TS), and Hymenolepis nana (HN)
Fig. 5

Comparing accuracy over epochs between proposed framework and other models

Fig. 6

Comparing loss over epochs between proposed framework and other models

Fig. 7

Parasite egg detection after processing through the proposed framework under different scenarios

6 Comparison against state-of-the-art object detection methods

We conducted a thorough evaluation of our method against leading object detection techniques: the Single Shot Detector (SSD), AlexNet, ResNet, YOLOv5, and Faster R-CNN, as shown in Table 6. SSD is known for its lightweight architecture, recognizing multiple objects in a single pass. Faster R-CNN, by contrast, requires two steps: first identifying regions of interest (ROIs), then detecting objects within each ROI using convolutional neural networks (CNNs). Although this makes Faster R-CNN slower than other deep learning models, it is more accurate and robust. In our tests, SSD uses the VGG-16 backbone, while Faster R-CNN uses ResNet50. It is worth noting that we found the You Only Look Once (YOLO) paradigm unsuitable for our application due to its difficulty in accurately detecting small objects in images. Table 7 compares the proposed framework with the other methods in terms of speed and memory usage; a sketch of one way to measure such indicators appears after the table captions.

Table 6 Comparison of CNN based models against proposed framework
Table 7 Comparison indicator of models regarding inference time and memory efficiency
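One illustrative way to obtain speed and memory indicators like those in Table 7 (parameter count as a memory proxy, averaged CPU inference time); the exact measurement protocol of this study is not reproduced here.

```python
import time
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights=None, num_classes=6).eval()
image = [torch.rand(3, 416, 416)]

n_params = sum(p.numel() for p in model.parameters())     # memory proxy
with torch.no_grad():
    model(image)                                          # warm-up pass
    start = time.perf_counter()
    for _ in range(10):
        model(image)
    ms_per_image = (time.perf_counter() - start) / 10 * 1000

print(f"params: {n_params / 1e6:.1f}M, inference: {ms_per_image:.1f} ms (CPU)")
```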

7 Conclusions and future work

Labeled medical imaging data is both scarce and expensive to generate, posing a major challenge to developing generalized deep learning models that require substantial amounts of data. To address this limitation, standard data augmentation techniques are commonly employed to enhance the generalizability of deep learning-based models. Seeking more innovative approaches, generative adversarial networks (GANs) have emerged as a novel method for data augmentation. In this context, we proposed a CycleGAN model to generate synthetic datasets that overcome the data scarcity problem. This augmentation principle is based on translating low-resolution parasite images into higher-resolution ones in a fully automatic way, generating a new synthetic intestinal parasite image dataset; this dataset is then merged with the original dataset to enlarge the training data. Further, images collected with portable devices are of lower quality and less detailed than those captured by stationary cameras. In this regard, the proposed research demonstrates the feasibility of converting lower-quality datasets into enhanced datasets to improve the detection of intestinal parasite eggs/cysts in images. The goal of this technique is to ease the deployment of automatic screening methods and models in realistic clinical scenarios.

To validate our framework, we evaluated the performance of the Faster-RCNN network on the newly generated dataset and then tested it on previously unseen data. Our results demonstrated that the proposed detector, trained on high-quality data generated with the CycleGAN module, achieved enhanced performance. We also performed additional experiments to evaluate the detection performance of other deep learning architectures. The proposed framework improved intestinal parasite detection performance to 0.97, 0.97, and 0.95 in terms of mIoU, precision, and F1-score, respectively. Moreover, the framework efficiently detects intestinal parasites in microscopic images affected by brightness variations, blurring, noise, and chrominance shifts. As the present work focuses on locating only a few parasite types, our plans involve extending this approach to encompass other parasite types as well.