Introduction

Mosquitoes spread many diseases including malaria, dengue, chikungunya, yellow fever, and Zika1. In Africa alone, malaria is responsible for the deaths of more than 750,000 people annually, most of them children2. For efforts to model and combat the spread of mosquitoes and mosquito-borne diseases, identifying the gonotrophic stage of a female mosquito is critical for assessing behavior, age, and suitability for different analyses3,4. The gonotrophic cycle governs mosquito reproduction and consists of four stages. In the first, the unfed stage, a female adult has yet to consume a blood meal. After mating, a female must take a blood meal for her eggs to mature. Once the mosquito has acquired a full blood meal, its eggs are ready to begin maturing, and the mosquito is in the fully fed stage. In the next stage, the eggs have started to mature as the blood is being digested; this is called the semi-gravid or half-gravid stage. When the eggs are fully mature (i.e., when the blood is fully digested), the mosquito is in the gravid stage.

Our goal in this paper is to automate the identification of gonotrophic stages in female mosquitoes. Today, this process is manual, time-consuming, and requires trained expertise that is increasingly hard to find. Each mosquito specimen must be visually examined, based on the color and shape of its abdomen, to determine its gonotrophic stage (see Fig. 1), and there is a need to automate this process. With automation, mosquito observations from the general public can also be processed, providing larger-scale surveillance data for public health agencies.

Our technical contributions

We raised 97 female mosquitoes in a lab in Southern India and let them go through all four stages in the gonotrophic cycle. The mosquitoes were distributed across three medically important vector species—Aedes (Ae.) aegypti, Culex (Cx.) quinquefasciatus, and Anopheles (An.) stephensi. Subsequently, as the mosquitoes went through each stage, our team took pictures of them via multiple smartphones on a plain grey or white background to generate a total of 1379 images (details on our dataset are provided in the Discussions and Methods sections). In addition, we raised 42 Anopheles stephensi mosquitoes at a lab in the US, from which we took 580 images of these mosquitoes in their unfed and semi-gravid stages via multiple smartphones on a similar background. Our total image dataset was thus 1959 images (see Table 1). Using this dataset, our contributions are the following.

  • Designing multiple neural network architectures for classification: In this study, we trained, fine-tuned, and tested four different neural network architectures for classifying gonotrophic stages—ResNet505, MobileNetV26, EfficientNet-B07, and ConvNeXtTiny8. Each architecture is relevant to our classification problem, while differing from the others in design. ResNet50 is popular but computationally very expensive. MobileNetV2 is lighter and particularly suited for execution on embedded devices like smartphones. EfficientNet-B0 is newer and offers a good trade-off between accuracy and complexity. Finally, ConvNeXtTiny is a hybrid of CNNs and the more recent Vision Transformers9. Our metrics to assess performance were precision, recall, F1-score, and accuracy. Our analysis identified that, overall, the EfficientNet-B0 architecture outperformed the others: it yielded an overall accuracy of \(93.59\%\) with a tolerable model size and execution speed. Most confusion occurred between the gravid and semi-gravid stages across all models.

  • Visualizing the predictive ability of features using t-SNE analysis: We leveraged the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm10 to construct a 2D plot visualizing the features extracted by our AI models. We observed that the features obtained from the EfficientNet-B0 model were distinct and separable for each class, aligning precisely with the different stages of the gonotrophic cycle. This observation further substantiates the effectiveness of the EfficientNet-B0 model in gonotrophic stage classification.

  • Providing model explainability via Grad-CAMs: To further the explainability of our trained EfficientNet-B0 model, we utilized the Gradient-weighted Class Activation Mapping (Grad-CAM)11 technique to identify those pixels that the AI model prioritized in making a classification. Our findings demonstrate that our model gives the greatest weight to those pixels that represent the abdomen of the mosquito. This finding is important and indicates that our AI model has learned correctly because the visual markers for identifying stages in the gonotrophic cycle are indeed located in the abdomen of a mosquito (please refer to Fig. 1).

  • Highlighting the practical impact of our work: To the best of our knowledge, our study is the first to design and validate computer vision methods for automatic identification of the stages in the mosquito gonotrophic cycle. The practical impact of our study is elaborated later in the paper.

Figure 1

Abdominal conditions of a female mosquito according to the stages of its gonotrophic cycle, redrawn from4.

Table 1 Number of mosquito images and specimens across three species in our dataset.

Results

Classification accuracies

The results here are for the 234 images in our testing dataset that were unseen by the four AI models we trained; they are presented in Table 2. Our evaluation metrics are precision, recall, F1-score, and accuracy. These metrics are calculated the same way for each class and are defined below:

$$Precision = \frac{True\;Positive}{True\;Positive + False\;Positive},$$
(1)
$$Recall = \frac{True\;Positive}{True\;Positive + False\;Negative},$$
(2)
$$F1\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall},$$
(3)
$$Accuracy = \frac{True\;Positive + True\;Negative}{Positive + Negative}.$$
(4)
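
For readers who wish to reproduce these metrics, a minimal sketch follows. It uses scikit-learn's classification_report and confusion_matrix on placeholder label vectors, which stand in for the true and predicted stage labels of our 234 test images.

```python
# Minimal sketch: per-class precision, recall, F1-score, and the confusion matrix,
# computed from integer-encoded stage labels (0=unfed ... 3=gravid).
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

stages = ["unfed", "fully fed", "semi-gravid", "gravid"]

# Placeholder labels standing in for the actual test-set annotations and predictions.
y_true = np.array([0, 1, 2, 3, 2, 3, 1, 0])
y_pred = np.array([0, 1, 3, 3, 2, 2, 1, 0])

print(classification_report(y_true, y_pred, target_names=stages, digits=4))
print(confusion_matrix(y_true, y_pred))
```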

As we can see from Table 2, the highest classification accuracy is yielded by the ResNet50 model, followed by the EfficientNet-B0 model. The lowest classification accuracy was yielded by the ConvNeXtTiny model, a recent architecture that combines features of CNNs while drawing inspiration from Vision Transformers9. Table 3 presents the confusion matrices for all four architectures. As we can see, all models exhibit confusion between the semi-gravid and gravid classes. This is reasonable because the morphological differences between these two classes are very fine (subtle and inexact changes in color across the abdomen of the mosquito), which sometimes confuse even trained entomologists.

To analyze the architectures further, Table 4 presents the complexity of the trained models, since it is also important that models are lightweight and leave minimal footprints in their execution. As we can see, the ResNet50 model is the heaviest, with a very large model size and number of extracted features. The EfficientNet-B0 model is much lighter in comparison. It is our judgment that, for the problem of classifying gonotrophic stages, the EfficientNet-B0 model is the most practical, since it is both accurate and lightweight, lending itself to execution on embedded devices like smartphones and edge computers, which is the practical need in mosquito surveillance. The average inference times per image for the ResNet50, EfficientNet-B0, MobileNetV2, and ConvNeXtTiny models were 0.82, 1.22, 0.67, and 2.58 seconds respectively. These are small and tolerable delays.
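
As a hedged illustration of how such complexity numbers can be obtained, the sketch below counts parameters and times single-image inference for an off-the-shelf Keras EfficientNet-B0; the untrained ImageNet backbone and the dummy input stand in for our fine-tuned classifier and real test images, so absolute timings will differ from those reported above.

```python
# Minimal sketch: comparing model footprint and per-image inference time.
import time
import numpy as np
import tensorflow as tf

model = tf.keras.applications.EfficientNetB0(weights="imagenet")
print("Parameters:", model.count_params())

image = np.random.rand(1, 224, 224, 3).astype("float32")  # dummy input image
model.predict(image, verbose=0)                            # warm-up call

start = time.perf_counter()
for _ in range(20):
    model.predict(image, verbose=0)
print("Avg inference time per image (s):", (time.perf_counter() - start) / 20)
```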

Table 2 Results for four classification model architectures on test images.
Table 3 Confusion matrices for four classification model architectures on test images.
Table 4 Comparison of model architectures from a complexity perspective.

Features visualization using t-distributed stochastic neighbor embedding (t-SNE) algorithm

In order to highlight the discriminatory power across the four classes of gonotrophic stages, we leverage the technique of t-SNE10. t-SNE is an unsupervised, non-linear technique for dimensionality reduction, used for visualizing high-dimensional data (in our case, the activation maps or output features of the final convolutional layer of our AI model). This method provides an intuition of how high-dimensional data points are related in a low-dimensional space. As such, we can use this technique to evaluate the discriminatory power of the AI models.

To implement the t-SNE method, the following steps were executed for all four AI models. Starting from each trained model, two sequential phases were executed. First, t-SNE builds a probability distribution over pairs of data points, which are the activation maps of the final convolutional layer of the AI model. For each pair, if there is a high level of similarity, a large probability value is assigned; otherwise, the probability value is small. Next, t-SNE places those data points in a lower-dimensional space and generates another probability distribution there. The algorithm then minimizes the difference between the two probability distributions with respect to the locations of the points on the map. To accomplish this, it computes the Kullback-Leibler divergence (KL divergence)12 between the distributions and minimizes it over several iterations. This helps us understand how our AI model separates different classes in the data by visualizing how the decision boundaries form in a lower-dimensional space.

Figure 2

Feature maps of all trained models after implementing the t-SNE algorithm. (a) ResNet50, (b) MobileNetV2, (c) EfficientNet-B0, (d) ConvNeXtTiny.

For each AI model, we obtained the activation maps from the final convolutional layer for all 234 test images; each image resulted in a matrix with dimensions of \(7 \times 7 \times m\), where m denotes the number of features extracted from the last convolutional layer of each model (see Table 4). To prepare the data for analysis, we flattened each image’s feature matrix into a vector of \(49 \times m\) values. Subsequently, we applied the t-SNE algorithm to the flattened feature data of the 234 images, as described earlier. This process generated 2D coordinates for each image, allowing us to visualize them in a reduced space. To provide additional information, we color-coded the images based on their gonotrophic stages in Fig. 2, and used circles and crosses to denote correct and incorrect classifications.
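
A minimal sketch of this visualization step is shown below. It assumes a placeholder feature matrix of 234 flattened activation maps (with m set to 1280 purely as an example) and random stage labels; in our actual pipeline, these came from the trained models and the annotated test set.

```python
# Minimal sketch: projecting flattened activation maps to 2D with t-SNE
# and color-coding points by gonotrophic stage.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

m = 1280  # example value for the number of final-layer feature maps (see Table 4)
features = np.random.rand(234, 49 * m)         # placeholder flattened activations
labels = np.random.randint(0, 4, size=234)     # placeholder stage labels (0=unfed ... 3=gravid)

embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

for stage, name in enumerate(["unfed", "fully fed", "semi-gravid", "gravid"]):
    mask = labels == stage
    plt.scatter(embedding[mask, 0], embedding[mask, 1], label=name, s=12)
plt.legend()
plt.savefig("tsne_features.png", dpi=200)
```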

In Fig. 2, we show the resulting t-SNE plots for all four trained AI models. While ResNet50 (Fig. 2a) and ConvNeXtTiny (Fig. 2d) achieved higher classification accuracies, they did not exhibit a clearly distinguishable t-SNE plot. On the other hand, although MobileNetV2 (Fig. 2b) displayed a coherent t-SNE plot, this model had the poorest performance metrics for the fully fed class. Notably, the t-SNE plot for EfficientNet-B0 (Fig. 2c) demonstrated better alignment with its performance metrics for all classes. The markers corresponding to each color (representing a gonotrophic stage) were distinctly located in separate areas on the plot, reaffirming that the AI had effectively learned discernible features, and classified them accurately.

Enhancing model explainability utilizing Grad-CAMs

In this study, we provide further explainability of our AI model (EfficientNet-B0 only) using the technique of Grad-CAM11. Grad-CAM leverages the gradients of each target class as they propagate through the final convolutional layer of a neural network. By analyzing these gradients, Grad-CAM generates a coarse localization map that highlights the important regions of an image that contribute to the network’s prediction for a specific class. To accomplish this, Grad-CAM first computes the gradients of the target class score with respect to the feature maps produced by the final convolutional layer. These gradients serve as importance weights, indicating how crucial each feature map is for predicting the class of interest. Next, the gradients are global-average-pooled to obtain a single weight per feature map. This pooling operation captures the overall importance of each feature map rather than focusing on individual spatial locations. Finally, the weights are combined with the corresponding feature maps in a weighted combination, producing the final localization map. This map provides a visual representation of the regions in the image that are most relevant to the neural network’s decision for the target class. In the resulting visualization, the pixels in an image that were prioritized more during a classification appear progressively redder, while those prioritized less appear progressively bluer.
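
The sketch below illustrates the Grad-CAM computation just described for a Keras EfficientNet-B0. The off-the-shelf ImageNet model, the layer name, and the random input are placeholders for our trained classifier and real mosquito images.

```python
# Minimal Grad-CAM sketch; model, layer name, and input are illustrative placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.EfficientNetB0(weights="imagenet")
last_conv = "top_conv"  # final convolutional layer in the Keras EfficientNet-B0

grad_model = tf.keras.Model(model.inputs,
                            [model.get_layer(last_conv).output, model.output])

image = np.random.rand(1, 224, 224, 3).astype("float32")  # placeholder mosquito image

with tf.GradientTape() as tape:
    conv_maps, preds = grad_model(image)
    class_idx = int(tf.argmax(preds[0]))       # predicted class
    class_score = preds[:, class_idx]          # score of that class

grads = tape.gradient(class_score, conv_maps)           # d(score)/d(feature maps)
weights = tf.reduce_mean(grads, axis=(0, 1, 2))         # global-average-pooled gradients
cam = tf.reduce_sum(conv_maps[0] * weights, axis=-1)    # weighted sum over feature maps
cam = tf.nn.relu(cam) / (tf.reduce_max(cam) + 1e-8)     # normalize to [0, 1]
heatmap = cam.numpy()  # 7x7 map, to be upscaled and overlaid on the input image
```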

Figures 3, 4, 5, and 6 show specific instances of Grad-CAM outputs on test images based on classifications made by our EfficientNet-B0 model, organized by gonotrophic stage and species. Results demonstrated that the redder pixels are indeed concentrated in and around the abdomen in all instances. This provides a high degree of confidence that the EfficientNet-B0 model has learned correctly from the right anatomical components in an image, and we can hence explain the classification of gonotrophic stages in a mosquito. We mention here that while only some instances are presented in Figs. 3, 4, 5 and 6, the results are indeed generalizable.

Figure 3

Unfed gonotrophic stage images with Grad-CAM per species for the EfficientNet-B0 model.

Figure 4

Fully fed gonotrophic stage images with Grad-CAM per species for the EfficientNet-B0 model.

Figure 5

Semi-gravid gonotrophic stage images with Grad-CAM per species for the EfficientNet-B0 model.

Figure 6

Gravid gonotrophic stage images with Grad-CAM per species for the EfficientNet-B0 model.

Discussions

The surveillance and control of mosquito vectors is a critical aspect of epidemiology, but the process is fraught with obstacles. The standard surveillance practice is to lay mosquito traps in an area of interest, after which the trapped mosquitoes—sometimes hundreds per trap—are brought to a lab and spread out on a light-colored board for one-by-one visual inspection (to identify species, gonotrophic stage, etc.). Sometimes, a microscope is needed too. This process is arduous, manual, and time-consuming. Additionally, across the globe, entomology is a profession for which expertise is increasingly difficult to find and sustain. There is a clear need to automate the surveillance process, with practical ramifications elaborated below13,14.

Knowing the abundance of mosquitoes in each gonotrophic stage is important for a variety of assessments, including near-time forecasting of the population of vector mosquitoes (via the abundance of gravid mosquitoes), the effectiveness of eradication strategies in controlling vectors, the conduciveness of local climatic factors for reproduction (gleaned via the ratio of gravid to fully fed mosquitoes), and the propensity for diseases to spread in an area during an outbreak (based on the number of blood-fed mosquitoes).

For the specific case of malaria (and also for other mosquito-borne diseases), it has been shown that understanding the gonotrophic stages of mosquitoes has vital importance for disease control and associated environmental impact3,4. Specifically, it has been shown that being aware of the timing of blood meals and egg laying will enable highly targeted eradication strategies to reduce mosquito populations and hence diseases. This is because a targeted plan to use pesticides across space and time will not only suppress mosquito populations and the spread of disease effectively, but it will also lower costs and the associated environmental impact.

Mosquito fecundity is primarily determined by their neurosecretory system, the amount of blood they consume, and local climatic circumstances15,16. If any of these conditions are unfavorable, fertility decreases. Hence, the effect of a single factor on fecundity can be determined, after controlling for other variables, by determining the relative abundance of mosquitoes in various gonotrophic stages. In addition, given species-specific and gonotrophic stage knowledge, public health experts can compare the fecundity of different mosquito species to gain a deeper understanding of the differences in their reproductive biology.

Knowledge of the gonotrophic stages is also critical to other facets of mosquito-borne disease epidemiology. For example, fully fed mosquitoes are required for enzyme-linked immunosorbent assays (ELISA) to identify human blood meals in mosquitoes17, and semi-gravid mosquitoes are required for cytogenetic analysis to assess chromosomal mutations18. Furthermore, since a mosquito needs to have consumed a blood meal to carry pathogens, an automated and rapid mechanism to distinguish a fed mosquito from an unfed one will enhance operational efficiency in determining the presence or absence of pathogens in any specific mosquito during outbreaks.

Beyond merely helping entomologists save time in gonotrophic stage identification, the impact of our paper extends to two novel avenues. The first is leveraging image data generated by citizen science (also known as community science). Our team now partners closely with three well-established platforms that the general public uses to upload mosquito observations: Mosquito Alert19, iNaturalist20, and GLOBE Observer's Mosquito Habitat Mapper21. Via these partnerships, we work with volunteers across Africa, the Americas, and Europe to train citizen scientists on best practices for recording and uploading mosquito observations from smartphones. Furthermore, utilizing Open Geospatial Consortium standards, we have harmonized data streams from all of these platforms to facilitate interoperability and utility for experts and the general public. This GIS mapping platform, the Global Mosquito Observations Dashboard (GMOD), is accessible at www.mosquitodashboard.org for visualizing and downloading data in multiple tabular and geospatial formats (> 300K observations to date)22,23. We are currently integrating computer vision algorithms that we have designed and validated in prior work22,23,25 to process images from these citizen science platforms for species identification (and soon for gonotrophic stage identification). Notably, most mosquito images uploaded by citizen scientists are taken indoors against a light-colored wall while the mosquito is resting. This is also a reason why the images of mosquitoes in our dataset were taken on a grey or white background.

The second novel practical impact of our work lies in augmenting AI technologies that we and multiple other groups are designing to identify mosquito species automatically, thereby eliminating the need for expert human involvement26,26,28. While in some technologies, a mosquito must be emplaced in an imaging system26, in other technologies, mosquito images are captured in flight inside the trapping chamber29. In either case though, the background is light-colored to provide appropriate contrast. Ultimately, the algorithms shared in our paper (Data availability) can enable novel tools that harness the power of both AI and the general public, as they upload images from which we can now not only identify vector mosquitoes but also their gonotrophic stages, with greater utility for mosquito surveillance and control.

Conclusions

In this study, we develop computer vision approaches to automate the determination of gonotrophic stages from mosquito images. Our data came from mosquitoes distributed across three important species: Ae. aegypti, An. stephensi, and Cx. quinquefasciatus. A total of 139 mosquitoes were raised in two separate facilities, and they went through the four gonotrophic stages: unfed, fully fed, semi-gravid, and gravid. Using multiple smartphones, we then captured 1959 photographs of these mosquitoes against a plain background. Following that, we trained and tested four diverse but popularly used AI model architectures and implemented explainable AI techniques (t-SNE, Grad-CAMs) to validate their outcomes. Overall, the EfficientNet-B0 model gave the best performance when combining model accuracy, model size, distinguishable t-SNE plots, and correct Grad-CAMs.

To the best of our knowledge, our contributions in this paper are the first towards automating the process of determining the gonotrophic stage of a mosquito using computer vision techniques. We believe that our method provides novel tools for entomologists, citizen-science platforms, and image-based mosquito surveillance. With the increasing spread and resurgence of mosquito-borne diseases across the globe (e.g., the first local transmission of malaria in the US in two decades this summer), our study assumes critical and urgent significance.

Methods

Generation of image database and augmentation

The images comprising our dataset came from mosquitoes raised in captivity in two separate labs. One lab is in South India, and the other is in the US. The mosquitoes raised in South India belonged to three species: Ae. aegypti, An. stephensi, and Cx. quinquefasciatus. Mosquitoes were fed with chicken blood in India and sheep blood in the US. It took about two minutes for the mosquitoes to reach a fully fed state. After this, the mosquitoes took about 24 hours to move from one stage in the gonotrophic cycle to the next. At each stage, the mosquitoes were visually observed by entomological experts to determine the correct stage. Please note that after visual identification, live mosquitoes were emplaced in test tubes and anesthetized using a few drops of diethyl ether added to the cotton plug of the test tubes. Within a minute, the mosquitoes were anesthetized. The mosquitoes were then photographed over a plain grey or white background with multiple smartphones. This background was chosen specifically because (a) entomologists today emplace mosquitoes on a light-colored platform for identification; (b) citizen-uploaded images of mosquitoes in portals today are predominantly taken indoors on a light-colored background; and (c) a light-colored background provides the highest contrast. The reason for taking images with multiple smartphones was to introduce the noise that commonly occurs in real life due to diversity across cameras, which is a standard procedure in computer vision. This same image-capturing procedure was also followed for the mosquitoes raised in the US, except that these were only An. stephensi mosquitoes, and photographs were taken for the unfed and semi-gravid stages only. The final dataset contained 579 images of unfed female mosquitoes, 521 images of fully fed mosquitoes, 438 images of semi-gravid mosquitoes, and 421 images of gravid mosquitoes across the three species (see Table 1). It is important to note that a mosquito photographed in one stage was not used for photographs taken in another stage. In other words, photographs of a single mosquito specimen were taken for only one gonotrophic stage in our dataset. This alleviates pseudo-replication concerns in our dataset.

Once the images were generated, the entire image and species dataset was split into training, validation, and testing sets in the proportions of \(80\%\) (1504 images), \(10\%\) (221 images), and \(10\%\) (234 images), respectively. Images in the training set were augmented, which is a standard step before developing AI models. The idea is to introduce sufficient diversity into the training samples so that the model learns to ignore noninformative variation during practical use and is not over-fitted. To augment the training images (i.e., add diversity to the 1504 training images), we used eight methods that are standard in image processing—rotating clockwise (a) and counter-clockwise (b), flipping horizontally (c) and vertically (d), changing blurriness (e) and sharpness (f), altering brightness randomly from 5 to 20\(\%\) (g), and manually cropping images to extract only the mosquito body (h). Figure 7 shows the eight augmented images alongside the original image of a sample mosquito. After augmentation, the number of training samples per class was increased by a factor of eight. The images in the validation and testing sets were not augmented.
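
The sketch below illustrates the eight augmentation transforms using the Pillow library; the rotation angles, blur radius, sharpness factor, and crop box are illustrative values, not the exact parameters of our pipeline.

```python
# Minimal sketch of the eight augmentation transforms (a)-(h) described above.
import random
from PIL import Image, ImageEnhance, ImageFilter, ImageOps

def augment(img):
    """Return eight augmented variants of a PIL image (parameters are illustrative)."""
    brightness = 1 + random.uniform(0.05, 0.20)                # 5-20% brightness change
    return [
        img.rotate(-15, expand=True),                          # (a) rotate clockwise
        img.rotate(15, expand=True),                           # (b) rotate counter-clockwise
        ImageOps.mirror(img),                                  # (c) flip horizontally
        ImageOps.flip(img),                                    # (d) flip vertically
        img.filter(ImageFilter.GaussianBlur(2)),               # (e) blur
        ImageEnhance.Sharpness(img).enhance(2.0),              # (f) sharpen
        ImageEnhance.Brightness(img).enhance(brightness),      # (g) brightness
        img.crop((50, 50, img.width - 50, img.height - 50)),   # (h) crop (placeholder box)
    ]

img = Image.new("RGB", (640, 480), "white")  # placeholder for a real mosquito photo
augmented = augment(img)
```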

Figure 7

Augmented images for a sample mosquito image.

Our deep neural network architectures to classify gonotrophic stages of mosquitoes

Rationale for the deep neural network architectures

In this paper, we trained and validated four distinct deep neural network architectures for gonotrophic stage classification—ResNet50, MobileNetV2, EfficientNet-B0, and ConvNeXtTiny. All these architectures are popular in the literature and are sufficiently diverse. The ResNet505 architecture employs a series of residual blocks, each containing convolutional layers and utilizing a bottleneck design to optimize computation. ResNet50 addresses the vanishing gradient problem by introducing shortcut (skip) connections, which help in training very deep neural networks. However, this model can be computationally intensive and memory-consuming due to its depth and the need to store intermediate activations for the skip connections, which can make it challenging to deploy on resource-constrained devices or platforms. A lighter model well-suited for execution on embedded devices like smartphones is MobileNetV26, which utilizes depth-wise separable convolutions, significantly reducing the number of parameters and computations compared to traditional convolutional layers. This makes it highly efficient for mobile and embedded devices, allowing faster inference and lower memory requirements. The efficiency gained in MobileNetV2 comes at the cost of some loss in accuracy compared to larger and more computationally intensive models. As a trade-off between accuracy and computational cost, we chose the EfficientNet-B07 model for training and validation. It has achieved state-of-the-art performance across various tasks while requiring fewer parameters than other architectures. Instead of independently scaling the width, depth, and resolution of the network, EfficientNet-B0 scales all three aspects simultaneously and uniformly using scaling coefficients. It thus strikes a superior balance between model size, computational efficiency, and accuracy, making it well suited for practical applications and deployment. Apart from these three convolutional neural networks, we finally trained and validated ConvNeXtTiny8, a recent neural network inspired by the concepts of Vision Transformers9 (which are state-of-the-art but computationally heavy). ConvNeXtTiny employs depth-wise convolutions, in which each channel of the image representation is processed independently, effectively cutting down the required computational workload while preserving accuracy.
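
All four backbones are available off the shelf in tf.keras.applications. The short sketch below instantiates each one (without a classification head or pre-trained weights) and prints its parameter count, which is one simple way to appreciate the complexity differences discussed above.

```python
# Minimal sketch: instantiating the four backbones and comparing parameter counts.
import tensorflow as tf

backbones = {
    "ResNet50": tf.keras.applications.ResNet50,
    "MobileNetV2": tf.keras.applications.MobileNetV2,
    "EfficientNet-B0": tf.keras.applications.EfficientNetB0,
    "ConvNeXtTiny": tf.keras.applications.ConvNeXtTiny,
}

for name, fn in backbones.items():
    # include_top=False drops the ImageNet classification head; weights=None avoids downloads.
    base = fn(include_top=False, weights=None, input_shape=(224, 224, 3))
    print(f"{name}: {base.count_params():,} parameters")
```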

Optimization of hyperparameters

As is customary in developing deep neural network architectures, hyperparameters were determined by multiple rounds of training and validation on the dataset30. Critical hyperparameters tuned in our neural network architectures are presented in Table 6. Please note that Table 6 lists the hyperparameters we used for training and validating the EfficientNet-B0 model, although they were very similar for the other three architectures too.

  • Resized images: To maintain image consistency, we must resize the inputs. Since we collected data from numerous cell phones, we downsized each input image to \(224 \times 224 \times 3\) pixels regardless of its actual dimensions. This enables us to achieve faster training without loss of image quality. We also standardized the RGB value of each pixel in the image by dividing it by 255.

  • Optimizer: In this work, the Adam (Adaptive Moment Estimation)31 optimization algorithm was utilized. This technique enables adaptive learning rates for weights across architectural layers, so that lower rates are allocated to weights receiving larger updates and higher rates are given to weights receiving smaller updates. The exponential decay rates for the first and second moment estimations (\(\beta _1\) and \(\beta _2\)) are set to 0.89 and 0.999 respectively.

  • Loss functions: In this study, we utilized the categorical cross-entropy loss function, which minimizes the difference between the predicted probability distribution and the actual probability distribution. This is in contrast to other loss functions, such as focal loss and triplet loss, which perform better when intra-class complexity and inter-class variability are greater, neither of which is the case in our setting.

  • Fine-tuning of the architecture and compensating for overfitting: For fine-tuning, we initially froze the layers of the base model (all layers except the dense layers we appended to the bottom of the architecture (see Table 5)) with the weights of a model pre-trained on the ImageNet32 dataset of 14 million images across 20,000 categories. Since the model was already trained on a large dataset, these weights were already highly optimized. That is why we only trained the last 14 layers of the model with a higher learning rate (\(1e-3\)). After training for 500 epochs, we unfroze all layers and trained the model again with a smaller learning rate (\(1e-5\)) so that the changes in weights would be smaller. Within another 500 epochs (1000 in total), the model reached its best optimization; a minimal sketch of this two-phase schedule is shown after this list. Figure 8 shows the loss and accuracy for each epoch during training and validation of the EfficientNet-B0 model. We used Python DL libraries (keras33, tensorflow34) to implement the code and trained on an Intel Xeon E5-2620 v4 processor with 128GB GPU memory.
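
Below is a minimal sketch of the two-phase fine-tuning schedule described in this list, assuming a Keras EfficientNet-B0 base with a single appended dense layer; the actual head layers (Table 5), data pipelines, and callbacks are omitted or replaced by placeholders.

```python
# Minimal sketch of two-phase fine-tuning: freeze the pre-trained base, train the
# appended head, then unfreeze everything and continue with a smaller learning rate.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.EfficientNetB0(include_top=False, weights="imagenet",
                                             input_shape=(224, 224, 3), pooling="avg")
head = layers.Dense(4, activation="softmax")(base.output)   # 4 gonotrophic stages
model = tf.keras.Model(base.input, head)

# Phase 1: freeze the base and train only the appended layers (learning rate 1e-3).
base.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3,
                                                 beta_1=0.89, beta_2=0.999),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=500)   # hypothetical tf.data pipelines

# Phase 2: unfreeze all layers and continue with a much smaller learning rate (1e-5).
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5,
                                                 beta_1=0.89, beta_2=0.999),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=500)
```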

Table 5 EfficientNet-B0 model architecture with layer information (input and output sizes).
Table 6 Values of critical hyperparameters in training the EfficientNet-B0 model.
Figure 8

Plotting the loss (a) and accuracy (b) for each epoch in training and validation of EfficientNet-B0.

Related work

Mosquito surveillance and control are critical tasks in epidemiology. There is always a need to enhance the speed and scale of these activities, especially with rising cases of mosquito-borne diseases across the globe.

In the past decade or so, citizen-science platforms such as iNaturalist20, Mosquito Alert19, and Mosquito Habitat Mapper21 have been deployed with great success22,35, enabling non-experts to take and upload photographs of mosquitoes that they encounter in nature. Experts can then identify and analyze these data, hence providing a new source of surveillance information beyond the limits of traditional trapping methods. In addition, rapid advancements in AI techniques have also enabled numerous image processing methods for mosquito identification. Specific problems addressed by such studies are presented below in limited detail.

Goodwin et al.26 presented a method for identifying mosquito species using convolutional neural networks (CNNs) and a multitiered ensemble model. The approach utilized deep learning techniques to analyze mosquito images and accurately classify 67 mosquito species. Kittichai et al.36 focused on utilizing the well-known you-only-look-once (YOLO) deep learning algorithm37 for the identification of both mosquito species and gender. The YOLO algorithm, with its ability to handle complex and challenging visual data, aided in accurately identifying and classifying mosquito vectors. Kittichai et al. proposed concatenating two YOLO v338 models and showed optimal performance in mosquito species and gender classification. Finally, our prior work has demonstrated the utility of CAMs in the identification of mosquito species22,24.

To the best of our knowledge, there is no work yet in the literature on automating the determination of gonotrophic stages in a mosquito, hence making the contributions in this paper unique and practically impactful.