Identifying Aedes aegypti Mosquitoes by Sensors and One-Class Classifiers

Souza, Vinicius M. A.

doi:10.1007/978-3-319-52277-7_2

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10125)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition


Abstract

Yellow fever, Zika, and dengue are examples of arboviruses transmitted to humans by Aedes aegypti mosquitoes. Efforts to curb the transmission of these viral diseases focus on vector control. However, without timely knowledge of the exact location of the insects, techniques such as chemical control become costly and inefficient. Recently, an optical sensor was proposed to gather real-time information about the spatio-temporal distribution of insects, supporting different vector control techniques. In field conditions, the assumption that all classes of the problem are known is hard to fulfill. For this reason, we address insect classification with one-class classifiers, where learning is performed only with positive examples (the target class). In our experiments, we identify Aedes aegypti mosquitoes with an AUC of 0.87.

V.M.A. Souza: The author thanks FAPESP for its financial support (Grants #2011/17698-5 and #2015/16004-0).


1 Introduction

The Aedes aegypti mosquito is one of the most important vectors of arboviruses that affect human health, including yellow fever, chikungunya, Zika, Japanese encephalitis, and dengue. The viruses are passed on to humans through the bite of an infective female Aedes mosquito, which mainly acquires the virus while feeding on the blood of an infected person.

In May 2015, the Pan American Health Organization issued an alert regarding the first confirmed Zika virus infections in Brazil. Since then, the virus has spread rapidly throughout the Americas. The illness is usually mild, with symptoms lasting from several days to a week after the bite of an infected mosquito. However, Zika virus infection during pregnancy can cause a serious birth defect called microcephaly, as well as other severe fetal brain defects [1].

Dengue is the most important vector-borne viral disease of humans and is likely more important than malaria globally in terms of morbidity and economic impact [2]. Studies estimate that 3.6 billion people live in areas of risk, with 390 million dengue infections per year globally, of which 96 million manifest clinically [3, 4]. According to the World Health Organization, only 9 countries had experienced severe dengue epidemics before 1970. The disease is now endemic in more than 100 countries. In Latin America, the incidence and severity of the disease have increased rapidly in recent years. In 2015, 2.35 million cases of dengue were reported in the Americas alone, of which 10,200 were diagnosed as severe dengue, causing 1,181 deaths [5].

Currently, no licensed vaccine against dengue infection is available, and the most advanced vaccine candidate did not meet expectations in a large trial [6]. Thus, efforts to curb the transmission of these viral diseases focus on vector control in order to reduce the population of Aedes aegypti. There are many methods of insect control, such as biological control, genetic technology, environmental management, and chemical control. However, without timely knowledge of the exact location of the insects, these techniques become costly and inefficient.

Recently, a new optical sensor was proposed as a tool to gather information about the spatio-temporal distribution of insects and, combined with an electronic trap, to control disease vectors [7]. The sensor captures insect flight information using a light source and automatically classifies the insects by species using machine learning algorithms. It can provide real-time population estimates of insect species, supporting the effective use of traditional vector control strategies.

Previous work on insect classification by optical sensors has focused on multiclass classifiers, such as Support Vector Machines, k-Nearest Neighbors, Random Forests, and Deep Neural Networks, among others [7,8,9,10]. In multiclass classification, we have n predefined classes given by the set of class labels $Y = \{y_1, y_2, \ldots , y_n \}$, and the goal of a classifier is to assign the most probable class label $y_i \in Y$ to an unknown example $\overrightarrow{x}$, where $\overrightarrow{x} \in \mathbb {R}^d$ is a feature vector with d dimensions. This procedure is problematic when the example does not belong to any of the predefined classes.

For the effective use of the sensor in field conditions, we note that the assumption made by multiclass classifiers, namely that all classes are known, is hard to fulfill. For example, the order Diptera alone is estimated to contain more than 240,000 species, of which about 120,000 are cataloged [11]. Thus, it is impossible to collect data covering all possible species in order to build a complete classification model. In practice, this means that the sensor is very likely to encounter unknown species. In this case, a multiclass classifier will assign an incorrect class label to the insect, due to the lack of data from other possible species.
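To make the limitation concrete, the following is a minimal sketch, assuming a nearest-centroid rule as a stand-in for any multiclass classifier. The centroid values and the two-dimensional feature space are hypothetical and chosen only for illustration; they are not taken from the dataset used in this paper.

```python
import math


def nearest_centroid_predict(x, centroids):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Hypothetical 2-D class centroids for two known species (illustrative values).
centroids = {"aedes_aegypti": (1.0, 1.0), "culex_tarsalis": (3.0, 3.0)}

# A point far from both classes, standing in for a species never seen in training.
unknown = (50.0, -40.0)
label = nearest_centroid_predict(unknown, centroids)

# The classifier has no way to answer "none of the above":
# the prediction is always one of the known labels.
print(label)
```

Whatever the input, the returned label belongs to the set of known classes, which is the behavior that motivates the one-class formulation.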

Given the need to identify Aedes aegypti mosquitoes by sensors to support vector control methods, and the challenge of coping with unknown species, in this paper we address this classification problem using one-class classifiers [12]. In one-class classification, learning is performed only with positive examples (the target class) and with no, or only a few, examples from the negative class.

We evaluated eight algorithms trained only with data from Aedes aegypti mosquitoes. The test was conducted on a dataset with five insect species collected by optical sensors. From our experimental evaluation, we conclude that Parzen and SVDD are the most accurate algorithms for identifying Aedes aegypti mosquitoes.

The rest of the paper is organized as follows. Section 2 presents the optical sensor for insect classification. Section 3 describes the main concepts of one-class classification. Section 4 shows the results of our experimental evaluation. Finally, our conclusions are presented in Sect. 5.

2 Optical Sensor and Insect Data

The data evaluated in this paper were obtained from an optical sensor built with low-cost components to remotely capture information about flying insects. The sensor uses a light source, such as a low-powered planar laser, pointed at an array of phototransistors, as illustrated in Fig. 1(a). When a flying insect crosses the laser, its wings partially occlude the light, causing small variations in the light captured by the phototransistors. These variations are recorded as an audio signal, such as the example in Fig. 1(b), which shows an Aedes aegypti crossing.

In general, the data consist of background noise with occasional “events” that correspond to the brief moment in which an insect flies across the sensor. The signal generated by the passage of an insect has an amplitude significantly higher than that of the background noise, so a simple threshold suffices to identify the signal sections that contain an insect passage. In contrast, correctly classifying each passage according to the species that generated it is a more elaborate task. It consists of extracting discriminant features from the signals of each species and using these features with machine learning algorithms.
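The thresholding step can be sketched as follows. This is an illustrative sketch only: the function name `find_events`, the threshold value, and the `max_gap` merging parameter are assumptions introduced here, not details of the sensor software.

```python
def find_events(signal, threshold, max_gap=0):
    """Return (start, end) index pairs where |signal| exceeds `threshold`.

    Above-threshold samples separated by at most `max_gap` quiet samples are
    merged into one event. `end` is exclusive.
    """
    events = []
    start = None     # index where the current event began
    last_hit = None  # index of the most recent above-threshold sample
    for i, sample in enumerate(signal):
        if abs(sample) > threshold:
            # Close the current event if the quiet gap was too long.
            if start is not None and i - last_hit - 1 > max_gap:
                events.append((start, last_hit + 1))
                start = None
            if start is None:
                start = i
            last_hit = i
    if start is not None:
        events.append((start, last_hit + 1))
    return events


# Toy signal: low-amplitude noise with two bursts (values are illustrative).
signal = [0.01, -0.02, 0.5, -0.6, 0.4, 0.01, 0.0, 0.02, 0.7, -0.8, 0.01]
events = find_events(signal, threshold=0.1)
print(events)
```

Each returned segment would then be passed to feature extraction and classification.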

2.1 Data Collection

In our study, we use the stream insect dataset previously evaluated in [10]. The data were collected over six consecutive days in laboratory conditions, in which the temperature varied slightly between $20^{\circ }$C and $22^{\circ }$C and the humidity varied between 20% and 35%. The dataset contains insect passage signals from two species of flies and three species of mosquitoes. The fly species are Drosophila melanogaster and Musca domestica. The mosquito species are Culex quinquefasciatus, Culex tarsalis, and Aedes aegypti. Note that Culex mosquitoes are visually similar to Aedes and are predominant in Latin American houses. Table 1 presents a general description of the dataset.

Table 1. Insect dataset distribution.


2.2 Feature Extraction

In this work, we use the Mel-Frequency Cepstral Coefficients (MFCC) as recommended in a previous evaluation with a wide variety of signal processing techniques for feature extraction [7]. MFCCs are popular features in various application domains, particularly speech and speaker recognition [13].

MFCCs are calculated by taking the magnitudes of the frequency components on a perceptually defined scale called mel [14]. This scale relates physical frequencies to the frequencies perceived by the human auditory system. Equation 1 shows the conversion from frequency (f) to mel-frequency (m). Next, we apply a Discrete Cosine Transform. The MFCCs are the cepstral coefficients obtained from this operation. Specifically, we use the first 40 coefficients as features.

$$\begin{aligned} m = 2595 \times \log_{10} \left( 1+\frac{f}{700}\right) \end{aligned} \tag{1}$$
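As a small illustration of Eq. 1 and of the final transform, the sketch below converts frequencies to the mel scale and applies an unnormalized DCT-II to a vector of log filterbank energies. The framing, windowing, and mel filterbank stages of a full MFCC pipeline are omitted, and the helper names (`hz_to_mel`, `mel_to_hz`, `dct_ii`, `cepstral_coefficients`) and the choice of DCT normalization are assumptions made for this example.

```python
import math


def hz_to_mel(f):
    """Eq. 1: convert a frequency in Hz to mel."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of Eq. 1, useful for placing filterbank edges."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def dct_ii(values):
    """Unnormalized DCT-II of a list of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, v in enumerate(values))
        for k in range(n)
    ]


def cepstral_coefficients(log_energies, n_coeffs):
    """Keep the first `n_coeffs` DCT coefficients of the log mel energies."""
    return dct_ii(log_energies)[:n_coeffs]


# By construction of the scale, 1000 Hz maps to roughly 1000 mel.
print(hz_to_mel(1000.0))

# Hypothetical log energies from 6 mel bands; keep the first 4 coefficients.
coeffs = cepstral_coefficients([1.2, 0.8, 0.5, 0.4, 0.3, 0.1], n_coeffs=4)
print(len(coeffs))
```

In the setting of this paper, `n_coeffs` would be 40, which requires at least 40 mel bands in the preceding filterbank stage.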

3 One-Class Classification

Conventional multiclass classification algorithms aim to classify an unknown object into one of several predefined categories. A problem arises when the unknown object does not belong to any of those categories. In one-class classification (OCC) [12, 15], one of the classes (referred to as the target class) is well characterized by instances in the training data. The other class (non-target) has either no instances at all or very few of them, or its instances do not form a statistically representative sample of the negative concept.

In general, the problem of one-class classification is harder than the problem of conventional two-class or multiclass classification. For example, in binary classification problems, the decision boundary is supported from both sides by examples of both classes. Because in the case of one-class classification only one set of data is available, only one side of the boundary is supported. It is therefore hard to decide, on the basis of just one class, how strictly the boundary should fit around the data in each of the feature directions [15]. This task is often called data domain description.

The OCC problem is often solved by estimating the density of the target class or by fitting a boundary model to the data, in the manner of a support vector classifier. Instead of using a hyperplane to separate two classes, the latter approach places a hypersphere around the target set and directly minimizes its volume. This method is called support vector data description (svdd) [16]. In svdd, a spherically shaped decision boundary around a set of objects is defined by a set of support vectors that describe the sphere boundary.
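The idea of a spherical description can be illustrated with a much simpler stand-in than svdd: a sphere centered on the mean of the target examples, with the radius chosen so that a small fraction of the training targets falls outside. This sketch is not the svdd optimization (there is no volume minimization, no support vectors, and no kernel); the names `fit_sphere` and `is_target` and the `reject_fraction` parameter are assumptions made for illustration.

```python
import math


def fit_sphere(targets, reject_fraction=0.05):
    """Fit a mean-centered sphere that encloses most of the target examples.

    About `reject_fraction` of the training targets are left outside, which
    makes the boundary less sensitive to outliers in the training set.
    """
    dim = len(targets[0])
    center = [sum(x[j] for x in targets) / len(targets) for j in range(dim)]
    dists = sorted(math.dist(x, center) for x in targets)
    keep = max(1, math.ceil((1.0 - reject_fraction) * len(dists)))
    radius = dists[keep - 1]
    return center, radius


def is_target(x, center, radius):
    """Accept x as a target example if it lies inside the sphere."""
    return math.dist(x, center) <= radius


# Toy 2-D target examples (illustrative values only).
targets = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
center, radius = fit_sphere(targets, reject_fraction=0.0)

print(is_target((0.5, 0.5), center, radius))  # inside the description
print(is_target((5.0, 5.0), center, radius))  # far outside
```

Only target examples are used to fit the description, and any object outside the sphere is rejected without needing examples of what it might be.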

Different methods for data domain description have been developed. In this work, we evaluated eight algorithms from the Data Description toolbox (DDtools) [12, 17]: gausdd (Gaussian target distribution), svdd (support vector data description), parzendd (Parzen density estimator data description), kmeansdd (k-means data description), knndd (k-nearest neighbor data description), lpdd (linear programming data description), mstdd (minimum spanning tree data description), and mogdd (mixture of Gaussians data description). Due to space constraints, we do not describe the algorithms here and direct interested readers to [12] and [17] for a detailed explanation. An intuition of the decision boundary produced by each algorithm on an artificial data example is shown in Fig. 2.
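To give an intuition for the density-based family, the following is a minimal sketch of a Parzen-window one-class classifier with an isotropic Gaussian kernel. It is not the DDtools `parzendd` implementation: the fixed kernel width `h`, the way the threshold is chosen, and the function names are assumptions made for this example.

```python
import math


def parzen_density(x, targets, h):
    """Average of isotropic Gaussian kernels of width h centered on the targets."""
    d = len(x)
    norm = (2.0 * math.pi * h * h) ** (-d / 2.0)
    total = sum(math.exp(-math.dist(x, t) ** 2 / (2.0 * h * h)) for t in targets)
    return norm * total / len(targets)


def fit_threshold(targets, h, reject_fraction=0.1):
    """Pick a density threshold that rejects roughly `reject_fraction` of the targets."""
    scores = sorted(parzen_density(t, targets, h) for t in targets)
    cut = min(len(scores) - 1, int(reject_fraction * len(scores)))
    return scores[cut]


def is_target(x, targets, h, threshold):
    """Accept x when its estimated target density reaches the threshold."""
    return parzen_density(x, targets, h) >= threshold


# Toy 2-D target examples (illustrative values only).
targets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
h = 0.2
threshold = fit_threshold(targets, h, reject_fraction=0.0)

print(is_target((0.05, 0.05), targets, h, threshold))  # dense region
print(is_target((5.0, 5.0), targets, h, threshold))    # empty region
```

The estimated density also serves as a continuous score, so sweeping the threshold produces the operating points of an ROC curve.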

4 Experimental Evaluation

In our experimental evaluation, the classifiers were trained only with data from Aedes aegypti (the target class). More specifically, we used the data from the first 48 h of the data collection, which amounts to 347 examples. To test the classifiers, we used the remaining 557 Aedes aegypti examples, which were not used for training, and the 4,421 examples from the other four insect species, totaling 4,978 test examples.
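The split described above can be sketched as follows, assuming each record carries a timestamp in hours since the start of the collection. The record layout, the function name `split_one_class`, and the toy values are hypothetical; the actual dataset format is not described in this paper.

```python
def split_one_class(records, target_label, train_hours):
    """Split records into a target-only training set and a mixed test set.

    `records` is a list of (hours_since_start, label, features) tuples.
    Target examples observed before `train_hours` go to training; everything
    else, including all non-target examples, goes to the test set.
    """
    train, test = [], []
    for hours, label, features in records:
        if label == target_label and hours < train_hours:
            train.append(features)
        else:
            # Keep a boolean flag so the test set can be scored later.
            test.append((features, label == target_label))
    return train, test


# Toy records (illustrative values only).
records = [
    (10.0, "aedes_aegypti", [0.1]),
    (50.0, "aedes_aegypti", [0.2]),
    (5.0, "culex_tarsalis", [0.3]),
    (60.0, "musca_domestica", [0.4]),
]
train, test = split_one_class(records, "aedes_aegypti", train_hours=48.0)
print(len(train), len(test))
```

Because the split is chronological, the test targets come from later in the collection than the training targets.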

Because target examples are far fewer than non-target examples in the test set, a classifier that predicts the non-target class for every test example achieves an accuracy of about 89% (4,421 of 4,978). For this reason, we evaluate our results with additional performance measures, namely Precision, Recall, and F1-Score. Given the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) in the confusion matrix built from the predictions of a classifier, these measures are defined as follows:

$\text{Precision} = \frac{TP}{TP+FP}$, $\text{Recall} = \frac{TP}{TP+FN}$, $\text{F1-Score} = \frac{2\times (\text{Precision} \times \text{Recall})}{\text{Precision}+\text{Recall}}$
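The definitions above can be checked on the two trivial classifiers implied by the test set composition (557 target and 4,421 non-target examples). This is a small sketch; returning 0 when a denominator is zero is a convention assumed here, not one stated in the paper.

```python
def precision(tp, fp):
    """TP / (TP + FP); defined as 0.0 here when nothing is predicted positive."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp, fn):
    """TP / (TP + FN); defined as 0.0 here when there are no positives."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2.0 * p * r / (p + r) if (p + r) > 0 else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all examples that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


N_TARGET, N_NONTARGET = 557, 4421  # test set sizes reported in the text

# Classifier that predicts non-target for every example.
acc_all_negative = accuracy(tp=0, tn=N_NONTARGET, fp=0, fn=N_TARGET)
rec_all_negative = recall(tp=0, fn=N_TARGET)

# Classifier that predicts target for every example.
p_all_positive = precision(tp=N_TARGET, fp=N_NONTARGET)
r_all_positive = recall(tp=N_TARGET, fn=0)
f1_all_positive = f1_score(p_all_positive, r_all_positive)

print(round(acc_all_negative, 3), rec_all_negative)
print(round(p_all_positive, 3), r_all_positive, round(f1_all_positive, 3))
```

The first classifier reaches high accuracy with zero recall, while the second reaches full recall with low precision, which is why accuracy alone is not informative on this test set.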

In addition, we consider the Area Under the Curve (AUC), that is, the area under the Receiver Operating Characteristic (ROC) curve. The ROC curve is a two-dimensional plot with the false positive rate on the horizontal axis and the true positive rate on the vertical axis. In an ideal scenario, a classifier produces no false positives and the maximum number of true positives, which leads to AUC = 1.
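One way to compute the AUC without drawing the curve is through its rank interpretation: the AUC equals the probability that a randomly chosen target example receives a higher score than a randomly chosen non-target example, with ties counted as one half. The sketch below uses this pairwise form; the score values are illustrative and the function name `auc` is an assumption.

```python
def auc(target_scores, nontarget_scores):
    """AUC as the fraction of (target, non-target) pairs ranked correctly.

    Higher scores mean "more likely target". Ties count as half a win.
    """
    wins = 0.0
    for t in target_scores:
        for n in nontarget_scores:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(target_scores) * len(nontarget_scores))


# Illustrative scores: one target example is ranked below one non-target example.
value = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(value)
```

The pairwise loop is quadratic in the number of examples; for large test sets, a sort-based computation gives the same value more efficiently.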

The general results of the algorithms considering the five performance measures discussed are shown in Table 2. For each measure, the best result is highlighted. In this table, we also show the results achieved by a baseline which corresponds to a classifier that predicts the target class for all test examples.

Table 2. Results of one-class classifiers.


We can see in Table 2 that the algorithm parzendd showed the best results for the measures Recall and AUC. On the other hand, the algorithm svdd showed the best results for the measures F1-Score and Accuracy. To better compare the results, the ROC curves achieved by the algorithms are shown in Fig. 3.

From the results shown in Table 2 and Fig. 3, we note that parzendd and svdd are both very competitive, but svdd shows a better balance between false positive and true positive rates. Although the parzendd algorithm correctly identifies a higher number of Aedes aegypti mosquitoes, it also incorrectly identifies a higher number of insects from other species as Aedes. Table 3 shows more details about the errors of both algorithms.

Table 3. Confusion matrices of the algorithms parzendd and svdd.

5 Conclusion

In this paper, we presented an evaluation of one-class classifiers for the recognition of Aedes aegypti mosquitoes by optical sensors. Aedes aegypti is one of the most important vectors of arboviruses such as yellow fever, chikungunya, Zika, and dengue. The recognition task is therefore essential to support the efficient use of traditional methods for reducing the mosquito population, given the spatio-temporal information provided by the sensors. From the results, we conclude that accurate classifiers can be learned even with a small number of target examples for training (347 examples) and no non-target examples. Among the evaluated algorithms, svdd and parzendd showed the best results, with AUC = 0.85 and AUC = 0.87, respectively. In future work, we intend to explore combinations of different OCC algorithms and feature sets, as well as conditions with concept drift and extreme latency in updating the classification model [18, 19].

References

1. Plourde, A.R., Bloch, E.M.: A literature review of Zika virus. Emerg. Infect. Dis. 22(7), 1185–1192 (2016)
2. Gubler, D.J.: The economic burden of dengue. Am. J. Trop. Med. Hyg. 86(5), 743–744 (2012)
3. Beatty, M.E., Letson, G.W., Margolis, H.S.: Estimating the global burden of dengue. Am. J. Trop. Med. Hyg. 81(5), 231 (2009)
4. Bhatt, S., Gething, P.W., Brady, O.J., Messina, J.P., Farlow, A.W., Moyes, C.L., Drake, J.M., Brownstein, J.S., Hoen, A.G., Sankoh, O.: The global distribution and burden of dengue. Nature 496(7446), 504–507 (2013)
5. W.H.O.: Dengue and severe dengue. Technical report Fact Sheet 117, World Health Organization (2015)
6. Halstead, S.B.: Dengue vaccine development: a 75% solution? The Lancet 380(9853), 1535–1536 (2012)
7. Silva, D.F., Souza, V.M.A., Ellis, D.P.W., Keogh, E.J., Batista, G.E.A.P.A.: Exploring low cost laser sensors to identify flying insect species. J. Intell. Robot. Syst. 80(1), 313–330 (2015)
8. Qi, Y., Cinar, G.T., Souza, V.M.A., Batista, G.E.A.P.A., Wang, Y., Principe, J.C.: Effective insect recognition using a stacked autoencoder with maximum correntropy criterion. In: Proceedings of the International Joint Conference on Neural Networks, pp. 1–7 (2015)
9. Silva, D.F., Souza, V.M.A., Batista, G.E.A.P.A., Keogh, E., Ellis, D.P.W.: Applying machine learning and audio analysis techniques to insect recognition in intelligent traps. In: Proceedings of the International Conference on Machine Learning and Applications, pp. 99–10 (2013)
10. Souza, V.M.A., Silva, D.F., Batista, G.: Classification of data streams applied to insect recognition: initial results. In: Proceedings of the Brazilian Conference on Intelligent Systems, pp. 76–81 (2013)
11. Wiegmann, B., Yeates, D.K.: The Tree of Life Diptera (1996)
12. Tax, D.M.J.: One-class classification. Ph.D. thesis, TU Delft, Delft University of Technology (2001)
13. Zhen, B., Wu, X., Liu, Z., Chi, H.: On the importance of components of the MFCC in speech and speaker recognition. Acta Scientiarum Naturalium 37(3), 371–378 (2001)
14. Stevens, S.S., Volkmann, J., Newman, E.B.: A scale for the measurement of the psychological magnitude pitch. J. Acoust. Soc. Am. 8(3), 185–190 (1937)
15. Tax, D.M.J., Duin, R.P.: Uniform object generation for optimizing one-class classifiers. J. Mach. Learn. Res. 2, 155–173 (2002)
16. Tax, D.M.J., Duin, R.P.W.: Support vector domain description. Pattern Recogn. Lett. 20(11), 1191–1199 (1999)
17. Tax, D.: Ddtools, the data description toolbox for matlab version 2.1.2, June 2015
18. Souza, V.M.A., Silva, D.F., Gama, J., Batista, G.E.A.P.A.: Data stream classification guided by clustering on nonstationary environments and extreme verification latency. In: Proceedings of the SIAM International Conference on Data Mining, pp. 873–881 (2015)
19. Souza, V.M.A., Silva, D.F., Batista, G.E.A.P.A., Gama, J.: Classification of evolving data streams with totally delayed labels. In: Proceedings of the International Conference on Machine Learning & Applications, pp. 214–219 (2015)


Author information

Authors and Affiliations

Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, São Paulo, Brazil
Vinicius M. A. Souza


Corresponding author

Correspondence to Vinicius M. A. Souza.

Editor information

Editors and Affiliations

Pontificia Universidad Católica del Perú, Lima, Peru
César Beltrán-Castañón
Uppsala University, Uppsala, Sweden
Ingela Nyström
University of Ottawa, Ottawa, Ontario, Canada
Fazel Famili


About this paper

Cite this paper

Souza, V.M.A. (2017). Identifying Aedes aegypti Mosquitoes by Sensors and One-Class Classifiers. In: Beltrán-Castañón, C., Nyström, I., Famili, F. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2016. Lecture Notes in Computer Science(), vol 10125. Springer, Cham. https://doi.org/10.1007/978-3-319-52277-7_2


DOI: https://doi.org/10.1007/978-3-319-52277-7_2
Published: 16 February 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52276-0
Online ISBN: 978-3-319-52277-7
eBook Packages: Computer Science (R0)
