Best Practices to Train Deep Models on Imbalanced Datasets—A Case Study on Animal Detection in Aerial Imagery

Kellenberger, Benjamin; Marcos, Diego; Tuia, Devis

doi:10.1007/978-3-030-10997-4_40

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11053)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

We introduce recommendations to train a Convolutional Neural Network for grid-based detection on a dataset that has a substantial class imbalance. These include curriculum learning, hard negative mining, a special border class, and more. We evaluate the recommendations on the problem of animal detection in aerial images, where we obtain an increase in precision from 9% to 40% at high recalls, compared to the state of the art. Data related to this paper are available at: http://doi.org/10.5281/zenodo.609023.

Supported by the Swiss National Science Foundation (grant PZ00P2-136827).

1 Introduction

Convolutional Neural Networks (CNNs) [5] have led to tremendous accuracy increases in vision tasks like classification [2] and detection [8, 9], in part due to the availability of large-scale datasets like ImageNet [11]. Many vision benchmarks feature a controlled situation, with all classes occurring at roughly similar frequencies. In practice, however, this is not always the case. For example, in animal censuses on images from Unmanned Aerial Vehicles (UAVs) [6], the vast majority of images are empty. As a consequence, training a deep model on such datasets as one would in a classical balanced setting might lead to unusable results.

In this paper, we present a collection of recommendations that allow training deep CNNs on heavily imbalanced datasets (Sect. 2), demonstrated on the detection of large mammals in UAV imagery. We assess the contribution of each recommendation in a hold-one-out fashion and further compare a CNN trained with all of them to the current state of the art (Sect. 4), where we manage to increase the precision from 9% to 40% for high target recalls. The paper is based on [3].

2 Proposed Training Practices

The following paragraphs briefly describe the five recommendations that make training on an imbalanced dataset possible:

Curriculum Learning. For the first five training epochs, we sample the training images so that they always contain at least one animal. This is inspired by Curriculum Learning [1] and makes the CNN learn initial representations of both animals and background. This provides it with a better starting point for the imbalance problem later on.
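A minimal sketch of such a sampling schedule is given below. It assumes that each training image comes with a known animal count; the `(image_id, animal_count)` pair format, the function name `sample_epoch` and the 0-based epoch counter are illustrative assumptions, not details taken from the paper.

```python
import random


def sample_epoch(images, epoch, curriculum_epochs=5, rng=None):
    """Return the training images to visit in the given (0-based) epoch.

    `images` is a list of (image_id, animal_count) pairs. This format is
    a hypothetical one, chosen only for illustration.
    """
    rng = rng or random.Random()
    if epoch < curriculum_epochs:
        # Curriculum phase: only images with at least one animal.
        pool = [img for img in images if img[1] > 0]
    else:
        # Afterwards: the full, imbalanced training set.
        pool = list(images)
    rng.shuffle(pool)
    return pool


if __name__ == "__main__":
    data = [("a", 0), ("b", 2), ("c", 0), ("d", 1)]
    print(sample_epoch(data, epoch=0))  # only images with animals
    print(sample_epoch(data, epoch=5))  # all images
```

Switching abruptly after five epochs is the simplest reading of the text; a gradual increase in the share of empty images would be another option.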

Rotational Augmentation. Due to the overhead perspective, we employ image rotations in steps of \(90^{\circ }\) as augmentation. However, we empirically found this to be most effective at a late training stage (from epoch 300 on), where the CNN is starting to converge to a stable solution.
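The late-stage augmentation could be sketched as follows. Images and label grids are represented as plain nested lists to keep the example self-contained; the helper names `rot90` and `augment` and the uniform choice among the four rotations are assumptions made for illustration.

```python
import random


def rot90(grid, k=1):
    """Rotate a 2D list counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        # Transpose, then reverse the row order.
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid


def augment(image, labels, epoch, start_epoch=300, rng=None):
    """Rotate image and label grid together once `start_epoch` is reached."""
    rng = rng or random.Random()
    if epoch < start_epoch:
        return image, labels
    k = rng.randrange(4)  # 0, 90, 180 or 270 degrees (assumed uniform)
    return rot90(image, k), rot90(labels, k)


if __name__ == "__main__":
    grid = [[1, 2], [3, 4]]
    print(rot90(grid))            # one counter-clockwise quarter turn
    print(augment(grid, grid, epoch=10))
```

The important point is that the same rotation is applied to the image and to its label grid, so that animal locations stay aligned.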

Hard Negative Mining. After epoch 80 we expect the model to have roughly learned the animal and background appearances, and thus focus on reducing the number of false positives. To do so, we amplify the weights of the four most confidently predicted false alarms in every training image for the rest of the training schedule.
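One possible reading of this weighting step is sketched below for a single training image. The text does not state the amplification factor or how a false alarm is thresholded, so `boost=2.0` and `threshold=0.5` are placeholder assumptions, as are the function name and the flat-list layout of the grid.

```python
def hard_negative_weights(animal_scores, is_animal, epoch,
                          after_epoch=80, top_k=4, boost=2.0, threshold=0.5):
    """Return one weight per grid cell of a single training image.

    animal_scores: predicted animal probability per cell (flat list).
    is_animal:     ground-truth flag per cell (flat list of bools).
    `boost` and `threshold` are illustrative values, not from the paper.
    """
    weights = [1.0] * len(animal_scores)
    if epoch <= after_epoch:
        return weights  # mining is not active yet
    # False alarms: background cells predicted as animal.
    false_alarms = [i for i, positive in enumerate(is_animal)
                    if not positive and animal_scores[i] > threshold]
    # Keep the most confidently predicted ones.
    hardest = sorted(false_alarms, key=lambda i: animal_scores[i],
                     reverse=True)[:top_k]
    for i in hardest:
        weights[i] *= boost
    return weights


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.95, 0.55, 0.1]
    truth = [False, False, False, False, True, False, False]
    print(hard_negative_weights(scores, truth, epoch=81))
```

In practice these weights would multiply the per-cell loss, possibly on top of the class weights.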

Border Class. Because the CNN’s receptive field captures spatial context, we frequently observed activations in the vicinity of the animals, leading to false alarms. To remedy this effect, we label the 8-neighborhood around true animal locations with a third class (denoted as “border”). This way, the CNN learns to treat the surroundings of the animals separately, assigning high animal confidence only at the true center. At test time, we simply discard the border class by merging it with the background.
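The labeling and the test-time merge could look roughly like this. The class indices, the function names and the choice to merge by adding the border probability to the background probability are assumptions; the text only states that the two classes are merged.

```python
BACKGROUND, ANIMAL, BORDER = 0, 1, 2  # class indices are an assumption


def make_label_grid(animal_cells, size=32):
    """Build a size x size label grid with a border ring around each animal."""
    grid = [[BACKGROUND] * size for _ in range(size)]
    for r, c in animal_cells:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < size and 0 <= cc < size:
                    grid[rr][cc] = BORDER
    # Animal centers are written last so they take priority over borders.
    for r, c in animal_cells:
        grid[r][c] = ANIMAL
    return grid


def merge_border(probs):
    """Fold the border probability into the background at test time."""
    background, animal, border = probs
    return (background + border, animal)


if __name__ == "__main__":
    for row in make_label_grid([(1, 1)], size=4):
        print(row)
    print(merge_border((0.5, 0.25, 0.25)))
```

Writing the animal centers in a second pass ensures that two adjacent animals do not overwrite each other with border labels.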

Class Weighting. We balance the gradients during training with constant weights corresponding to the inverse class frequencies observed in the training set.
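As a small illustration, inverse class frequencies can be computed from the training labels as follows. The normalization (total count divided by class count) is one common convention and is assumed here; the text does not specify it.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Map each class to total_count / class_count."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: total / n for cls, n in counts.items()}


if __name__ == "__main__":
    # Toy labels: 0 = background, 1 = animal, 2 = border (assumed indices).
    labels = [0] * 90 + [2] * 8 + [1] * 2
    print(inverse_frequency_weights(labels))
```

Rare classes such as the animals thus receive proportionally larger weights, and the weights stay constant during training.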

3 Experiments

3.1 The Kuzikus Dataset

We demonstrate our training recommendations on a dataset of UAV images over the Kuzikus game reserve, Namibia. Kuzikus contains an estimated 3000 large mammals such as black rhinos, zebras, kudus and more, distributed over \(103\,\mathrm {km^2}\) [10]. The dataset was acquired in May 2014 by the SAVMAP Consortium, using a SenseFly eBee with a Canon PowerShot S110 RGB camera as payload. The campaign yielded a total of 654 images of \(4000\,\times \,3000\) pixels, covering \(13.38\,\mathrm {km^2}\) at around 4 cm resolution. A total of 1183 animals could be identified in a crowdsourcing campaign [7]. The data were then divided image-wise into 70% training, 10% validation and 20% test sets.

3.2 Model Setup

We employ a CNN that accepts an input image of \(512\,\times \,512\) pixels and yields a \(32\,\times \,32\) grid of class probability scores. We base it on a pre-trained ResNet-18 [2] and replace the last layer with two new ones that map the 512 activations to 1024, then to the 3 classes, respectively. We add a ReLU and dropout [12] with probability 0.5 in between for further regularization. The model is trained using the Adam optimizer [4] with weight decay and a gradually decreasing learning rate for a total of 400 epochs.
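To make the grid geometry concrete, the sketch below maps a pixel position in the \(512\,\times \,512\) input to a cell of the \(32\,\times \,32\) output grid. It assumes that the cells tile the input uniformly, which gives 16 pixels per cell; the text does not state how predictions align with the input, so this is only an illustration, and the function name is hypothetical.

```python
def pixel_to_cell(x, y, image_size=512, grid_size=32):
    """Return the (row, col) grid cell containing pixel (x, y).

    Assumes uniform tiling: each cell covers image_size // grid_size pixels.
    """
    cell = image_size // grid_size  # 16 pixels per cell for 512 / 32
    return (y // cell, x // cell)


if __name__ == "__main__":
    print(pixel_to_cell(0, 0))
    print(pixel_to_cell(511, 511))
    # At roughly 4 cm per pixel, one 16-pixel cell spans about 0.64 m.
    print(16 * 0.04)
```

Under this assumption, a point annotation of an animal translates directly into one "animal" cell of the label grid.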

We assess all recommendations in a hold-one-out fashion, and further compare them to a full model and the current state-of-the-art on the dataset, which employs a classifier on proposals and hand-crafted features (see [10] for details).

4 Results and Discussion

Figure 1 shows the precision-recall curves for all the models.

All recommendations boost precision, but to varying degrees. For example, disabling curriculum learning (“CNN 3”) yields the worst precision at high recalls: too many background samples from the start seem to drown out any signal from the few animals. Unsurprisingly, a model trained only on images that contain at least one animal (“CNN 2”) performs similarly poorly: it sees only a portion of the background samples and therefore yields too many false alarms. The full model provides the highest precision scores of up to 40% at high recalls of 80% and more. At this recall level, the baseline reaches less than 10% precision, predicting false alarms virtually everywhere. In numbers, at 80% recall our model predicts 447 false positives, while the baseline produces 2546.
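For reference, precision and recall relate to the raw counts as sketched below. The two false-positive counts are those reported above; the true-positive and false-negative counts in the last lines are made-up values chosen only to demonstrate the formulas and do not come from the paper.

```python
def precision(tp, fp):
    """Fraction of predicted animals that are real: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of real animals that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


# False-positive counts at 80% recall, as reported in the text.
FP_FULL_MODEL = 447
FP_BASELINE = 2546
fp_ratio = FP_BASELINE / FP_FULL_MODEL  # roughly 5.7 times fewer false alarms

if __name__ == "__main__":
    print(round(fp_ratio, 1))
    # Hypothetical counts, for illustration only.
    tp, fn, fp = 80, 20, 120
    print(recall(tp, fn), precision(tp, fp))
```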

5 Conclusion

Many real-world computer vision problems are characterized by significant class imbalances, which in the worst case make out-of-the-box applications of deep CNNs infeasible. An example is the detection of large mammals in UAV images, most of which are empty. In this paper, we presented a series of practices that enable training CNNs by limiting the risk of the background class drowning out the few positives. We analyzed the contribution of each individual practice (curriculum learning, hard negative mining, etc.) and showed how a CNN trained with all of them yields a substantially higher precision if tuned for high recalls.

References

1. Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 41–48. ACM, New York (2009)
2. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
3. Kellenberger, B., Marcos, D., Tuia, D.: Detecting mammals in UAV images: best practices to address a substantially imbalanced dataset with deep learning. Remote Sensing of Environment (in revision)
4. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
5. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
6. Linchant, J., Lisein, J., Semeki, J., Lejeune, P., Vermeulen, C.: Are unmanned aircraft systems (UASs) the future of wildlife monitoring? A review of accomplishments and challenges. Mammal Rev. 45(4), 239–252 (2015)
7. Ofli, F., et al.: Combining human computing and machine learning to make sense of big (aerial) data for disaster response. Big Data 4(1), 47–59 (2016)
8. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: The IEEE Conference on Computer Vision and Pattern Recognition, June 2016
9. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
10. Rey, N., Volpi, M., Joost, S., Tuia, D.: Detecting animals in African Savanna with UAVs and the crowds. Remote Sens. Environ. 200, 341–351 (2017)
11. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
12. Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)

Author information

Authors and Affiliations

Wageningen University and Research, Wageningen, The Netherlands
Benjamin Kellenberger, Diego Marcos & Devis Tuia

Corresponding author

Correspondence to Benjamin Kellenberger.

Editor information

Editors and Affiliations

Leuphana University, Lüneburg, Germany
Ulf Brefeld
National University of Ireland, Galway, Ireland
Edward Curry
IBM Research - Ireland, Dublin, Ireland
Elizabeth Daly
University College Dublin, Dublin, Ireland
Brian MacNamee
Nokia (Ireland), Dublin, Ireland
Alice Marascu
Vodafone, Milan, Italy
Fabio Pinelli
IBM Research - Ireland, Dublin, Ireland
Michele Berlingerio
University College Dublin, Dublin, Ireland
Neil Hurley

Cite this paper

Kellenberger, B., Marcos, D., Tuia, D. (2019). Best Practices to Train Deep Models on Imbalanced Datasets—A Case Study on Animal Detection in Aerial Imagery. In: Brefeld, U., et al. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2018. Lecture Notes in Computer Science(), vol 11053. Springer, Cham. https://doi.org/10.1007/978-3-030-10997-4_40

DOI: https://doi.org/10.1007/978-3-030-10997-4_40
Published: 18 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-10996-7
Online ISBN: 978-3-030-10997-4
eBook Packages: Computer Science (R0)
