Abstract
Fine-grained image classification is an active research area in computer vision. Animal breed classification in particular is an arduous task due to the challenges posed by camera-trap images, such as occlusion, camouflage, poor illumination and pose variation. In this paper, we propose a fine-grained animal breed classification model using supervised clustering based on a Multi Part-Convolutional Neural Network (MP-CNN) and Expectation–Maximization (EM) clustering. The proposed model follows a straightforward pipeline that combines deep feature extraction using a CNN pre-trained on ImageNet with EM clustering of the unlabelled data. Further, we propose a multi-discriminative-part selection and detection scheme for precise classification of animal breeds without using bounding boxes or part annotations in either the training or testing phase. The model is tested on several benchmark animal datasets, including the largest camera-trap dataset, Snapshot Serengeti, and achieves a cumulative accuracy of 98.4%. The results from the proposed model strengthen the belief that supervised training of a deep CNN on a large and versatile dataset extracts better features than most traditional approaches, even for unsupervised tasks.
References
Swanson A, Kosmala M, Lintott C, Simpson R, Smith A, Packer C (2015) Snapshot Serengeti, high-frequency annotated camera trap images of 40 mammalian species in an African savanna. Sci Data 2:150026
Deng J, Dong W, Socher R, Li LJ, Li K, Fei LF (2009) ImageNet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 248–255
Guérin J, Gibaru O, Thiery S, Nyiri E (2017) CNN features are also great at Unsupervised Classification. arXiv:abs/1707.01700
Feng H, Wang S, Ge SS (2018) Fine-grained visual recognition with salient feature detection. arXiv:abs/1808.03935
Gómez Villa A, Salazar A, Vargas F (2017) Towards automatic wild animal monitoring: identification of animal species in camera-trap images using very deep convolutional neural networks. Ecol Inform 41:24–32. https://doi.org/10.1016/j.ecoinf.2017.07.004
Norouzzadeh MS, Nguyen A, Kosmala M, Swanson A, Palmer MS, Packer C, Clune J (2018) Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proc Natl Acad Sci 115(25):5716–5725
Jaskó G, Giosan I, Nedevschi S (2017) Animal detection from traffic scenarios based on monocular color vision. In: 2017 13th IEEE international conference on intelligent computer communication and processing (ICCP), IEEE, pp 363–368
Sharma SU, Shah DJ (2016) A practical animal detection and collision avoidance system using computer vision technique. IEEE Access 5:347–358
Meena SD, Agilandeeswari L (2020) Stacked convolutional autoencoder for detecting animal images in cluttered scenes with a novel feature extraction framework. In: Soft computing for problem solving, Springer, Singapore, pp 513–522
Meena SD, Agilandeeswari L (2019) Adaboost cascade classifier for classification and identification of wild animals using movidius neural compute stick. Int J Eng Adv Technol (IJEAT) 9(1S3):495–499. https://doi.org/10.35940/ijeat.a1089.1291s319
Gupta P, Verma GK (2017) Wild animal detection using discriminative feature-oriented dictionary learning. In: 2017 International conference on computing, communication and automation (ICCCA), IEEE, pp 104–109
Antônio WH, Da Silva M, Miani RS, Souza JR (2019) A proposal of an animal detection system using machine learning. Appl Artif Intell 33(13):1093–1106
Xie L, Tian Q, Hong R, Yan S, Zhang B (2013) Hierarchical part matching for fine-grained visual categorization. In: International conference of computer vision (ICCV), pp 1641–1648
Berg T, Belhumeur P (2013) Poof: part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 955–962
Branson S, VanHorn G, Belongie S, Perona P (2014) Bird species categorization using pose normalized deep convolutional nets. arxiv:1406.2952
Zhang N, Donahue J, Girshick R, Darrell T (2014) Part based R-CNNs for fine-grained category detection. In: European conference on computer vision (ECCV), pp 834–849
Lin D, Shen X, Lu C, Jia J (2015) Deep lac: deep localization, alignment and classification for fine-grained recognition. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1666–1674
Huang S, Xu Z, Tao D, Zhang Y (2016) Part-stacked CNN for fine-grained visual categorization. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1173–1182
Yao H, Zhang S, Zhang Y, Li J, Tian Q (2016) Coarse-to-fine description for fine-grained visual categorization. IEEE Trans Image Process (TIP) 25(10):4858–4872
Xu Z, Huang S, Zhang Y, Tao D (2016) Webly-supervised fine-grained visual categorization via deep domain adaptation. IEEE Trans Pattern Anal Mach Intell (TPAMI)
Xu Z, Tao D, Huang S, Zhang Y (2017) Friend or foe: fine-grained categorization with weak supervision. IEEE Trans Image Process (TIP) 26(1):135–146
Xie L, Tian Q, Wang M, Zhang B (2014) Spatial pooling of heterogeneous features for image classification. IEEE Trans Image Process (TIP) 23(5):1994–2008
Krause J, Jin H, Yang J, Fei-Fei L (2015) Fine-grained recognition without part annotations. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 5546–5555
Simon M, Rodner E (2015) Neural activation constellations: unsupervised part model discovery with convolutional networks. In: International conference of computer vision (ICCV), pp 1143–1151
Lin TY, Chowdhury AR, Maji S (2015) Bilinear CNN models for fine-grained visual recognition. In: International conference of computer vision (ICCV), pp 1449–1457
Zhang X, Xiong H, Zhou W, Tian Q (2016) Fused one-vs-all features with semantic alignments for fine-grained visual categorization. IEEE Trans Image Process (TIP) 25(2):878–892
Zhang L, Yang Y, Wang M, Hong R, Nie L, Li X (2016) Detecting densely distributed graph patterns for fine grained image categorization. IEEE Trans Image Process (TIP) 25(2):553–565
Zheng H, Fu J, Mei T, Luo J (2017) Learning multi-attention convolutional neural network for fine-grained image recognition. In: International conference of computer (ICCV), pp 5209–5217
Liu J, Kanazawa A, Jacobs D, Belhumeur P (2012) Dog breed classification using part localization. In: European conference on computer vision, Springer, Berlin, pp 172–185
Parkhi OM, Vedaldi A, Zisserman A, Jawahar CV (2012) Cats and dogs. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 3498–3505
Khosla A, Jayadevaprakash N, Yao B, Li FF (2011) Novel dataset for fine-grained image categorization: Stanford dogs. In: Proc. CVPR workshop on fine-grained visual categorization (FGVC), vol 2, no 1
Mulligan K, Rivas P (2019) Dog breed identification with a neural network over learned representations from the Xception CNN architecture. In: 21st International conference on artificial intelligence (ICAI 2019)
Cubuk ED, Zoph B, Mane D, Vasudevan V, Le QV (2019) Autoaugment: learning augmentation strategies from data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 113–123
Touvron H, Vedaldi A, Douze M, Jégou H (2019) Fixing the train-test resolution discrepancy. In: Advances in neural information processing systems, pp 8250–8260
Kolesnikov A, Beyer L, Zhai X, Puigcerver J, Yung J, Gelly S, Houlsby N (2019) Large scale learning of general visual representations for transfer. arXiv preprint arXiv:1912.11370
Lee J, Won T, Hong K (2020) Compounding the performance improvements of assembled techniques in a convolutional neural network. arXiv preprint arXiv:2001.06268
Meena SD, Agilandeeswari L (2019) An efficient framework for animal breeds classification using semi-supervised learning and multi-part convolutional neural network (MP-CNN). IEEE Access 7:151783–151802
Liu X, Xia T, Wang J, Yang Y, Zhou F, Lin Y (2016) Fully convolutional attention networks for fine-grained recognition. arXiv preprint arXiv:1603.06765
Zheng H, Fu J, Mei T, Luo J (2017) Learning multi-attention convolutional neural network for fine-grained image recognition. In: Proceedings of the IEEE international conference on computer vision, pp 5209–5217
Sun M, Yuan Y, Zhou F, Ding E (2018) Multi-attention multi-class constraint for fine-grained image recognition. In: Proceedings of the European conference on computer vision (ECCV), pp 805–821
Dubey A, Gupta O, Guo P, Raskar R, Farrell R, Naik N (2018) Pairwise confusion for fine-grained visual classification. In: Proceedings of the European conference on computer vision (ECCV), pp 70–86
Sun G, Cholakkal H, Khan S, Khan FS, Shao L (2019) Fine-grained recognition: accounting for subtle differences between similar classes. arXiv preprint arXiv:1912.06842
Hu T, Qi H, Huang Q, Lu Y (2019) See better before looking closer: weakly supervised data augmentation network for fine-grained visual classification. arXiv preprint arXiv:1901.09891
Zhuang P, Wang Y, Qiao Y (2020) Learning attentive pairwise interaction for fine-grained classification. arXiv preprint arXiv:2002.10191
Guo J, Ma S, Guo S (2019) MAANet: multi-view aware attention networks for image super-resolution. arXiv preprint arXiv:1904.06252
Fu J, Liu J, Tian H, Li Y, Bao Y, Fang Z, Lu H (2019) Dual attention network for scene segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3146–3154
Hu T, Yang P, Zhang C, Yu G, Mu Y, Snoek CG (2019) Attention-based multi-context guiding for few-shot semantic segmentation. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 8441–8448
Zhang L, Nizampatnam S, Gangopadhyay A, Conde MV (2019) Multi-attention networks for temporal localization of video-level labels. arXiv preprint arXiv:1911.06866
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Yan X, Ai T, Yang M, Yin H (2019) A graph convolutional neural network for classification of building patterns using spatial vector data. ISPRS J Photogram Remote Sens 150:259–273
Liu JE, An FP (2020) Image classification algorithm based on deep learning-kernel function. Sci Program 2020
Huang C, Li H, Xie Y, Qingbo W, Luo B (2017) PBC: Polygon-based classifier for fine-grained categorization. IEEE Trans Multimed (TMM) 19(4):673–684
Guérin J, Boots B (2018) Improving image clustering with multiple pretrained cnn feature extractors. arXiv preprint arXiv:1807.07760
Long X, Gan C, De Melo G, Wu J, Liu X, Wen S (2018) Attention clusters: purely attention based local feature integration for video classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7834–7843
Appendix
Two important factors in deep learning models are the dataset split and dataset balancing. The proportion of training and test data plays an important role in computational time and complexity, besides influencing performance. Similarly, dataset balancing is an essential step in training. Hence, the first set of preliminary results with SC-MPEM analyses the effects of dataset splitting and data balancing.
The effects of different proportions of training data on the accuracy of the proposed system are evaluated on the benchmark animal datasets, and the results are presented in Table 10. The proposed model is validated with different proportions of training data: 10%, 20%, 30%, 40% and 60%. SC-MPEM achieves good accuracy even with 10% training data, and accuracy increases with the proportion of training data. However, accuracy starts saturating at around 40% training data, beyond which additional training data has little impact on performance while increasing computational cost and complexity. Hence, the proposed system is trained with 40% training data, and the following experiments use a 40:40:20 split, where 40% each is used for training and testing and the remaining 20% for validation. The effect of a balanced vs. imbalanced dataset is studied and the result is presented in Fig. 22. Following the previous experiment, the dataset proportion is maintained at 40:40:20 on all the benchmark datasets.
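The 40:40:20 split described above can be sketched as a class-stratified split, so that every breed keeps the same proportions in each subset. This is a minimal illustration, not the paper's implementation; the function name and the toy two-class data are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, ratios=(0.4, 0.4, 0.2), seed=0):
    """Split (sample, label) pairs into train/test/validation sets,
    preserving the per-class proportions (here the paper's 40:40:20)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    train, test, val = [], [], []
    for y, items in by_class.items():
        rng.shuffle(items)
        n_train = int(len(items) * ratios[0])
        n_test = int(len(items) * ratios[1])
        train += [(s, y) for s in items[:n_train]]
        test += [(s, y) for s in items[n_train:n_train + n_test]]
        val += [(s, y) for s in items[n_train + n_test:]]
    return train, test, val

# Toy example: 10 samples per class, two classes -> 4/4/2 per class
samples = list(range(20))
labels = [i // 10 for i in samples]
train, test, val = stratified_split(samples, labels)
print(len(train), len(test), len(val))  # 8 8 4
```

Stratifying per class (rather than splitting the pooled data) matters for multi-class animal datasets, where rare species could otherwise vanish from the validation set.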
It is inferred that an imbalanced dataset affects performance; the difference between the balanced and imbalanced datasets is hard to ignore. Generally, a balanced dataset is preferable for any classification problem, as an imbalanced dataset biases the model towards the majority class. The problem is predominant in multi-class classification, where more than one class may have minimal data. Hence, we synthetically balanced the dataset using SMOTE. However, a balanced dataset does not always perform better: forcing the classes to be balanced may discard valuable patterns in the data, so a large dataset is preferred even if it is unbalanced. Balancing a large dataset with data augmentation techniques makes little difference in the overall results. Among the datasets, Snapshot Serengeti is the largest, so data augmentation had minimal effect on its performance compared to the other three datasets; hence the Snapshot Serengeti dataset was not balanced, while the remaining datasets were. There is also a trade-off in the choice of performance metric: for a balanced dataset, accuracy is the best metric, whereas for an imbalanced dataset, precision and recall are the appropriate measures.
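The SMOTE balancing step [53] interpolates between a minority-class sample and one of its k nearest minority neighbours to synthesize new samples. The sketch below is a minimal numpy rendition of that idea, not the reference implementation (libraries such as imbalanced-learn provide a production version); the function name and toy data are assumptions.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Generate synthetic minority-class samples by interpolating between
    each selected sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-matches
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)                # pick a minority sample
        j = neighbours[i, rng.integers(k)] # pick one of its k neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X_new = smote_like_oversample(X_min, n_new=4)
print(X_new.shape)  # (4, 2)
```

Because new points lie on segments between existing minority samples, they stay inside the minority class's local region of feature space rather than duplicating existing samples.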
The second set of results discusses the details of transfer learning (see Table 11), specifically the layer from which the features are extracted, the NMI (Normalized Mutual Information) score and the time complexity. For estimating the NMI, we ensured that no image carries more than one label, since multiple labels make it difficult to judge whether the clustering classified the images correctly. To study these factors, we clustered the dataset using 5 different CNN architectures, 8 different clustering algorithms and various choices of layers, assessing the results by NMI score and time complexity. For simplicity, we used the default values for all hyper-parameters. Both KM and MKM were randomly initialized. Every experiment was run 10 times and the average over these runs is reported. In Table 11, the layer names are unchanged from the architecture and the NMI score is given in bold; the time complexity of each clustering for the various layers is given below the NMI score. The best results are highlighted in bold italics.
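The NMI score used above measures agreement between cluster assignments and ground-truth labels, independent of cluster numbering. A minimal sketch with geometric-mean normalization is shown below (scikit-learn's `normalized_mutual_info_score` offers a reference implementation); the function name and toy labels are assumptions.

```python
import numpy as np

def nmi(labels_true, labels_pred):
    """Normalized Mutual Information: I(U;V) / sqrt(H(U) * H(V))."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    n = len(labels_true)
    classes = np.unique(labels_true)
    clusters = np.unique(labels_pred)
    # Contingency table of class-vs-cluster joint counts
    cont = np.array([[np.sum((labels_true == c) & (labels_pred == k))
                      for k in clusters] for c in classes], dtype=float)
    pij = cont / n                          # joint distribution
    pi = pij.sum(axis=1, keepdims=True)     # class marginal
    pj = pij.sum(axis=0, keepdims=True)     # cluster marginal
    nz = pij > 0
    mi = np.sum(pij[nz] * np.log(pij[nz] / (pi @ pj)[nz]))
    h_true = -np.sum(pi[pi > 0] * np.log(pi[pi > 0]))
    h_pred = -np.sum(pj[pj > 0] * np.log(pj[pj > 0]))
    if h_true == 0 or h_pred == 0:
        return 1.0 if h_true == h_pred else 0.0
    return mi / np.sqrt(h_true * h_pred)

# Perfect clustering under a label permutation still scores 1.0
print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 4))  # 1.0
```

Since NMI is invariant to permutations of the cluster indices, it suits this setting, where EM clustering assigns arbitrary cluster IDs that need not match the breed label encoding.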
From the table, it is inferred that features extracted from the penultimate layer of the Inception v3 network and clustered using EM clustering produce the state-of-the-art result. Although AHC had the lowest time complexity, it is not considered due to its poor NMI score. Besides, the tenfold cross-validation also stands in favor of the Inception v3 network. The result of each model is depicted in Fig. 23.
The final set of results discusses the misclassifications and how they are handled by MHKC. Figure 24 represents the training accuracy for the class pre-trained using MHKC. The highest score for the chosen images is 6.60 and the lowest is 1.96. The precision on the training dataset is found to be 100%, which is a good indicator of better testing accuracy. With MHKC, we improved the accuracy on the misclassified horse images to 100%; with 100% precision on training data, we achieved 99.97% on testing data. This is far better than the accuracy obtained with TensorFlow, as depicted in Fig. 25. Since the feature vector is pre-computed, the accuracy of the classifier depends solely on its kernel functions.
As additional information on the performance of our proposed model on the Snapshot Serengeti dataset, we present its confusion matrix in Fig. 26. Despite being the largest camera-trap dataset, it has not been widely used for testing, partly because of its size and the lack of resources to train and test such a huge dataset [5]. To counter this problem, Gomez et al. [5] trained only 26 of the 40 mammalian classes. In fact, the original Snapshot Serengeti has 48 classes, of which 40 are mammalian species; the remaining 8 classes include humans, birds, rodents, etc. We intentionally excluded these 8 classes, as they are not animal species and including them would needlessly increase the computational burden. Thus, we trained and tested only the 40 mammalian species, as in Norouzzadeh et al. [6].
Cite this article
Sundaram, D., Loganathan, A. A New Supervised Clustering Framework Using Multi Discriminative Parts and Expectation–Maximization Approach for a Fine-Grained Animal Breed Classification (SC-MPEM). Neural Process Lett 52, 727–766 (2020). https://doi.org/10.1007/s11063-020-10246-3