Underwater Live Fish Recognition by Deep Learning

Tamou, Abdelouahid Ben; Benzinou, Abdesslam; Nasreddine, Kamal; Ballihi, Lahoucine

doi:10.1007/978-3-319-94211-7_30

Abdelouahid Ben Tamou^17,18,
Abdesslam Benzinou¹⁷,
Kamal Nasreddine¹⁷ &
…
Lahoucine Ballihi¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 10884))

Included in the following conference series:

International Conference on Image and Signal Processing

2816 Accesses
20 Citations

Abstract

Recently, underwater videos have gained great interest by marine ecologists for studying fish populations. Actually, this technique produces large amount of visual data and does not affect fish behavior. However, visual processing and analyzing of the recorded data can be subjective, time consuming and costly. We propose in this paper to use the convolutional neural network AlexNet with transfer learning for automatic fish species classification. We extract features from foreground fish images of the available underwater dataset using the pretrained AlexNet network either with or without fine-tunig. For classification, we use a linear SVM classifier. The experiment results demonstrate the effectiveness of the proposed approach on the Fish Recognition Ground-Truth dataset. We achieve an accuracy of 99.45%.

You have full access to this open access chapter, Download conference paper PDF

Crosspooled FishNet: transfer learning based fish species classification model

Article 22 August 2020

FishResNet: Automatic Fish Classification Approach in Underwater Scenario

Article 14 May 2021

Automatic Fish Species Classification Using Deep Convolutional Neural Networks

Article 02 August 2019

Keywords

1 Introduction

In the last few years, underwater video cameras are extensively used in scientific, industrial and military fields for exploring and studying underwater environments. Marine biologists are interested in using underwater video analysis to study fish populations as species richness and size measurement [1,2,3,4,5], abundance [6] or animal behavior [1]. Automatic processing is an advantage compared to manual processing which is relatively off putting task, subjective, time consuming and costly. Automatic fish classification can be divided into two parts. (1) Fish detection which aims to detect and separate the subject from the background. (2) Fish recognition which aims to identify the species of the detected fish. The underwater environment presents a lot of difficulties and poses great challenges for computer vision. The luminosity changes frequently, the visibility is limited and the background can change rapidly due to moving aquatic plants. There are some attempts to improve image contrast and resolution for underwater images [7, 8]. In addition, in fish recognition task, the fish can move in three dimensions, it can also hide behind rocks and algae. We also encounter the problems of fish overlapping and of the similarity in shape and patterns among fish of different species. In this paper, we will focus on fish recognition in underwater video images.

Convolutional neural networks (ConvNets) [9] consist of L learned layers. The first layer is the input layer and represents a raw image. The hidden layers typically consist of convolutional layers, pooling layers, normalization layers and fully connected layers. The output layer consists of N-dimensional vector where N is the number of classes, this layer uses Softmax function to predict a single class of N mutually exclusive classes. ConvNets are trained using a standard error-backpropagation algorithm.

Li et al. [2] applied Fast R-CNN (Regions with Convolutional Neural Networks) on underwater images to detect and recognize fish species. They achieved an accuracy of 81.4% on LifeCLEF 2014 dataset that contains 24277 fish images of 12 species. Choi [11] participated in LifeCLEF 2015 task for detecting and identifying fish in underwater videos and achieved the best performance of 81% in this task [12]. He detected fish by using background subtraction and a selective search strategy [13]. Then, he used the GoogleNet [14] based on convolutional neural networks to classify fish species. Qin et al. [3] used convolutional neural networks on the Fish Recognition Ground-Truth dataset consisting of a total of 27370 fish images of 23 species and they reached an accuracy of 98.57%. Qin et al. [4] used also deep architecture to extract the features of fish images. In their architecture, two convolutional layers use Principal Component Analysis (PCA), the non-linear layer uses a binary hashing and the feature pooling layer uses a block-wise histograms. Then, information invariant to large poses are extracted by using spatial pyramid pooling (SPP). Finally, they use a linear SVM classifier for the classification. Despite they introduced hand-crafted layers they have improved marginally the accuracy by 0.07%. Sun et al. [5] extracted features from underwater images by applying two deep learning architectures, PCANet [15] and Network In Network (NIN) [16]. For classification, they used a linear SVM classifier. They tested their model on a database of 15 species and obtained an accuracy of 69.84% with the NIN architecture and 77.27% with the PCANet architecture. Salman et al. [17] created a deep architecture of three convolutional layers to extract features, then they combined the features from multiple layers of the network to feed standard classifiers like SVM and KNN. They achieved an accuracy of 96.75% on test set of 7500 fish images issue from LifeCLEF 2015 Fish dataset.

Learning deep architectures from scratch necessitates a large dataset because of the huge number of weights to be trained. Available underwater datasets are of small size for learning ConvNets for underwater fish recognition. To overcome this problem, we introduce transfer learning framework [18] to train ConvNets from pretrained networks that could be trained on large datasets. AlexNet [10], GoogleNet [14], VGG [19] and ResNet [20] are some examples of pretrained models that have emerged in this field last few years. In this paper, we propose to transfer the learned weights from AlexNet model to a deep ConvNet for fish recognition in the open sea. We extract the fish features from images of the underwater dataset before and after fine-tuning the model and classify the input images with a linear SVM classifier on the extracted features. We choose AlexNet because this model needs less resources, is faster and has a simple architecture than others networks like GoogleNet (22 layers deep) and VGG (at least 16 convolutional layers) that make fine-tuning the transferred weights difficult especially with limited training data.

The paper is organized as follows: in the next Sect. 2, we describe the proposed approach for live fish recognition based on pretrained model. Then, the Sect. 3 details the experimental scheme and we give a comparative study evaluating the proposed approach on the Fish Recognition Ground-Truth dataset (c.f. Fig. 2). Finally, conclusions and perspectives are given in Sect. 4.

2 Proposed Approach

Available underwater datasets for fish recognition are too small for training deep ConvNets from scratch with random initialization. Moreover, deep learning requires immense resources of memories and processors. To overcome the difficulties imposed by limited training data, we use trained weights of AlexNet to extract fish features from images by removing some layers of the model and then using the rest of the network as a fixed feature extractor for our data. In order to demonstrate the effectiveness of fine-tuning approach, we extract features before and after fine-tuning. Fine-tuning algorithm consists of retraining the classifier on top of the network on the underwater image set and fine-tune the weights of the AlexNet via back-propagation. Finally, we will propose three schemes for classification. These schemes will be detailed below.

2.1 Architecture of AlexNet

As shown in the Fig. 1, AlexNet [10] has five convolutional layers. The number of filters and their size in these layers are 96 filters of size \(11 \times 11 \times 3\), 256 filters of size \(5 \times 5 \times 48\), 384 filters of size \(3 \times 3 \times 256\), 384 filters of size \(3 \times 3 \times 192\) and 256 filters of size \(3 \times 3 \times 192\) respectively. It has also three fully-connected layers with 4096 neurons in the two first layers and the last one has 1000 neurons. AlexNet has been trained over 1.2 million images from the ImageNet dataset [21]. It can classify images into 1000 categories of objects (such as keyboard, mouse, coffee cup, pen and many animals).

2.2 Input Images

First, we eliminate the background of images using the fish masks given in the dataset (c.f. Fig. 3). These masks are generated by Qin et al. [22] who proposed a foreground extraction method for underwater videos based on sparse and low-rank matrix decomposition. Then, we use foreground fish images as input training images after a resizing to the same size \(227 \times 227 \times 3\).

2.3 Feature Extraction and Classification

We first employ the AlexNet model without any fine-tuning to extract learned features by removing the output layer fc8 and using the value outputs of fc7’s layer as feature descriptor for each fish image, we denote this scheme by Alex-SVM.

It is recommended to fine-tune the model, especially, when data similarity is very low between the original data and the new data. As the dataset used in this work is totally different from ImageNet, we will retrain the AlexNet model on the underwater dataset. We initialize a new fully-connected layer to replace the old one fc8 with a random values of 23 outputs corresponding to the 23 species in the considered dataset. Then, we retrain only this layer and we keep the weights of the lower layers. This is because the features captured in the lower layers are universal like edges and curves that are also pertinent to our task. We get a new AlexNet model retrained, we denote this scheme by Alex-FT-Soft.

Finally, we use the retrained AlexNet model to re-extract fish features from images as in the first one. We denote this last scheme by Alex-FT-SVM.

Table 1. The fish species distribution in the Fish Recognition Ground-Truth dataset.

Full size table

3 Experimental Results

The performances evaluation of the proposed system is carried out on the Fish Recognition Ground-Truth dataset^{Footnote 1}. We implement the algorithms in Matlab and use MatConvNet^{Footnote 2} library for training the deep ConvNet.

The Fish Recognition Ground-Truth dataset is an underwater live fish image dataset acquired from a live video dataset made by the European project Fish4-Knowledge^{Footnote 3} (F4K) [23]. The dataset contains 27370 fish images and their fish masks of 23 different species. The fish species are manually labeled by following instructions from marine biologists. The dataset is imbalanced in the number of different fish species where the number of the most frequent species is about 1000 times more than the least one. The Fig. 2 shows examples of the 23 fish species and Table 1 shows the distribution of the fish species in the dataset.

Table 2. Comparison of fish recognition performances of various methods on the Fish Recognition Ground-Truth dataset.

Full size table

We use 7-Fold Cross-Validation in order to estimate the performance of the proposed approach [4]. The results of classification using the three proposed schemes are given in the Table 2 in terms of accuracy and precision.

As shown in Table 2, proposed schemes are efficient and give promising results. Alex-FT-SVM performs better than Alex-SVM, this is because in Alex-SVM the higher layers of the network are more precise to the details of the objects contained in ImageNet dataset. However, after fine-tuning, these layers become more precise to the details of the fish species contained in our dataset, therefore, we achieve the best accuracy of 99.45%. We conclude that the fine-tuning improves the performance of the system. We can also see that the Softmax classifier is less robust than SVM classifier, especially, for species with fewer samples like ‘Zebrasoma scopas’, ‘Hemigymnus melapterus’, ‘Scolopsis bilineata’, ‘Scaridae’, ‘Pempheris vanicolensis’, ‘Zanclus cornutus’, ‘Balistapus undulates’, ‘Neoglyphidodon nigroris’ and ‘Siganus fuscescens’.

Table 2 shows also the comparison of our approach with state-of-the-art methods on the Fish Recognition Ground-Truth dataset. In this work, we want to test a purely deep learning-based system without any layers that contain hand-crafted methods like PCA, block-wise histograms, spatial pyramid pooling (SSP) [4], nor any method to improve the performance like data augmentation or scale images as in DeepFish-SVM-aug and DeepFish-SVM-aug-scale [4]. We can observe that Alex-FT-SVM outperforms the state-of-the-art methods even those with hand-crafted layers. In Deep-CNN [3], the authors created a ConvNet with three convolutional layers and trained the network from scratch. As we can see the network trained with transfer learning gives better results than networks trained from scratch.

The Fig. 4 visualizes weights of the 96 filters, 32 filters and 64 filters in the first convolutional layer of the adopted AlexNet, DeepFish and DeppCNN respectively. The color filters extract low-frequency features and the grayscale filters extract high-frequency features. As we can see, the AlexNet has more filters than DeepFish and DeepCNN which means more variety of selective filters extracting more features at different scales and different orientations. We note that additional filters are all very nice, smooth, well-formed and without noisy patterns that make AlexNet richer by feature representations.

4 Conclusion and Future Work

Underwater live fish recognition will become a necessary tool to assist marine ecologists in studying the biodiversity in underwater areas because traditional techniques are destructive, affect fish behavior, demand time and labor costs. Proposed convolutional neural networks for fish identification require large datasets due to the huge number of parameters to be trained, especially in deeper networks. Transfer learning is a solution for training this kind of networks by using pretrained models which have been trained on a large dataset.

In this paper, we proposed to transfer learned weights of the pretrained network AlexNet which has been trained on ImageNet dataset to recognize fish species in underwater images. We have extracted fish features from images by AlexNet to feed a linear SVM before and after fine-tuning.

Experiments on the Fish Recognition Ground-Truth dataset demonstrate that the proposed approach outperforms various other approaches employed for fish species identification.

In future work, we plan to extend the method proposed here for larger underwater video datasets with more classes.

Notes

References

Spampinato, C., Giordano, D., Di Salvo, R., Chen-Burger, Y.H.J., Fisher, R.B., Nadarajan, G.: Automatic fish classification for underwater species behavior understanding. In: Proceedings of the First ACM International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams, pp. 45–50. ACM, October 2010
Google Scholar
Li, X., Shang, M., Qin, H., Chen, L.: Fast accurate fish detection and recognition of underwater images with fast R-CNN. In: OCEANS 2015 MTS/IEEE Washington, pp. 1–5. IEEE, October 2015
Google Scholar
Qin, H., Li, X., Yang, Z., Shang, M.: When underwater imagery analysis meets deep learning: a solution at the age of big visual data. In: OCEANS 2015 MTS/IEEE Washington, pp. 1–5. IEEE, October 2015
Google Scholar
Qin, H., Li, X., Liang, J., Peng, Y., Zhang, C.: Deepfish: accurate underwater live fish recognition with a deep architecture. Neurocomputing 187, 49–58 (2015)
Article Google Scholar
Sun, X., Shi, J., Dong, J., Wang, X.: Fish recognition from low-resolution underwater images. In: International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), pp. 471–476. IEEE, October 2016
Google Scholar
Chuang, M.C., Hwang, J.N., Williams, K.: Automatic fish segmentation and recognition for trawl-based cameras. In: Computer Vision and Pattern Recognition in Environmental Informatics, pp. 79–106. IGI Global (2016)
Google Scholar
Schettini, R., Corchs, S.: Underwater image processing: state of the art of restoration and image enhancement methods. EURASIP J. Adv. Signal Process. 2010(1), 746052 (2010)
Article Google Scholar
Chambah, M., Semani, D., Renouf, A., Courtellemont, P., Rizzi, A.: Underwater color constancy: enhancement of automatic live fish recognition. In: Color Imaging IX: Processing, Hardcopy, and Applications. International Society for Optics and Photonics, Vol. 5293, pp. 157–169, December 2003
Google Scholar
LeCun, Y., Boser, B.E., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.E., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, pp. 396–404 (1990)
Google Scholar
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Google Scholar
Choi, S: Fish identification in underwater video with deep convolutional neural network: SNUMedinfo at LifeCLEF fish task 2015. In: CLEF (2015). Working Notes
Google Scholar
Joly, A., et al.: LifeCLEF 2015: multimedia life species identification challenges. In: Mothe, J., et al. (eds.) CLEF 2015. LNCS, vol. 9283, pp. 462–483. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24027-5_46
Chapter Google Scholar
Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)
Article Google Scholar
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Google Scholar
Chan, T.H., Jia, K., Gao, S., Lu, J., Zeng, Z., Ma, Y.: PCANet: a simple deep learning baseline for image classification? IEEE Trans. Image Process. 24(12), 5017–5032 (2015)
Article MathSciNet Google Scholar
Lin, M., Chen, Q., Yan, S.: Network in network (2013). arXiv preprint arXiv:1312.4400
Salman, A., Jalal, A., Shafait, F., Mian, A., Shortis, M., Seager, J., Harvey, E.: Fish species classification in unconstrained underwater environments based on deep learning. Limnol. Oceanogr. Methods 14(9), 570–585 (2016)
Article Google Scholar
Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010)
Article Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv preprint arXiv:1409.1556
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR 2009, pp. 248–255. IEEE, June 2009
Google Scholar
Qin, H., Peng, Y., Li, X.: Foreground extraction of underwater videos via sparse and low-rank matrix decomposition. In: 2014 ICPR Workshop on Computer Vision for Analysis of Underwater Imagery (CVAUI), pp. 65–72. IEEE, August 2014
Google Scholar
Boom, B.J., Huang, P.X., He, J., Fisher, R.B.: Supporting ground-truth annotation of image datasets using clustering. In: 2012 21st International Conference on Pattern Recognition (ICPR), pp. 1542–1545. IEEE, November 2012
Google Scholar

Download references

Acknowledgments

The authors would like to thank the Région Bretagne for financial support.

Author information

Authors and Affiliations

Univ Bretagne Loire, ENIB, UMR CNRS 6285 LabSTICC, 29238, Brest, France
Abdelouahid Ben Tamou, Abdesslam Benzinou & Kamal Nasreddine
LRIT-CNRST URAC 29, Mohammed V University In Rabat, FSR, Rabat, Morocco
Abdelouahid Ben Tamou & Lahoucine Ballihi

Authors

Abdelouahid Ben Tamou
View author publications
You can also search for this author in PubMed Google Scholar
Abdesslam Benzinou
View author publications
You can also search for this author in PubMed Google Scholar
Kamal Nasreddine
View author publications
You can also search for this author in PubMed Google Scholar
Lahoucine Ballihi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Abdesslam Benzinou .

Editor information

Editors and Affiliations

Université de Bourgogne, Dijon, France
Alamin Mansouri
Université de Caen Normandie, Caen, France
Abderrahim El Moataz
Université du Québec à Trois-Rivières, Trois-Rivieres, Québec, Canada
Fathallah Nouboud
Université Ibn Zohr, Agadir, Morocco
Driss Mammass

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tamou, A.B., Benzinou, A., Nasreddine, K., Ballihi, L. (2018). Underwater Live Fish Recognition by Deep Learning. In: Mansouri, A., El Moataz, A., Nouboud, F., Mammass, D. (eds) Image and Signal Processing. ICISP 2018. Lecture Notes in Computer Science(), vol 10884. Springer, Cham. https://doi.org/10.1007/978-3-319-94211-7_30

Download citation

DOI: https://doi.org/10.1007/978-3-319-94211-7_30
Published: 30 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94210-0
Online ISBN: 978-3-319-94211-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Association for Pattern Recognition (opens in a new tab)