Multi-stream fish detection in unconstrained underwater videos by the fusion of two convolutional neural network detectors

Ben Tamou, Abdelouahid; Benzinou, Abdesslam; Nasreddine, Kamal

doi:10.1007/s10489-020-02155-8

Published: 16 January 2021

Volume 51, pages 5809–5821 (2021)

Applied Intelligence

Abdelouahid Ben Tamou¹,² (ORCID: orcid.org/0000-0002-8260-5569),
Abdesslam Benzinou¹ &
Kamal Nasreddine¹

Abstract

Recently, marine biologists have begun using underwater videos to study species diversity and fish abundance. These techniques generate a large amount of visual data. Automatic analysis using image processing is therefore necessary, since manual processing is time-consuming and labor-intensive. However, the automatic processing of underwater images faces numerous challenges: for example, high luminosity variation, limited visibility, complex backgrounds, free movement of fish, and high diversity of fish species. In this paper, we propose two new fusion approaches that exploit two convolutional neural network (CNN) streams to merge appearance and motion information for automatic fish detection. Each approach consists of two Faster R-CNN models that share either the same region proposal network or the same classifier. We significantly improve fish detection performance on the LifeCLEF 2015 Fish benchmark dataset compared not only with the classic Faster R-CNN but also with all the state-of-the-art approaches. The best F-score and mAP measures are 83.16% and 73.69%, respectively.
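The F-score quoted above is the harmonic mean of detection precision and recall. The following is a minimal sketch of how such a score can be computed from predicted and ground-truth bounding boxes. It is not the authors' evaluation code: the greedy one-to-one matching, the IoU threshold of 0.5, the box format (x1, y1, x2, y2), and the helper names `iou` and `f_score` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f_score(detections: List[Box], ground_truth: List[Box],
            iou_thresh: float = 0.5) -> float:
    """F-score with greedy one-to-one matching (hypothetical protocol).

    A detection counts as a true positive if its best unmatched
    ground-truth box overlaps it with IoU >= iou_thresh.
    """
    matched = set()
    tp = 0
    for det in detections:
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(det, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1
    fp = len(detections) - tp      # detections with no matching fish
    fn = len(ground_truth) - tp    # fish that were missed
    denom = 2 * tp + fp + fn
    # F = 2PR / (P + R) = 2TP / (2TP + FP + FN)
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    dets = [(0, 0, 10, 10), (20, 20, 30, 30)]
    gts = [(1, 1, 10, 10), (50, 50, 60, 60)]
    print(f_score(dets, gts))  # one hit, one false alarm, one miss
```

In this toy case one detection matches a ground-truth box, one is a false alarm, and one fish is missed, so precision and recall are both 0.5. Real benchmark protocols may sort detections by confidence before matching, which this sketch omits.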

Acknowledgements

The authors would like to thank the Région Bretagne for financial support.

Author information

Authors and Affiliations

ENIB, UMR CNRS 6285 LabSTICC, Brest, 29238, France
Abdelouahid Ben Tamou, Abdesslam Benzinou & Kamal Nasreddine
LRIT-CNRST URAC 29, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
Abdelouahid Ben Tamou

Corresponding author

Correspondence to Abdesslam Benzinou.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ben Tamou, A., Benzinou, A. & Nasreddine, K. Multi-stream fish detection in unconstrained underwater videos by the fusion of two convolutional neural network detectors. Appl Intell 51, 5809–5821 (2021). https://doi.org/10.1007/s10489-020-02155-8

Accepted: 15 December 2020
Published: 16 January 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s10489-020-02155-8
