Multi-stream fish detection in unconstrained underwater videos by the fusion of two convolutional neural network detectors

Applied Intelligence

Abstract

Recently, marine biologists have begun using underwater videos to study species diversity and fish abundance. These techniques generate a large amount of visual data. Automatic analysis using image processing is therefore necessary, since manual processing is time-consuming and labor-intensive. However, the automatic processing of underwater images faces numerous challenges, for example high luminosity variation, limited visibility, complex backgrounds, free movement of fish, and high diversity of fish species. In this paper, we propose two new fusion approaches that exploit two convolutional neural network (CNN) streams to merge appearance and motion information for automatic fish detection. These approaches consist of two Faster R-CNN models that share either the same region proposal network or the same classifier. We significantly improve fish detection performance on the LifeCLEF 2015 Fish benchmark dataset compared not only with the classic Faster R-CNN but also with all the state-of-the-art approaches. The best F-score and mAP measures are 83.16% and 73.69%, respectively.
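
For readers unfamiliar with the F-score reported above, the following is a minimal sketch of the standard definition (the harmonic mean of precision and recall) computed from detection counts. The function name `f_score` and the example counts are illustrative assumptions; the abstract does not give the matching criterion (for example, the overlap threshold) used to decide which detections count as true positives, so that step is left out.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F-score (harmonic mean of precision and recall).

    The counts are assumed to come from matching predicted boxes against
    ground-truth boxes; the matching rule itself is not specified here.
    """
    detected = true_positives + false_positives   # all predicted fish
    actual = true_positives + false_negatives     # all annotated fish
    if true_positives == 0 or detected == 0 or actual == 0:
        # No correct detections: precision and recall are zero or undefined.
        return 0.0
    precision = true_positives / detected
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(f"F-score: {f_score(80, 20, 20):.4f}")
```

A detector with equal precision and recall has an F-score equal to both; a detector that favors one at the expense of the other is penalized, which is why the measure is commonly reported alongside mAP.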

Notes

  1. https://www.noaa.gov/oceans-coasts

  2. https://www.imageclef.org/2014/lifeclef/fish

  3. https://www.imageclef.org/lifeclef/2016/sea

  4. https://www.imageclef.org/lifeclef/2015/fish

  5. https://github.com/vinthony

Acknowledgements

The authors would like to thank the Région Bretagne for financial support.

Author information

Corresponding author

Correspondence to Abdesslam Benzinou.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ben Tamou, A., Benzinou, A. & Nasreddine, K. Multi-stream fish detection in unconstrained underwater videos by the fusion of two convolutional neural network detectors. Appl Intell 51, 5809–5821 (2021). https://doi.org/10.1007/s10489-020-02155-8

  • DOI: https://doi.org/10.1007/s10489-020-02155-8
