Video-Based Recognition of Aquatic Invasive Species Larvae Using Attention-LSTM Transformer

Chowdhury, Shaif; Tisha, Sadia Nasrin; McGarrity, Monica E.; Hamerly, Greg

doi:10.1007/978-3-031-47969-4_18

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14361)

Included in the following conference series:

International Symposium on Visual Computing

Abstract

Aquatic species such as zebra and quagga mussels are invasive in United States waterways and cause ecological and economic damage. Because conventional early detection methods are time-consuming, there is a need for automated, video-based systems that detect and classify invasive and non-invasive species without human supervision. We present a video classification model for rapidly recognizing invasive and non-invasive mussel larvae from plankton or water sample videos.

Many recent video recognition models are transformer-based and use a combination of spatial and temporal attention, often with large-scale pre-training. We present a model with a CNN-based patch encoder and transformer blocks consisting of temporal attention with LSTM that is end-to-end trainable and effective without pre-training. Based on detailed experiments, the Attention-LSTM model significantly improves over state-of-the-art video classification models, classifying invasive and non-invasive larvae with 99% balanced accuracy. Our code is available at https://anonymous.4open.science/r/AttLSTM-10CF/.
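The abstract names the ingredients (per-frame features from a CNN-based patch encoder, then transformer blocks that combine temporal attention with an LSTM) but not the exact layer layout. The following is a minimal pure-Python sketch under stated assumptions, not the authors' architecture: each frame is assumed to be already reduced to a feature vector (the patch encoder is omitted), a single-head scaled dot-product self-attention with a residual connection is applied across frames, a basic LSTM then runs over the attended sequence, and a linear head produces class probabilities. The names `AttentionLSTMSketch`, `temporal_attention` and `lstm`, and the random weights, are hypothetical illustrations; the authors' implementation is in the repository linked above.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for trained parameters (hypothetical)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (shifts scores by their maximum)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def temporal_attention(frames: List[Vector], wq: Matrix, wk: Matrix, wv: Matrix) -> List[Vector]:
    """Single-head scaled dot-product self-attention across frames, plus a residual add."""
    d = len(frames[0])
    q = [matvec(wq, x) for x in frames]
    k = [matvec(wk, x) for x in frames]
    v = [matvec(wv, x) for x in frames]
    out = []
    for t, x in enumerate(frames):
        # Frame t attends to every frame in the clip.
        scores = [sum(a * b for a, b in zip(q[t], k_s)) / math.sqrt(d) for k_s in k]
        weights = softmax(scores)
        attended = [sum(w * v_s[i] for w, v_s in zip(weights, v)) for i in range(d)]
        out.append([xi + ai for xi, ai in zip(x, attended)])
    return out


def lstm(seq: List[Vector], w: Matrix, hidden: int) -> Vector:
    """Run a basic LSTM over seq and return the final hidden state.

    w maps [x_t, h_{t-1}, 1] to the stacked pre-activations of the input,
    forget, candidate and output gates (4 * hidden rows).
    """
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in seq:
        z = matvec(w, x + h + [1.0])  # trailing 1.0 acts as the bias input
        i_g = [sigmoid(u) for u in z[0:hidden]]
        f_g = [sigmoid(u) for u in z[hidden:2 * hidden]]
        g = [math.tanh(u) for u in z[2 * hidden:3 * hidden]]
        o_g = [sigmoid(u) for u in z[3 * hidden:]]
        c = [f * c_j + i * g_j for f, c_j, i, g_j in zip(f_g, c, i_g, g)]
        h = [o * math.tanh(c_j) for o, c_j in zip(o_g, c)]
    return h


class AttentionLSTMSketch:
    """Hypothetical toy model: temporal attention -> LSTM -> linear head -> softmax."""

    def __init__(self, dim: int, hidden: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.wq = rand_matrix(dim, dim, rng)
        self.wk = rand_matrix(dim, dim, rng)
        self.wv = rand_matrix(dim, dim, rng)
        self.w_lstm = rand_matrix(4 * hidden, dim + hidden + 1, rng)
        self.w_head = rand_matrix(num_classes, hidden, rng)

    def __call__(self, frames: List[Vector]) -> Vector:
        attended = temporal_attention(frames, self.wq, self.wk, self.wv)
        h_last = lstm(attended, self.w_lstm, self.hidden)
        return softmax(matvec(self.w_head, h_last))


if __name__ == "__main__":
    rng = random.Random(1)
    # 16 frames, each already reduced to an 8-dim feature vector (encoder omitted).
    clip = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(16)]
    model = AttentionLSTMSketch(dim=8, hidden=6, num_classes=2)
    probs = model(clip)
    print(len(probs), round(sum(probs), 6))  # → 2 1.0
```

In this sketch the attention step lets each frame mix in information from the whole clip before the LSTM summarizes the sequence in order; a real model would learn the weights end to end and would typically use multiple heads, normalization layers and batched tensor operations.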
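The reported metric is balanced accuracy, which under its standard definition is the mean of per-class recall, BA = (1/K) · Σ_k TP_k / N_k, where TP_k is the number of correctly classified samples of class k and N_k is the number of samples whose true label is k. Unlike plain accuracy, it cannot be inflated by always predicting the majority class. A minimal sketch of that standard definition follows; the helper name `balanced_accuracy` and the 90/10 class split in the example are hypothetical and not taken from the paper's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # N_k: number of true samples per class
    if not support:
        raise ValueError("inputs must not be empty")
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP_k per class
    recalls = [correct[c] / n for c, n in support.items()]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 90 non-invasive and 10 invasive samples.
    truth = ["non-invasive"] * 90 + ["invasive"] * 10
    always_majority = ["non-invasive"] * 100
    plain = sum(t == p for t, p in zip(truth, always_majority)) / len(truth)
    print(plain, balanced_accuracy(truth, always_majority))  # → 0.9 0.5
```

The example shows why the metric suits imbalanced sample sets: a classifier that never flags an invasive larva still reaches 90% plain accuracy but only 50% balanced accuracy.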

Author information

Authors and Affiliations

Baylor University, Waco, TX, 76706, USA
Shaif Chowdhury, Sadia Nasrin Tisha & Greg Hamerly
Texas Parks and Wildlife Department, Austin, USA
Monica E. McGarrity

Corresponding author

Correspondence to Shaif Chowdhury.

Editor information

Editors and Affiliations

University of Nevada Reno, Reno, NV, USA
George Bebis
Google Research, Mountain View, CA, USA
Golnaz Ghiasi
New York University, New York, USA
Yi Fang
Ben-Gurion University, Be'er Sheva, Israel
Andrei Sharf
Microsoft Research, Beijing, China
Yue Dong
The University of Oklahoma, Norman, OK, USA
Chris Weaver
University of Maryland, College Park, MD, USA
Zhicheng Leo
University of Central Florida, Orlando, FL, USA
Joseph J. LaViola Jr.
InnerOptic Technology, Hillsborough, NC, USA
Luv Kohli

About this paper

Cite this paper

Chowdhury, S., Tisha, S.N., McGarrity, M.E., Hamerly, G. (2023). Video-Based Recognition of Aquatic Invasive Species Larvae Using Attention-LSTM Transformer. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2023. Lecture Notes in Computer Science, vol 14361. Springer, Cham. https://doi.org/10.1007/978-3-031-47969-4_18

DOI: https://doi.org/10.1007/978-3-031-47969-4_18
Published: 01 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47968-7
Online ISBN: 978-3-031-47969-4
eBook Packages: Computer Science, Computer Science (R0)