Abstract
Zebra and quagga mussels are invasive in United States waterways, where they cause ecological and economic damage. Because conventional early detection methods are time-consuming, there is a need for automated, video-based systems that detect and classify invasive and non-invasive species without human supervision. We present a video classification model that rapidly recognizes invasive and non-invasive mussel larvae in videos of plankton or water samples.
Many recent video recognition models are transformer-based and combine spatial and temporal attention, often with large-scale pre-training. We present a model with a CNN-based patch encoder and transformer blocks that combine temporal attention with an LSTM; the model is end-to-end trainable and effective without pre-training. In detailed experiments, the Attention-LSTM model significantly improves over state-of-the-art video classification models, classifying invasive and non-invasive larvae with 99% balanced accuracy. Our code is available at https://anonymous.4open.science/r/AttLSTM-10CF/.
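The headline result is stated as balanced accuracy rather than plain accuracy. The abstract does not define the metric, so the following is a minimal sketch that assumes the standard definition (the mean of per-class recall). The function name `balanced_accuracy`, the class labels, and the toy predictions are all hypothetical and are not taken from the paper's code or data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels: 8 non-invasive videos, 2 invasive videos.
y_true = ["non_invasive"] * 8 + ["invasive"] * 2
# The classifier misses one of the two invasive videos.
y_pred = ["non_invasive"] * 8 + ["invasive", "non_invasive"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                      # 0.9
print(balanced_accuracy(y_true, y_pred))   # 0.75
```

In this toy case plain accuracy is 0.9 even though half of the invasive samples are missed, while balanced accuracy drops to 0.75. This is presumably why a class-balanced metric suits a detection task where invasive samples may be rare.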
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chowdhury, S., Tisha, S.N., McGarrity, M.E., Hamerly, G. (2023). Video-Based Recognition of Aquatic Invasive Species Larvae Using Attention-LSTM Transformer. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2023. Lecture Notes in Computer Science, vol 14361. Springer, Cham. https://doi.org/10.1007/978-3-031-47969-4_18
DOI: https://doi.org/10.1007/978-3-031-47969-4_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47968-7
Online ISBN: 978-3-031-47969-4
eBook Packages: Computer Science, Computer Science (R0)