WaterNet: An adaptive matching pipeline for segmenting water with volatile appearance

Abstract

We present a novel network for segmenting water whose appearance varies significantly across video frames. Unlike existing state-of-the-art video segmentation approaches, which rely on a pre-trained feature recognition network and several previous frames to guide segmentation, our pipeline accommodates appearance variation by also matching against features observed in the current frame. For objects such as water, whose appearance is non-uniform and changes dynamically, this produces more reliable and accurate segmentation results than existing algorithms.
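The core idea of adapting to the current frame can be illustrated with a minimal pixel-wise feature-matching sketch. Everything below is a hedged illustration, not the paper's implementation: the function names, feature shapes, cosine-similarity measure, threshold, and template-bank sizes are all assumptions chosen for clarity.

```python
# Illustrative sketch only: `features` is an (H, W, C) feature map for the
# current frame, and `templates` is an (N, C) bank of feature vectors sampled
# from known water pixels (e.g. the first annotated frame, refreshed online).
import numpy as np

def match_segmentation(features, templates, threshold=0.7):
    """Label a pixel as water if its feature is close to any template."""
    H, W, C = features.shape
    flat = features.reshape(-1, C)
    # Normalize so the dot product is a cosine similarity.
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    t = templates / (np.linalg.norm(templates, axis=1, keepdims=True) + 1e-8)
    # (H*W, N) similarity matrix; keep the best-matching template per pixel.
    sim = flat @ t.T
    best = sim.max(axis=1)
    return (best >= threshold).reshape(H, W)

def update_templates(templates, features, mask, k=64, rng=None):
    """Adaptively refresh the bank with features sampled from the newly
    segmented water region of the current frame (cap on bank size assumed)."""
    rng = rng or np.random.default_rng(0)
    water = features[mask]  # (M, C) features inside the predicted mask
    if len(water) == 0:
        return templates
    idx = rng.choice(len(water), size=min(k, len(water)), replace=False)
    return np.concatenate([templates, water[idx]])[-256:]
```

Calling `update_templates` after each frame is what makes the matching adaptive: templates drawn from the current frame let the bank track lighting, turbidity, and reflection changes that a fixed first-frame template would miss.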


Acknowledgements

This work was supported in part by the National Science Foundation under Grant EAR 1760582, and the Louisiana Board of Regents ITRS LEQSF(2018–21)-RD-B-03. We would like to express our appreciation to anonymous reviewers whose comments helped improve and clarify this manuscript.

Author information


Corresponding author

Correspondence to Xin Li.

Additional information

Yongqing Liang received his B.S. degree in computer science from Fudan University, China, in 2017. He is currently a Ph.D. student in the School of Electrical Engineering and Computer Science, Louisiana State University, USA. His research interests include visual data understanding, computer vision, and computer graphics.

Navid Jafari received his B.S. degree in civil engineering from the University of Memphis in 2010. He received his M.S. and Ph.D. degrees in 2011 and 2015, respectively, from the University of Illinois at Urbana-Champaign in the Department of Civil & Environmental Engineering. He is currently an assistant professor at Louisiana State University in the Department of Civil & Environmental Engineering, where his research focuses on the intersection of geotechnical and coastal engineering with natural hazards. He is specifically focused on the performance of natural infrastructure, natural and man-made slopes, and flood protection infrastructure during hurricanes.

Xing Luo majored in mechanical engineering, receiving his B.E. degree from the University of Science and Technology Beijing in 2018. He is currently pursuing a Ph.D. degree in the Institute of Manufacturing Technology and Automation, Zhejiang University. His research interests include multimodal image processing and analysis.

Qin Chen is a professor of Civil & Environmental Engineering and Marine & Environmental Sciences at Northeastern University. He specializes in the development and application of numerical models for coastal dynamics, including ocean waves, storm surges, nearshore circulation, fluid-vegetation interaction, and sediment transport and morphodynamics. His research includes field experiments and application of remote sensing and high-performance computing technologies to solve engineering problems. He leads the Coastal Resilience Collaboratory funded by the NSF CyberSEES award.

Yanpeng Cao is a research fellow in the School of Mechanical Engineering, Zhejiang University, China. He graduated with an M.Sc. degree in control engineering (2005) and a Ph.D. degree in computer vision (2008), both from the University of Manchester, UK. He worked in a number of R&D institutes, including the Institute for Infocomm Research (Singapore), Mtech Imaging Pte Ltd (Singapore), and the National University of Ireland Maynooth (Ireland). His major research interests include infrared imaging, sensor fusion, image processing, and 3D reconstruction.

Xin Li received his B.S. degree in computer science from the University of Science and Technology of China in 2003, and his M.S. and Ph.D. degrees in computer science from Stony Brook University (SUNY) in 2008. He is currently an associate professor with the School of Electrical Engineering and Computer Science and the Center for Computation and Technology, Louisiana State University, USA. He leads the Geometric and Visual Computing Laboratory at LSU. His research interests include geometric and visual data processing and analysis, computer graphics, and computer vision.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.


About this article

Cite this article

Liang, Y., Jafari, N., Luo, X. et al. WaterNet: An adaptive matching pipeline for segmenting water with volatile appearance. Comp. Visual Media 6, 65–78 (2020). https://doi.org/10.1007/s41095-020-0156-x


Keywords

  • video segmentation
  • water segmentation
  • appearance adaptation