Abstract
Motion estimation is an important approach to acquiring the motion information of all targets in satellite video, enabling real-time monitoring of an Earth observation region. Compared with typical computer vision settings, motion estimation in satellite video faces two main difficulties: the large observation scale and the numerous weak targets with low signal-to-noise ratios. In this paper, a multi-frame sparse self-learning PWC-Net (MSSPWC-Net) is proposed to estimate the motion of weak targets in satellite video. To overcome the shortcoming that the existing PWC-Net (a convolutional neural network built on pyramid processing, warping, and cost volumes) fails to extract motion information from numerous weak targets, motion consistency and sparse self-learning are introduced to modify the network. The motion consistency between neighboring frames, enforced through a multi-frame framework, is mainly used to improve the accuracy of motion estimation for weak targets, while sparse self-learning addresses the case in which the labeled samples in satellite video are insufficient to train PWC-Net. Numerical experiments are conducted on four real satellite video datasets. The results demonstrate that the proposed MSSPWC-Net achieves excellent motion estimation performance for weak targets in satellite video and outperforms state-of-the-art methods.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China for Key International Cooperation (Grant No. 61720106002) and the National Natural Science Foundation of China for Outstanding Scholars (Grant No. 62025107).
Cite this article
Wang, T., Gu, Y. & Li, S. A multi-frame sparse self-learning PWC-Net for motion estimation in satellite video scenes. Sci. China Inf. Sci. 66, 192301 (2023). https://doi.org/10.1007/s11432-022-3634-x