Leveraging Motion Priors in Videos for Improving Human Segmentation

  • Yu-Ting Chen
  • Wen-Yen Chang
  • Hai-Lun Lu
  • Tingfan Wu
  • Min Sun
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11211)


Despite many advances in deep-learning-based semantic segmentation, a performance drop due to distribution mismatch is often encountered in the real world. Recently, a few domain adaptation and active learning approaches have been proposed to mitigate this drop. However, very little attention has been paid to leveraging the information in videos, which most camera systems capture naturally. In this work, we propose to leverage the “motion prior” in videos to improve human segmentation in a weakly-supervised active learning setting. Using optical flow, we extract candidate foreground motion segments (referred to as the motion prior) that potentially correspond to human segments. We propose to learn a memory-network-based policy model that selects strong candidate segments (referred to as the strong motion prior) through reinforcement learning. The selected segments have high precision and are used directly to fine-tune the model. On a newly collected surveillance camera dataset and the publicly available UrbanStreet dataset, our method improves human segmentation performance across multiple scenes and modalities (i.e., RGB to infrared (IR)). Last but not least, our method is empirically complementary to existing domain adaptation approaches: combining our weakly-supervised active learning approach with domain adaptation yields an additional performance gain.
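As an illustration of the motion-prior idea, the following minimal sketch assumes a dense optical-flow field is already available as an (H, W, 2) array and treats connected regions of large flow magnitude as candidate foreground segments. The function name `motion_prior_segments`, the threshold `mag_thresh`, and the minimum area `min_area` are hypothetical choices made for this example; the paper's actual extraction procedure and the learned policy that selects strong candidates are not reproduced here.

```python
from collections import deque

import numpy as np


def motion_prior_segments(flow, mag_thresh=1.0, min_area=4):
    """Return a list of boolean masks, one per candidate motion segment.

    flow: array of shape (H, W, 2) holding per-pixel (dx, dy) displacement.
    mag_thresh and min_area are illustrative values, not taken from the paper.
    """
    # Pixels whose displacement magnitude exceeds the threshold count as moving.
    magnitude = np.linalg.norm(flow, axis=-1)
    moving = magnitude > mag_thresh

    visited = np.zeros_like(moving, dtype=bool)
    height, width = moving.shape
    segments = []
    for y in range(height):
        for x in range(width):
            if not moving[y, x] or visited[y, x]:
                continue
            # Flood-fill one 4-connected component of moving pixels.
            mask = np.zeros_like(moving, dtype=bool)
            queue = deque([(y, x)])
            visited[y, x] = True
            while queue:
                cy, cx = queue.popleft()
                mask[cy, cx] = True
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and moving[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            # Discard tiny components, which are more likely flow noise.
            if mask.sum() >= min_area:
                segments.append(mask)
    return segments


# Synthetic flow: a 3x3 block moving right by 2 px, plus one isolated noisy pixel.
flow = np.zeros((8, 8, 2), dtype=np.float32)
flow[2:5, 2:5, 0] = 2.0
flow[7, 7] = (0.0, 3.0)
segments = motion_prior_segments(flow)
print(len(segments), int(segments[0].sum()))  # prints "1 9"
```

In this toy case the moving block survives as a single candidate while the isolated pixel is filtered out by the area check. A magnitude threshold alone would also pick up moving objects that are not people, which is why candidates of this kind would still need a selection step before being used as training labels.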


Keywords: Active learning, Domain adaptation, Human segmentation



We thank Umbo CV, MediaTek, and MOST 107-2634-F-007-007 for their support.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. National Tsing Hua University, Taiwan, China
  2. Umbo Computer Vision, Taiwan, China
