
Realtime Human Segmentation in Video

  • Tairan Zhang
  • Congyan Lang
  • Junliang Xing
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11296)

Abstract

Human segmentation from a single image using deep learning models has achieved significant performance improvements. However, when a deep human segmentation model is applied directly to video, the performance is unsatisfactory due to issues such as segmentation results that are discontinuous across video frames and a slow segmentation process. To address these issues, we propose a new real-time framework for segmenting a single person in video that produces smooth, accurate, and fast human segmentation results. The framework consists of a fully convolutional network and a tracking module based on a level set algorithm. The fully convolutional network segments the human in the first frame of the video sequence, and the tracking module obtains the segmentation of each subsequent frame using the result for the previous frame as its initial segmentation. The fully convolutional network is trained on human image datasets. To evaluate the framework, we have created and annotated a new single-person video dataset. The experimental results demonstrate highly accurate and smooth human segmentation at a much higher speed than applying a deep human segmentation model alone.
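
The two-stage pipeline described in the abstract (deep segmentation on the first frame, level set tracking on later frames) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Frames are small grayscale images stored as nested Python lists. `segment_fn` is a hypothetical stand-in for the trained fully convolutional network. The tracker uses a simplified Chan-Vese-style region-based level set, with binary initialisation from the previous mask and 4-neighbour averaging in place of the curvature term. The abstract does not say which level set formulation the paper uses, so this choice, and every helper name below, is an assumption.

```python
from typing import Callable, List, Tuple

Frame = List[List[float]]  # grayscale intensities in [0, 1]
Mask = List[List[bool]]    # True = person pixel
Phi = List[List[float]]    # level set function, > 0 inside the person


def mask_to_phi(mask: Mask) -> Phi:
    """Binary initialisation of the level set from the previous frame's mask."""
    return [[1.0 if m else -1.0 for m in row] for row in mask]


def region_means(frame: Frame, phi: Phi) -> Tuple[float, float]:
    """Mean intensity inside (c1) and outside (c2) the current contour."""
    inside: List[float] = []
    outside: List[float] = []
    for img_row, phi_row in zip(frame, phi):
        for v, p in zip(img_row, phi_row):
            (inside if p > 0 else outside).append(v)
    c1 = sum(inside) / len(inside) if inside else 0.0
    c2 = sum(outside) / len(outside) if outside else 0.0
    return c1, c2


def neighbour_average(phi: Phi) -> Phi:
    """4-neighbour mean of phi: a crude stand-in for the curvature (length) term."""
    h, w = len(phi), len(phi[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [phi[y][x]]
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    vals.append(phi[ny][nx])
            out[y][x] = sum(vals) / len(vals)
    return out


def track_level_set(frame: Frame, init_mask: Mask, iterations: int = 20,
                    dt: float = 0.5, smoothing: float = 0.2) -> Mask:
    """Hypothetical tracker: evolve a region-based level set on `frame`,
    starting from `init_mask`, and return the final mask."""
    phi = mask_to_phi(init_mask)
    for _ in range(iterations):
        c1, c2 = region_means(frame, phi)
        smooth = neighbour_average(phi)
        # Pixels closer to the inside mean (c1) than to the outside mean (c2)
        # are pushed towards phi > 0, and vice versa; phi is clamped to [-1, 1].
        phi = [
            [
                max(-1.0, min(1.0, (1 - smoothing) * p + smoothing * s
                              + dt * ((v - c2) ** 2 - (v - c1) ** 2)))
                for v, p, s in zip(img_row, phi_row, s_row)
            ]
            for img_row, phi_row, s_row in zip(frame, phi, smooth)
        ]
    return [[p > 0 for p in row] for row in phi]


def segment_video(frames: List[Frame], segment_fn: Callable[[Frame], Mask],
                  iterations: int = 20) -> List[Mask]:
    """Run `segment_fn` (the deep model's role) on frame 0 only, then track."""
    masks = [segment_fn(frames[0])]
    for frame in frames[1:]:
        # The previous frame's result initialises the level set for this frame.
        masks.append(track_level_set(frame, masks[-1], iterations))
    return masks


if __name__ == "__main__":
    # Toy video: a bright 3x3 "person" moving one pixel right per frame.
    def make_frame(col: int, size: int = 8) -> Frame:
        return [[0.9 if 2 <= y <= 4 and col <= x <= col + 2 else 0.1
                 for x in range(size)] for y in range(size)]

    def threshold(frame: Frame) -> Mask:  # hypothetical stand-in for the FCN
        return [[v > 0.5 for v in row] for row in frame]

    frames = [make_frame(c) for c in (1, 2, 3)]
    masks = segment_video(frames, threshold)
    # Compare each tracked mask with the thresholded ground truth.
    print([m == threshold(f) for f, m in zip(frames, masks)])
```

In the demo, a simple threshold plays the role of the network. Because the expensive model runs only once, each later frame costs a fixed number of cheap level set iterations. This is consistent with the speed advantage the abstract claims. The toy example cannot confirm the paper's reported accuracy.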

Keywords

Human segmentation · Video segmentation · Deep learning · Level set

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Computer Science and Technology, Beijing Jiaotong University, Beijing, People’s Republic of China
  2. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, People’s Republic of China
