A Semi-supervised Data Augmentation Approach Using 3D Graphical Engines

  • Shuangjun Liu
  • Sarah Ostadabbas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)

Abstract

Deep learning approaches have been rapidly adopted across a wide range of fields because of their accuracy and flexibility, but they require large labeled training datasets. This presents a fundamental problem for applications with limited, expensive, or private data (i.e., small data), such as human pose and behavior estimation/tracking, which can be highly personalized. In this paper, we present a semi-supervised data augmentation approach that can synthesize large-scale labeled training datasets using 3D graphical engines based on a physically-valid low-dimensional pose descriptor. To evaluate the performance of our synthesized datasets in training deep learning-based models, we generated a large synthetic human pose dataset, called ScanAva, from 3D scans of only 7 individuals using our proposed augmentation approach. A state-of-the-art deep learning model for human pose estimation was then trained from scratch on our ScanAva dataset and, after an efficient domain adaptation applied to the synthetic images, achieved a pose estimation accuracy of 91.2% under the PCK0.5 criterion. This accuracy is comparable to that of the same model trained on large-scale pose data from real humans, such as the MPII dataset, and much higher than that of the model trained on other synthetic human datasets, such as SURREAL.
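As a concrete illustration of the PCK0.5 criterion used in the evaluation above, the sketch below computes a generic PCK score: a predicted joint counts as correct if it lies within half of a per-pose reference length of the ground truth. The function name, array shapes, and the choice of reference length (e.g., torso or head size) are our assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def pck(pred, gt, ref_len, alpha=0.5):
    """Percentage of Correct Keypoints (PCK).

    pred, gt : arrays of shape (N, J, 2) -- predicted and ground-truth
               2D joint locations for N poses with J joints each.
    ref_len  : array of shape (N,) -- per-pose normalization length
               (e.g., torso or head size; a modeling assumption here).
    alpha    : threshold fraction; alpha=0.5 gives PCK0.5.
    """
    # Euclidean distance between prediction and ground truth, per joint
    dists = np.linalg.norm(pred - gt, axis=-1)          # shape (N, J)
    # A joint is correct if its error is within alpha * reference length
    correct = dists <= alpha * ref_len[:, None]
    return float(np.mean(correct))

# Toy example: 2 poses, 3 joints each, reference length 1 per pose
gt = np.zeros((2, 3, 2))
pred = gt + np.array([[[0.1, 0.0], [0.4, 0.0], [2.0, 0.0]],
                      [[0.0, 0.0], [0.0, 0.6], [0.0, 0.2]]])
ref = np.ones(2)
print(pck(pred, gt, ref))  # 4 of 6 joints within 0.5 -> 0.666...
```

Reported PCK variants differ mainly in the reference length (PCKh normalizes by head segment size, as in MPII evaluations); the thresholding logic itself is unchanged.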

Keywords

Data augmentation · Deep learning · Domain adaptation · Human pose estimation · Low dimensional subspace learning

Notes

Acknowledgement

This research was supported by NSF grant #1755695. The authors would also like to thank Naveen Sehgal who actively participated in the ScanAva dataset formation procedure at the Augmented Cognition Lab (ACLab) in the Electrical and Computer Engineering Department at Northeastern University.

References

  1. MeshLab. http://www.meshlab.net/. Accessed 2018
  2. CMU graphics lab motion capture database (2018). http://mocap.cs.cmu.edu/
  3. Skanect 3D Scanning Software By Occipital. http://skanect.occipital.com/. Accessed 2018
  4. Andriluka, M., Pishchulin, L., Gehler, P., Schiele, B.: 2D human pose estimation: new benchmark and state of the art analysis. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR, June 2014
  5. Anguelov, D., Srinivasan, P., Koller, D., Thrun, S., Rodgers, J., Davis, J.: SCAPE: shape completion and animation of people. ACM Trans. Graph. 24(3), 408–416 (2005)
  6. Aubry, M., Maturana, D., Efros, A.A., Russell, B.C., Sivic, J.: Seeing 3D chairs: exemplar part-based 2D-3D alignment using a large dataset of CAD models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3762–3769 (2014)
  7. Bengio, Y.: Deep learning of representations for unsupervised and transfer learning. In: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pp. 17–36 (2012)
  8. Bengio, Y., et al.: Deep learners benefit more from out-of-distribution examples. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 164–172 (2011)
  9. Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Lechevallier, Y., Saporta, G. (eds.) Proceedings of COMPSTAT, pp. 177–186. Physica-Verlag, Heidelberg (2010). https://doi.org/10.1007/978-3-7908-2604-3_16
  10. Caruana, R.: Learning many related tasks at the same time with backpropagation. In: Advances in Neural Information Processing Systems, pp. 657–664 (1995)
  11. Chen, W., et al.: Synthesizing training images for boosting human 3D pose estimation. In: 2016 Fourth International Conference on 3D Vision, 3DV, pp. 479–488 (2016)
  12. Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2016)
  13. Craig, J.J.: Introduction to Robotics: Mechanics and Control, vol. 3. Pearson Prentice Hall, Upper Saddle River (2005)
  14. Dosovitskiy, A., et al.: FlowNet: learning optical flow with convolutional networks. In: IEEE International Conference on Computer Vision, pp. 2758–2766 (2015)
  15. Du, Y., et al.: Marker-less 3D human motion capture with monocular image sequence and height-maps. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 20–36. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_2
  16. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)
  17. Gaidon, A., Wang, Q., Cabon, Y., Vig, E.: Virtual worlds as proxy for multi-object tracking analysis. arXiv preprint arXiv:1605.06457 (2016)
  18. Ghezelghieh, M.F., Kasturi, R., Sarkar, S.: Learning camera viewpoint using CNN to improve 3D body pose estimation. In: 2016 Fourth International Conference on 3D Vision, 3DV, pp. 685–693 (2016)
  19. Gong, B., Shi, Y., Sha, F., Grauman, K.: Geodesic flow kernel for unsupervised domain adaptation. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 2066–2073 (2012)
  20. Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: Proceedings of the British Machine Vision Conference (2010). https://doi.org/10.5244/C.24.12
  21. Kajita, S., Hirukawa, H., Harada, K., Yokoi, K.: Introduction to Humanoid Robotics, vol. 101. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54536-8
  22. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
  23. Lee, C.Y., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: Deeply-supervised nets. In: Artificial Intelligence and Statistics, pp. 562–570 (2015)
  24. Liebelt, J., Schmid, C.: Multi-view object class detection with a 3D geometric model. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 1688–1695 (2010)
  25. Liu, S., Yin, Y., Ostadabbas, S.: In-bed pose estimation: deep learning with shallow dataset. arXiv preprint arXiv:1711.01005 (2018)
  26. Marin, J., Vázquez, D., Gerónimo, D., López, A.M.: Learning appearance in virtual scenarios for pedestrian detection. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 137–144 (2010)
  27. Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 483–499. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_29
  28. Okada, R., Soatto, S.: Relevant feature selection for human pose estimation and localization in cluttered images. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008. LNCS, vol. 5303, pp. 434–445. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88688-4_32
  29. Pishchulin, L., Jain, A., Andriluka, M., Thormählen, T., Schiele, B.: Articulated people detection and pose estimation: reshaping the future. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 3178–3185 (2012)
  30. Qiu, W.: Generating human images and ground truth using computer graphics. Ph.D. thesis. University of California, Los Angeles (2016)
  31. Romero, J., Loper, M., Black, M.J.: FlowCap: 2D human pose from optical flow. In: Gall, J., Gehler, P., Leibe, B. (eds.) GCPR 2015. LNCS, vol. 9358, pp. 412–423. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24947-6_34
  32. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  33. Stark, M., Goesele, M., Schiele, B.: Back to the future: learning shape models from 3D CAD data. In: BMVC, vol. 2, no. 4, p. 5 (2010)
  34. Su, H., Qi, C.R., Li, Y., Guibas, L.J.: Render for CNN: viewpoint estimation in images using CNNs trained with rendered 3D model views. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2686–2694 (2015)
  35. Sun, B., Feng, J., Saenko, K.: Correlation alignment for unsupervised domain adaptation. arXiv preprint arXiv:1612.01939 (2016)
  36. Sun, B., Peng, X., Saenko, K.: Generating large scale image datasets from 3D CAD models. In: CVPR 2015 Workshop on the Future of Datasets in Vision (2015)
  37. Sun, M., Su, H., Savarese, S., Fei-Fei, L.: A multi-view probabilistic model for 3D object classes. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 1247–1254 (2009)
  38. Varol, G., et al.: Learning from synthetic humans. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 (2017)
  39. Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4724–4732 (2016)
  40. Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems, pp. 3320–3328 (2014)
  41. Yu, F., Zhang, Y., Song, S., Seff, A., Xiao, J.: LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015)
  42. Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ADE20K dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Augmented Cognition Lab, Electrical and Computer Engineering Department, Northeastern University, Boston, USA