Carnegie-Mellon Mocap Database. http://mocap.cs.cmu.edu/.
Badler, N. I., Phillips, C. B., & Webber, B. L. (1993). Simulating Humans: Computer Graphics Animation and Control. New York, NY, USA: Oxford University Press Inc.
Baradel, F., Wolf, C., & Mille, J. (2017). Pose-conditioned spatio-temporal attention for human action recognition. CoRR. (abs/1703.10106).
Baradel, F., Wolf, C., Mille, J., & Taylor, G.W. (2018). Glimpse clouds: Human activity recognition from unstructured feature points. In: CVPR.
Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., & Black, M.J. (2016). Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. In: ECCV.
Carreira, J., & Zisserman, A. (2017). Quo vadis, action recognition? A new model and the Kinetics dataset. In: CVPR.
Chen, C.F., Panda, R., Ramakrishnan, K., Feris, R., Cohn, J., Oliva, A., & Fan, Q. (2020). Deep analysis of CNN-based spatio-temporal representations for action recognition. arXiv preprint arXiv:2010.11757.
Chen, W., Wang, H., Li, Y., Su, H., Wang, Z., Tu, C., Lischinski, D., Cohen-Or, D., & Chen, B. (2016). Synthesizing training images for boosting human 3D pose estimation. In: 3DV.
Crasto, N., Weinzaepfel, P., Alahari, K., & Schmid, C. (2019). MARS: Motion-augmented RGB stream for action recognition. In: CVPR.
De Souza, C.R., Gaidon, A., Cabon, Y., & López Peña, A.M. (2017). Procedural generation of videos to train deep action recognition networks. In: CVPR.
Doersch, C., & Zisserman, A. (2019). Sim2real transfer learning for 3D pose estimation: Motion to the rescue. CoRR. (abs/1907.02499).
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., & Brox, T. (2015). FlowNet: Learning optical flow with convolutional networks. In: ICCV.
Fang, H.S., Xie, S., Tai, Y.W., & Lu, C. (2017). RMPE: Regional multi-person pose estimation. In: ICCV.
Farhadi, A., & Tabrizi, M.K. (2008). Learning to recognize activities from the wrong view point. In: ECCV.
Feichtenhofer, C., Fan, H., Malik, J., & He, K. (2019). SlowFast networks for video recognition. In: ICCV.
Feichtenhofer, C., Pinz, A., & Zisserman, A. (2016). Convolutional two-stream network fusion for video action recognition. In: CVPR.
Ghezelghieh, M.F., Kasturi, R., & Sarkar, S. (2016). Learning camera viewpoint using CNN to improve 3D body pose estimation. In: 3DV.
Hara, K., Kataoka, H., & Satoh, Y. (2018). Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In: CVPR.
Hasson, Y., Varol, G., Tzionas, D., Kalevatykh, I., Black, M.J., Laptev, I., & Schmid, C. (2019). Learning joint reconstruction of hands and manipulated objects. In: CVPR.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In: CVPR.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Hoffmann, D.T., Tzionas, D., Black, M.J., & Tang, S. (2019). Learning to train with synthetic humans. In: GCPR.
Hu, J. F., Zheng, W. S., Lai, J., & Jianguo, Z. (2017). Jointly learning heterogeneous features for RGB-D activity recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11), 2186–2200.
Ji, Y., Xu, F., Yang, Y., Shen, F., Shen, H.T., & Zheng, W. (2018). A large-scale RGB-D database for arbitrary-view human action recognition. In: ACMMM.
Jingtian, Z., Shum, H., Han, J., & Shao, L. (2018). Action recognition from arbitrary views using transferable dictionary learning. IEEE Transactions on Image Processing, 27, 4709–4723.
Junejo, I. N., Dexter, E., Laptev, I., & Perez, P. (2011). View-independent action recognition from temporal self-similarities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 172–185.
Kanazawa, A., Black, M.J., Jacobs, D.W., & Malik, J. (2018). End-to-end recovery of human shape and pose. In: CVPR.
Kanazawa, A., Zhang, J.Y., Felsen, P., & Malik, J. (2019). Learning 3D human dynamics from video. In: CVPR.
Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., et al. (2017). The Kinetics human action video dataset. CoRR. (abs/1705.06950).
Ke, Q., Bennamoun, M., An, S., Sohel, F., & Boussaid, F. (2017). A new representation of skeleton sequences for 3D action recognition. In: CVPR.
Kocabas, M., Athanasiou, N., & Black, M.J. (2020). VIBE: Video inference for human body pose and shape estimation. In: CVPR.
Kolotouros, N., Pavlakos, G., Black, M.J., & Daniilidis, K. (2019). Learning to reconstruct 3D human pose and shape via model-fitting in the loop. In: ICCV.
Kolotouros, N., Pavlakos, G., & Daniilidis, K. (2019). Convolutional mesh regression for single-image human shape reconstruction. In: CVPR.
Kong, Y., Ding, Z., Li, J., & Fu, Y. (2017). Deeply learned view-invariant features for cross-view action recognition. IEEE Transactions on Image Processing, 26(6), 3028–3037.
Kong, Y., & Fu, Y. (2018). Human action recognition and prediction: A survey. CoRR. (abs/1806.11230).
Kortylewski, A., Egger, B., Schneider, A., Gerig, T., Morel-Forster, A., & Vetter, T. (2018). Empirically analyzing the effect of dataset biases on deep face recognition systems. In: CVPRW.
Lassner, C., Romero, J., Kiefel, M., Bogo, F., Black, M.J., & Gehler, P.V. (2017). Unite the people: Closing the loop between 3D and 2D human representations. In: CVPR.
LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. E., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.
Li, J., Wong, Y., Zhao, Q., & Kankanhalli, M. (2018). Unsupervised learning of view-invariant action representations. In: NeurIPS.
Li, W., Xu, Z., Xu, D., Dai, D., & Gool, L. V. (2018). Domain generalization and adaptation using low rank exemplar SVMs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(5), 1114–1127.
Lin, J., Gan, C., & Han, S. (2019). TSM: Temporal shift module for efficient video understanding. In: ICCV.
Liu, J., Akhtar, N., & Mian, A. (2019). Temporally coherent full 3D mesh human pose recovery from monocular video. CoRR. (abs/1906.00161).
Liu, J., Rahmani, H., Akhtar, N., & Mian, A. (2019). Learning human pose models from synthesized data for robust RGB-D action recognition. International Journal of Computer Vision (IJCV), 127, 1545–1564.
Liu, J., Shah, M., Kuipers, B., & Savarese, S. (2011). Cross-view action recognition via view knowledge transfer. In: CVPR.
Liu, J., Shahroudy, A., Xu, D., & Wang, G. (2016). Spatio-temporal LSTM with trust gates for 3D human action recognition. In: ECCV.
Liu, J., Wang, G., Hu, P., Duan, L.Y., & Kot, A.C. (2017). Global context-aware attention LSTM networks for 3D action recognition. In: CVPR.
Liu, M., Liu, H., & Chen, C. (2017). Enhanced skeleton visualization for view invariant human action recognition. Pattern Recognition, 68, 346–362.
Liu, M., & Yuan, J. (2018). Recognizing human actions as the evolution of pose estimation maps. In: CVPR.
Liu, Y., Lu, Z., Li, J., & Yang, T. (2019). Hierarchically learned view-invariant representations for cross-view action recognition. IEEE Transactions on Circuits and Systems for Video Technology, 29, 2416–2430.
Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., & Black, M.J. (2015). SMPL: A skinned multi-person linear model. ACM Transactions on Graphics (TOG), 34(6), 1–16.
Luo, Z., Hsieh, J.T., Jiang, L., Niebles, J.C., & Fei-Fei, L. (2018). Graph distillation for action detection with privileged information. In: ECCV.
Luvizon, D.C., Picard, D., & Tabia, H. (2018). 2D/3D pose estimation and action recognition using multitask deep learning. In: CVPR.
Lv, F., & Nevatia, R. (2007). Single view human action recognition using key pose matching and Viterbi path searching. In: CVPR.
Mahmood, N., Ghorbani, N., Troje, N.F., Pons-Moll, G., & Black, M.J. (2019). AMASS: Archive of motion capture as surface shapes. In: ICCV.
Marin, J., Vazquez, D., Geronimo, D., & Lopez, A.M. (2010). Learning appearance in virtual scenarios for pedestrian detection. In: CVPR.
Masi, I., Tran, A.T., Hassner, T., Sahin, G., & Medioni, G. (2019). Face-specific data augmentation for unconstrained face recognition. International Journal of Computer Vision (IJCV), 127, 642–667.
Newell, A., Yang, K., & Deng, J. (2016). Stacked hourglass networks for human pose estimation. In: ECCV.
Omran, M., Lassner, C., Pons-Moll, G., Gehler, P.V., & Schiele, B. (2018). Neural body fitting: Unifying deep learning and model-based human pose and shape estimation. In: 3DV.
Pavlakos, G., Zhu, L., Zhou, X., & Daniilidis, K. (2018). Learning to estimate 3D human pose and shape from a single color image. In: CVPR.
Pishchulin, L., Jain, A., Andriluka, M., Thormählen, T., & Schiele, B. (2012). Articulated people detection and pose estimation: Reshaping the future. In: CVPR.
Price, W., & Damen, D. (2019). Retro-Actions: Learning ’close’ by time-reversing ’open’ videos.
Puig, X., Ra, K., Boben, M., Li, J., Wang, T., Fidler, S., & Torralba, A. (2018). VirtualHome: Simulating household activities via programs. In: CVPR.
Qian, X., Fu, Y., Xiang, T., Wang, W., Qiu, J., Wu, Y., Jiang, Y.G., & Xue, X. (2018). Pose-normalized image generation for person re-identification. In: ECCV.
Rahmani, H., Mahmood, A., Huynh, D., & Mian, A. (2016). Histogram of oriented principal components for cross-view action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(12), 2430–2443.
Rahmani, H., & Mian, A. (2015). Learning a non-linear knowledge transfer model for cross-view action recognition. In: CVPR.
Rahmani, H., & Mian, A. (2016). 3D action recognition from novel viewpoints. In: CVPR.
Rahmani, H., Mian, A., & Shah, M. (2018). Learning a deep model for human action recognition from novel viewpoints. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3), 667–681.
Sakoe, H., & Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43–49.
Shahroudy, A., Liu, J., Ng, T.T., & Wang, G. (2016). NTU RGB+D: A large scale dataset for 3D human activity analysis. In: CVPR.
Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2019). Skeleton-based action recognition with directed graph neural networks. In: CVPR.
Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2019). Two-stream adaptive graph convolutional networks for skeleton-based action recognition. In: CVPR.
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., Kipman, A., & Blake, A. (2011). Real-time human pose recognition in parts from single depth images. In: CVPR.
Si, C., Chen, W., Wang, W., Wang, L., & Tan, T. (2019). An attention enhanced graph convolutional LSTM network for skeleton-based action recognition. In: CVPR.
Simonyan, K., & Zisserman, A. (2014). Two-stream convolutional networks for action recognition in videos. In: NeurIPS.
Soomro, K., Roshan Zamir, A., & Shah, M. (2012). UCF101: A dataset of 101 human actions classes from videos in the wild. CRCV-TR-12-01.
Su, H., Qi, C.R., Li, Y., & Guibas, L.J. (2015). Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views. In: ICCV.
SURREACT project page. https://www.di.ens.fr/willow/research/surreact/.
Tieleman, T., & Hinton, G. (2012). Lecture 6.5 - RMSprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning.
Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3D convolutional networks. In: ICCV.
Tung, H.Y.F., Tung, H.W., Yumer, E., & Fragkiadaki, K. (2017). Self-supervised learning of motion capture. In: NeurIPS.
Varol, G., Ceylan, D., Russell, B., Yang, J., Yumer, E., Laptev, I., & Schmid, C. (2018). BodyNet: Volumetric inference of 3D human body shapes. In: ECCV.
Varol, G., Laptev, I., & Schmid, C. (2018). Long-term temporal convolutions for action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6), 1510–1517.
Varol, G., Romero, J., Martin, X., Mahmood, N., Black, M.J., Laptev, I., & Schmid, C. (2017). Learning from synthetic humans. In: CVPR.
Wang, D., Ouyang, W., Li, W., & Xu, D. (2018). Dividing and aggregating network for multi-view action recognition. In: ECCV.
Wang, J., Nie, X., Xia, Y., Wu, Y., & Zhu, S.C. (2014). Cross-view action modeling, learning, and recognition. In: CVPR.
Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., & Van Gool, L. (2016). Temporal segment networks: Towards good practices for deep action recognition. In: ECCV.
Weinland, D., Boyer, E., & Ronfard, R. (2007). Action recognition from arbitrary views using 3D exemplars. In: ICCV.
Xie, S., Sun, C., Huang, J., Tu, Z., & Murphy, K. (2018). Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. In: ECCV.
Yan, S., Xiong, Y., & Lin, D. (2018). Spatial temporal graph convolutional networks for skeleton-based action recognition. In: AAAI.
Yu, F., Zhang, Y., Song, S., Seff, A., & Xiao, J. (2015). LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. CoRR. (abs/1506.03365).
Yuille, A.L., & Liu, C. (2018). Deep nets: What have they ever done for vision? CoRR. (abs/1805.04025).
Zhang, D., Guo, G., Huang, D., & Han, J. (2018). PoseFlow: A deep motion representation for understanding human behaviors in videos. In: CVPR.
Zhang, P., Lan, C., Xing, J., Zeng, W., Xue, J., & Zheng, N. (2017). View adaptive recurrent neural networks for high performance human action recognition from skeleton data. In: ICCV.
Zheng, J., & Jiang, Z. (2013). Learning view-invariant sparse representations for cross-view action recognition. In: ICCV.
Zheng, J., Jiang, Z., & Chellappa, R. (2016). Cross-view action recognition via transferable dictionary learning. IEEE Transactions on Image Processing, 25(6), 2542–2556.
Zhu, Y., & Newsam, S. (2018). Random temporal skipping for multirate video analysis. In: ACCV.
Zimmermann, C., & Brox, T. (2017). Learning to estimate 3D hand pose from single RGB images. In: ICCV.
Zolfaghari, M., Oliveira, G.L., Sedaghat, N., & Brox, T. (2017). Chained multi-stream networks exploiting pose, motion, and appearance for action classification and detection. In: ICCV.
Zolfaghari, M., Singh, K., & Brox, T. (2018). ECO: Efficient convolutional network for online video understanding. In: ECCV.