Abstract
This paper presents a multistage deep neural network pipeline for sports action recognition, applied to the classification of table tennis stroke types from spatiotemporal features. At each stage, the network predicts a different aspect of the final class, and the outcomes of the stages are then fused to obtain the final prediction. The stages draw on four methods: RGB image-based, optical flow-based, pose-based, and region-of-interest-based. We conducted our experiments on the TTSTROKE-21 dataset, which was introduced in the MediaEval Challenge 2020. Experimental results show that the proposed methodology obtains 90.7% test accuracy when the RGB image-based and optical flow-based methods are combined.
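The fusion step can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted averaging of per-stage class probabilities below is an assumption chosen for illustration, not the authors' method. The function names, the stage names, and the three-class toy probabilities are likewise hypothetical.

```python
from typing import Dict, List, Optional


def fuse_predictions(
    stage_probs: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Late fusion by weighted averaging of per-stage class probabilities.

    Assumption: each stage outputs one probability per class. The actual
    fusion rule used in the paper is not given in the abstract.
    """
    if not stage_probs:
        raise ValueError("at least one stage is required")
    if weights is None:
        # Default to equal weighting of all stages.
        weights = {name: 1.0 for name in stage_probs}

    n_classes = len(next(iter(stage_probs.values())))
    total_weight = sum(weights[name] for name in stage_probs)

    fused = [0.0] * n_classes
    for name, probs in stage_probs.items():
        if len(probs) != n_classes:
            raise ValueError(f"stage {name!r} has a different number of classes")
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total_weight
    return fused


def predict_class(fused: List[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical outputs of two stages for a three-class toy problem.
rgb_probs = [0.6, 0.3, 0.1]
flow_probs = [0.2, 0.7, 0.1]

fused = fuse_predictions({"rgb": rgb_probs, "flow": flow_probs})
label = predict_class(fused)
```

In this toy case the RGB stage alone would pick class 0, while the fused scores favour class 1, which shows how a second stream can change the final decision. Passing a `weights` dictionary lets one stage count more than another.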
This work has been partially supported by the Estonian Centre of Excellence in IT (EXCITE) funded by the European Regional Development Fund. The authors also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU.
Cite this article
Aktas, K., Demirel, M., Moor, M. et al. Spatiotemporal based table tennis stroke-type assessment. SIViP 15, 1593–1600 (2021). https://doi.org/10.1007/s11760-021-01893-7
DOI: https://doi.org/10.1007/s11760-021-01893-7