Modalities Combination for Italian Sign Language Extraction and Recognition

  • Bassem Seddik
  • Sami Gazzah
  • Najoua Essoukri Ben Amara
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9280)


We propose an approach for the automatic extraction and recognition of Italian sign language using the RGB, depth, and skeletal-joint modalities offered by Microsoft's Kinect sensor. We investigate which combination of modalities best improves human-action spotting and recognition in a continuous-stream scenario. For this purpose, we define a complementary feature representation for each modality and fuse the decisions of multiple SVM classifiers with probability outputs. Our contribution is a multi-scale analysis approach that combines a global Fisher vector representation with a local frame-wise one. In addition, we define a temporal segmentation strategy that allows the generation of multiple specialized classifiers, whose results are combined to obtain the final decision. Our tests were carried out on the ChaLearn gesture challenge dataset, and preliminary experiments gave promising results.
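As an illustration of fusing per-modality classifier decisions, the following minimal sketch averages class-probability vectors and picks the most probable class. The abstract does not state which fusion rule is used, so the weighted-sum rule here is an assumption. The function names `fuse_probabilities` and `predict` are hypothetical, and the probability values are made up for the example.

```python
from typing import Dict, List, Optional, Sequence


def fuse_probabilities(
    per_modality: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of per-modality class-probability vectors.

    Assumption: a weighted-sum rule; the paper may use a different one.
    """
    names = list(per_modality)
    if not names:
        raise ValueError("at least one modality is required")
    n_classes = len(per_modality[names[0]])
    if any(len(per_modality[m]) != n_classes for m in names):
        raise ValueError("all modalities must score the same classes")
    if weights is None:
        # Equal trust in every modality unless told otherwise.
        weights = {m: 1.0 for m in names}
    total = sum(weights[m] for m in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * n_classes
    for m in names:
        w = weights[m] / total  # normalise so the result stays a distribution
        for k, p in enumerate(per_modality[m]):
            fused[k] += w * p
    return fused


def predict(fused: Sequence[float]) -> int:
    """Index of the class with the highest fused probability."""
    return max(range(len(fused)), key=lambda k: fused[k])


# Made-up probability outputs of three per-modality classifiers
# for one video segment and three gesture classes.
scores = {
    "rgb": [0.6, 0.3, 0.1],
    "depth": [0.2, 0.5, 0.3],
    "skeleton": [0.1, 0.7, 0.2],
}
fused = fuse_probabilities(scores)
label = predict(fused)  # class index 1 wins after fusion
```

Here the RGB classifier alone would choose class 0, while the fused decision follows the agreement between the depth and skeleton classifiers. Passing a `weights` dictionary would let one modality count more than the others.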


Keywords: Motion spotting, Action recognition, Fisher vector, Modalities combination, Classification fusion





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Bassem Seddik (1, corresponding author)
  • Sami Gazzah (1)
  • Najoua Essoukri Ben Amara (1)

  1. SAGE Laboratory, National Engineering School of Sousse, Sousse University, Sousse, Tunisia
