Abstract
Current linguistic research on sign language is often based on analysing large corpora of video recordings. These videos must be annotated either manually or automatically, so automatic methods for estimating the signer's body configuration, especially the positions and shapes of the hands, would be of great practical interest. Methods based on rigorous 3D and 2D modelling of the body parts have been presented. However, they face insurmountable problems of computational complexity due to the large sizes of modern linguistic corpora. In this paper we look at an alternative approach and investigate what can be achieved with straightforward local 2D appearance-based methods: template matching-based tracking of local image neighbourhoods, and supervised detection of skin blob categories based on local appearance features. After describing these techniques, we construct a signer configuration estimation system that uses them, among other techniques, and demonstrate the system on the video material of the Suvi dictionary of Finnish Sign Language.
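The abstract does not spell out how the template matching-based tracking works, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes grayscale frames stored as NumPy arrays, a sum-of-squared-differences (SSD) matching cost, a square search window of ±`search_radius` pixels around the previous position, and a template that is re-cut from every new frame. The function names `track_template` and `track_sequence`, the parameter values and the synthetic test data are all hypothetical.

```python
import numpy as np


def track_template(frame, template, prev_xy, search_radius=8):
    """Find the best SSD match of `template` near `prev_xy` in `frame`.

    frame, template: 2D grayscale arrays; prev_xy: (x, y) top-left corner of
    the template in the previous frame. Returns ((x, y), cost).
    Hypothetical helper: the SSD cost and square search window are assumptions.
    """
    th, tw = template.shape
    fh, fw = frame.shape
    px, py = prev_xy
    tmpl = template.astype(np.float64)
    best_xy, best_cost = prev_xy, np.inf
    # Scan every top-left position in the window that keeps the patch inside the frame.
    for y in range(max(0, py - search_radius), min(fh - th, py + search_radius) + 1):
        for x in range(max(0, px - search_radius), min(fw - tw, px + search_radius) + 1):
            patch = frame[y:y + th, x:x + tw].astype(np.float64)
            cost = float(np.sum((patch - tmpl) ** 2))  # sum of squared differences
            if cost < best_cost:
                best_xy, best_cost = (x, y), cost
    return best_xy, best_cost


def track_sequence(frames, init_xy, size, search_radius=8, update=True):
    """Track one local image neighbourhood of `size` = (h, w) through `frames`.

    The template is cut from the first frame at `init_xy`; with update=True
    (an assumption) it is re-cut at the new position after every frame.
    """
    x, y = init_xy
    h, w = size
    template = frames[0][y:y + h, x:x + w]
    positions = [init_xy]
    for frame in frames[1:]:
        (x, y), _ = track_template(frame, template, (x, y), search_radius)
        positions.append((x, y))
        if update:
            template = frame[y:y + h, x:x + w]
    return positions


# Usage on synthetic data: an 8x8 bright square on a noisy background that
# moves by (+2, +1) pixels per frame; the 12x12 template adds a 2-pixel border.
rng = np.random.default_rng(0)
frames = []
for t in range(5):
    f = rng.normal(0.0, 0.05, size=(64, 64))
    f[20 + t:28 + t, 10 + 2 * t:18 + 2 * t] += 1.0
    frames.append(f)

positions = track_sequence(frames, init_xy=(8, 18), size=(12, 12))
print(positions)
```

Re-cutting the template every frame lets the tracker follow gradual appearance changes of a hand or face neighbourhood, at the price of possible drift; keeping the first-frame template (`update=False`) avoids drift but loses the target once its appearance changes markedly. The explicit loops are used here for clarity; a practical tracker would more likely rely on a vectorised matcher such as OpenCV's `cv2.matchTemplate` with `cv2.TM_SQDIFF`.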
This work has been funded by the following grants of the Academy of Finland: 140245, Content-based video analysis and annotation of Finnish Sign Language (CoBaSiL); 251170, Finnish Centre of Excellence in Computational Inference Research (COIN).
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Viitaniemi, V., Karppa, M., Laaksonen, J. (2014). 2D Appearance Based Techniques for Tracking the Signer Configuration in Sign Language Video Recordings. In: Campilho, A., Kamel, M. (eds) Image Analysis and Recognition. ICIAR 2014. Lecture Notes in Computer Science, vol 8815. Springer, Cham. https://doi.org/10.1007/978-3-319-11755-3_4
DOI: https://doi.org/10.1007/978-3-319-11755-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11754-6
Online ISBN: 978-3-319-11755-3
eBook Packages: Computer Science, Computer Science (R0)