
Semantically Related Gestures Move Alike: Towards a Distributional Semantics of Gesture Kinematics

  • Conference paper
  • First Online:
Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. Human Body, Motion and Behavior (HCII 2021)


Most manual communicative gestures that humans produce cannot be looked up in a dictionary, as these manual gestures inherit their meaning in large part from the communicative context and are not conventionalized. However, it is understudied to what extent the communicative signal as such—bodily postures in movement, or kinematics—can inform about gesture semantics. Can we construct, in principle, a distribution-based semantics of gesture kinematics, similar to how word vectorization methods in NLP (natural language processing) are now widely used to study semantic properties in text and speech? For such a project to get off the ground, we need to know the extent to which semantically similar gestures are more likely to be kinematically similar. In study 1 we assess whether semantic (word2vec) distances between the concepts that participants were explicitly instructed to convey in silent gestures relate to the kinematic distances of those gestures, as obtained from Dynamic Time Warping (DTW). In study 2, a director-matcher dyadic study, we assess kinematic similarity between spontaneous co-speech gestures produced by interacting participants. Participants were asked before and after they interacted how they would name the objects. The semantic distances between the resulting names were then related to the kinematic distances between the gestures produced when conveying those objects in the interaction. We find that the gestures’ semantic relatedness is reliably predictive of kinematic relatedness across these highly divergent studies, which suggests that the development of an NLP method of deriving semantic relatedness from kinematics is a promising avenue for future developments in automated multimodal recognition. Deeper implications for statistical learning processes in multimodal language are discussed.
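The two-distance comparison described in the abstract can be sketched in a few lines; the following is a minimal, self-contained illustration with toy embedding vectors and toy displacement traces (hypothetical values — the studies themselves used a trained word2vec model and motion-tracking data of the hands):

```python
import math

def cosine_distance(u, v):
    # semantic distance: 1 - cosine similarity between two word vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dtw_distance(x, y):
    # kinematic distance: classic dynamic-programming DTW
    # over two 1-D displacement traces of possibly unequal length
    n, m = len(x), len(y)
    D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# hypothetical toy data: in the studies, embeddings came from word2vec
# and traces from motion tracking of the gesturing hands
emb = {"airplane": [0.9, 0.1, 0.0], "bird": [0.8, 0.3, 0.1], "hammer": [0.0, 0.2, 0.9]}
traces = {"airplane": [0.0, 0.5, 1.0, 0.9, 0.4],
          "bird":     [0.0, 0.6, 0.9, 0.8, 0.3],
          "hammer":   [1.0, 0.2, 1.0, 0.1, 1.0]}

for a, b in [("airplane", "bird"), ("airplane", "hammer")]:
    print(a, b, cosine_distance(emb[a], emb[b]), dtw_distance(traces[a], traces[b]))
```

In this toy example the semantically closer pair ("airplane"/"bird") is also the kinematically closer one; the studies test whether such a relation holds statistically across many gesture pairs.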




Notes

  1. The model used for word2vec can be downloaded here:

  2. For a visual example of how time series are compared by Dynamic Time Warping, see our supplemental figure. This example from study 1 shows the vertical displacement of the left hand tip for three compared gestures that conveyed the concept “airplane”.

  3. Due to time constraints, participants only performed gestures for five randomly selected concepts. The repetition rate due to the robot’s failure to recognize the gesture was 79%.
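The alignment that footnote 2's supplemental figure visualizes can also be recovered programmatically: besides a scalar distance, DTW yields an explicit warping path pairing samples of the two traces, which is what makes it robust to gestures unfolding at different speeds. A minimal backtracking sketch (toy traces, not the study's data):

```python
def dtw_path(x, y):
    # accumulate the DTW cost matrix, then backtrack the optimal warping path
    n, m = len(x), len(y)
    D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # walk back from (n, m) to (1, 1), always taking the cheapest predecessor
    path, i, j = [], n, m
    while (i, j) != (1, 1):
        path.append((i - 1, j - 1))
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda ij: D[ij[0]][ij[1]])
    path.append((0, 0))
    return path[::-1]  # list of (index in x, index in y) pairs

# a slow trace aligned against a faster one: repeated y-indices in the
# path show how DTW stretches time to match the movement shapes
slow = [0.0, 0.2, 0.4, 0.4, 0.2, 0.0]
fast = [0.0, 0.4, 0.0]
print(dtw_path(slow, fast))
```

The path always starts at the pair of first samples and ends at the pair of last samples, moving monotonically through both traces.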




Acknowledgements

For study 2, we would like to thank Mark Dingemanse for his contributions within the CABB project to assessing the optimality of different word2vec models, and James Trujillo for his contributions to setting up the Kinect data collection. Study 2 came about in the context of a multidisciplinary research project within the Language in Interaction consortium, called Communicative Alignment in Brain and Behaviour (CABB). We wish to make explicit that the work has been shaped by contributions of CABB team members, especially (in alphabetical order): Mark Blokpoel, Mark Dingemanse, Lotte Eijk, and Iris van Rooij. The authors remain solely responsible for the contents of the paper. This work was supported by the Netherlands Organisation for Scientific Research (NWO) Gravitation Grant 024.001.006 to the Language in Interaction Consortium, and is further supported by the Donders Fellowship awarded to Wim Pouw and Asli Ozyurek.

Author information



Corresponding author

Correspondence to Wim Pouw.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Pouw, W., de Wit, J., Bögels, S., Rasenberg, M., Milivojevic, B., Ozyurek, A. (2021). Semantically Related Gestures Move Alike: Towards a Distributional Semantics of Gesture Kinematics. In: Duffy, V.G. (eds) Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. Human Body, Motion and Behavior. HCII 2021. Lecture Notes in Computer Science, vol. 12777. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77816-3

  • Online ISBN: 978-3-030-77817-0

  • eBook Packages: Computer Science, Computer Science (R0)
