A Roadmap for Technological Innovation in Multimodal Communication Research

  • Conference paper
  • In: Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management (HCII 2023)

Abstract

Multimodal communication research focuses on how different means of signalling coordinate to communicate effectively. This line of research is traditionally influenced by fields such as cognitive science, neuroscience, human-computer interaction, and linguistics. With new technologies becoming available in fields such as natural language processing and computer vision, multimodal research can increasingly avail itself of new ways of analyzing and understanding multimodal communication. As a result, there is a general hope that technological advances in computer science, and the extended empirical coverage they enable, may place multimodal research at the “precipice of greatness”. For this to come about, however, there must be sufficient guidance on the key (theoretical) needs of innovation in the field of multimodal communication. Absent such guidance, the research focus of computer scientists might increasingly diverge from the crucial issues in multimodal communication. With this paper, we want to further promote interaction between these fields, which could benefit both communities enormously. The multimodal research community (represented here by a consortium of researchers from the Visual Communication [ViCom] Priority Programme) can contribute to this innovation by clearly stating which technological tools are needed to make progress in the field of multimodal communication. In this article, we try to facilitate the establishment of a much-needed common ground on feasible expectations (e.g., regarding the terminology and measures needed to train machine learning algorithms) and to critically reflect on possibly idle hopes for technical advances, informed by recent successes and challenges in computer science, social signal processing, and related domains.

Supported by the DFG priority program Visual Communication (ViCom).

I. Brilmayer and P. L. Rohrer—External collaborators

A. Gregori, F. Amici, I. Brilmayer, A. Ćwiek, L. Fritzsche, S. Fuchs, A. Henlein, O. Herbort, F. Kügler, J. Lemanski, K. Liebal, A. Lücking, A. Mehler, K. T. Nguyen, W. Pouw, P. Prieto, P. L. Rohrer, P. G. Sánchez-Ramón, M. Schulte-Rüther, P. B. Schumacher, S. R. Schweinberger, V. Struckmeier, P. C. Trettenbrein, C. I. von Eiff—For the ViCom Consortium, alphabetical order except lead author.

Notes

  1. https://github.com/sccn/labstreaminglayer
  2. https://github.com/sccn/xdf
  3. https://github.com/labstreaminglayer/App-LabRecorder.git
  4. www.talkbank.org
  5. https://archive.mpi.nl/tla/
  6. https://www.ortolang.fr/
  7. https://www.voiceprivacychallenge.org/
  8. https://github.com/quarto-dev/quarto
  9. https://huggingface.co/ (last visited 27.01.2023)
  10. For a recent overview of transfer learning, see [195].

References

  1. Abner, N., Cooperrider, K., Goldin-Meadow, S.: Gesture for linguists: a handy primer. Lang. Linguist. Compass 9(11), 437–451 (2015). https://doi.org/10.1111/lnc3.12168

  2. Abzaliev, A., Owens, A., Mihalcea, R.: Towards understanding the relation between gestures and language. In: Proceedings of the 29th International Conference on Computational Linguistics, pp. 5507–5520 (2022)

  3. Ahmed, F., Bari, A.H., Gavrilova, M.L.: Emotion recognition from body movement. IEEE Access 8, 11761–11781 (2019). https://doi.org/10.1109/ACCESS.2019.2963113

  4. Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C., Paggio, P.: The mumin coding scheme for the annotation of feedback, turn management and sequencing phenomena. Lang. Resour. Eval. 41(3), 273–287 (2007). https://doi.org/10.1007/s10579-007-9061-5

  5. Alviar, C., Dale, R., Dewitt, A., Kello, C.: Multimodal coordination of sound and movement in music and speech. Discourse Process. 57(8), 682–702 (2020). https://doi.org/10.1080/0163853X.2020.1768500

  6. Alviar, C., Kello, C.T., Dale, R.: Multimodal coordination and pragmatic modes in conversation. Language Sciences, p. 101524 (2023). https://doi.org/10.1016/j.langsci.2022.101524

  7. Amici, F., Oña, L., Liebal, K.: Compositionality in primate gestural communication and multicomponent signal displays. Int. J. Primatol. (2022). https://doi.org/10.1007/s10764-022-00316-9

  8. Anderson, C.A., Wiggins, I.M., Kitterick, P.T., Hartley, D.E.H.: Adaptive benefit of cross-modal plasticity following cochlear implantation in deaf adults. Proc. Natl. Acad. Sci. U.S.A. 114(38), 10256–10261 (2017). https://doi.org/10.1073/pnas.1704785114

  9. Aranyi, G., Pecune, F., Charles, F., Pelachaud, C., Cavazza, M.: Affective interaction with a virtual character through an fNIRS brain-computer interface. Front. Comput. Neurosci. 10, 70 (Jul 2016). https://doi.org/10.3389/fncom.2016.00070

  10. Balardin, J.B., et al.: Imaging brain function with functional near-infrared spectroscopy in unconstrained environments. Front. Hum. Neurosci. 11, 258 (2017). https://doi.org/10.3389/fnhum.2017.00258

  11. Balconi, M., Fronda, G., Bartolo, A.: Affective, social, and informative gestures reproduction in human interaction: hyperscanning and brain connectivity. J. Mot. Behav. 53(3), 296–315 (2021). https://doi.org/10.1080/00222895.2020.1774490

  12. Baltrusaitis, T., Zadeh, A., Lim, Y.C., Morency, L.P.: OpenFace 2.0: facial behavior analysis toolkit. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pp. 59–66. IEEE (2018). https://doi.org/10.1109/FG.2018.00019

  13. Baroni, M.: Grounding distributional semantics in the visual world. Lang. Linguist. Compass 10(1), 3–13 (2016). https://doi.org/10.1111/lnc3.12170

  14. Barros, P., Parisi, G.I., Fu, D., Liu, X., Wermter, S.: Expectation learning for adaptive crossmodal stimuli association. In: EUCog Meeting Proceedings (Nov 2017). arXiv:1801.07654

  15. Baur, T., et al.: eXplainable cooperative machine learning with NOVA. KI - Künstliche Intelligenz 34(2), 143–164 (2020). https://doi.org/10.1007/s13218-020-00632-3

  16. Becker, J.T., Boller, F., Lopez, O.L., Saxton, J., McGonigle, K.L.: The natural history of Alzheimer’s disease: description of study cohort and accuracy of diagnosis. Arch. Neurol. 51(6), 585–594 (1994). https://doi.org/10.1001/archneur.1994.00540180063015

  17. Bierman, A.K.: That there are no iconic signs. Philos. Phenomenol. Res. 23(2), 243–249 (1962). https://doi.org/10.2307/2104916

  18. Birdwhistell, R.L.: Kinesics and Context. Conduct and Communication Series, University of Pennsylvania Press, Philadelphia (1970). https://doi.org/10.9783/9780812201284

  19. Blache, P., Bertrand, R., Ferré, G., Pallaud, B., Prévot, L., Rauzy, S.: The corpus of interactional data: A large multimodal annotated resource. In: Handbook of linguistic annotation, pp. 1323–1356. Springer (2017). https://doi.org/10.1007/978-94-024-0881-2_51

  20. Boersma, P.: The use of Praat in corpus research. In: Durand, J., Gut, U., Kristoffersen, G. (eds.) The Oxford handbook of corpus phonology, pp. 342–360. Oxford handbooks in linguistics, Oxford University Press, Oxford (2014). https://doi.org/10.1093/oxfordhb/9780199571932.013.016

  21. Boersma, P., Weenink, D.: Praat: doing phonetics by computer [computer program] version 6.3.03. https://www.praat.org/ (2022)

  22. Bohannon, R.W., Harrison, S., Kinsella-Shaw, J.: Reliability and validity of pendulum test measures of spasticity obtained with the polhemus tracking system from patients with chronic stroke. J. Neuroeng. Rehabil. 6(1), 1–7 (2009). https://doi.org/10.1186/1743-0003-6-30

  23. Bolly, C.T.: CorpAGEst annotation manual (ii. speech annotation guidelines) (2016). https://corpagest.wordpress.com/working-papers/

  24. Bressem, J.: A linguistic perspective on the notation of form features in gestures. In: Müller, C., Cienki, A., Fricke, E., Ladewig, S.H., McNeill, David und Bressem, J. (eds.) Body - Language - Communication. An International Handbook on Multimodality in Human Interaction, Handbooks of Linguistics and Communication Science, vol. 1, chap. 70, pp. 1079–1089. De Gruyter Mouton, Berlin and Boston (2013). https://doi.org/10.1515/9783110261318.1079

  25. Burks, A.W.: Icon, index, and symbol. Philos. Phenomenol. Res. 9(4), 673–689 (1949). https://doi.org/10.2307/2103298

  26. Caeiro, C.C., Waller, B.M., Zimmermann, E., Burrows, A.M., Davila-Ross, M.: OrangFACS: A muscle-based facial movement coding system for orangutans (Pongo spp.). Int. J. Primatol. 34(1), 115–129 (2013). https://doi.org/10.1007/s10764-012-9652-x

  27. Caliskan, A., Bryson, J., Narayanan, A.: Semantics derived automatically from language corpora contain human-like biases. Science 356(6334), 183–186 (2017). https://doi.org/10.1126/science.aal4230

  28. Cao, Z., Hidalgo Martinez, G., Simon, T., Wei, S., Sheikh, Y.A.: OpenPose: realtime multi-person 2d pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. (2019). https://doi.org/10.1109/TPAMI.2019.2929257

  29. Cavallo, A., Koul, A., Ansuini, C., Capozzi, F., Becchio, C.: Decoding intentions from movement kinematics. Sci. Rep. 6(1), 1–8 (2016). https://doi.org/10.1038/srep37036

  30. Chételat-Pelé, E., Braffort, A., Véronis, J.: Annotation of non manual gestures: eyebrow movement description. In: sign-lang@LREC 2008, pp. 28–32. European Language Resources Association (ELRA) (2008)

  31. Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Measur. 20, 37–46 (1960). https://doi.org/10.1177/001316446002000104

  32. MMPose Contributors: OpenMMLab pose estimation toolbox and benchmark. https://github.com/open-mmlab/mmpose (2020)

  33. Cormier, K., Crasborn, O., Bank, R.: Digging into signs: Emerging annotation standards for sign language corpora. In: Efthimiou, E., Fotinea, S.E., Hanke, T., Hochgesang, J.A., Kristoffersen, J., Mesch, J. (eds.) Proceedings of the LREC2016 7th Workshop on the Representation and Processing of Sign Languages: Corpus Mining, pp. 35–40. European Language Resources Association (ELRA), Portorož, Slovenia (May 2016)

  34. Crasborn, O., Bank, R.: An annotation scheme for the linguistic study of mouth actions in sign languages (2014). https://hdl.handle.net/2066/132960

  35. Crasborn, O., Zwitserlood, I., van der Kooij, E., Ormel, E.: Global SignBank manual, version 2 (2020). https://doi.org/10.13140/RG.2.2.16205.67045/1

  36. Cutler, A., Dahan, D., Van Donselaar, W.: Prosody in the comprehension of spoken language: a literature review. Lang. Speech 40(2), 141–201 (1997)

  37. Dale, R.: The possibility of a pluralist cognitive science. J. Exp. Theor. Artif. Intell. 20(3), 155–179 (2008). https://doi.org/10.1080/09528130802319078

  38. Dale, R., Warlaumont, A., Johnson, K.: The fundamental importance of method to theory. Nature Rev. Psychol. 2, 55–66 (2022). https://doi.org/10.1038/s44159-022-00120-5

  39. Danner, S.G., Barbosa, A.V., Goldstein, L.: Quantitative analysis of multimodal speech data. J. Phon. 71, 268–283 (2018). https://doi.org/10.1016/j.wocn.2018.09.007

  40. Dogdu, C., Kessler, T., Schneider, D., Shadaydeh, M., Schweinberger, S.R.: A comparison of machine learning algorithms and feature sets for automatic vocal emotion recognition in speech. Sensors 22(19), 7561 (2022). https://doi.org/10.3390/s22197561

  41. Drimalla, H., Baskow, I., Behnia, B., Roepke, S., Dziobek, I.: Imitation and recognition of facial emotions in autism: A computer vision approach. Molecular Autism 12(1) (2021). https://doi.org/10.1186/s13229-021-00430-0

  42. Ebert, C., Ebert, C.: Gestures, demonstratives, and the attributive/referential distinction. Talk at Semantics and Philosophy in Europe 7, ZAS, Berlin (2014)

  43. Ebert, C., Ebert, C., Hörnig, R.: Demonstratives as dimension shifters. Proc. Sinn und Bedeutung 24(1), 161–178 (2020)

  44. Ehinger, B.V., Dimigen, O.: Unfold: an integrated toolbox for overlap correction, non-linear modeling, and regression-based EEG analysis. PeerJ 7, e7838 (2019). https://doi.org/10.7717/peerj.7838

  45. von Eiff, C.I., Frühholz, S., Korth, D., Guntinas-Lichius, O., Schweinberger, S.R.: Crossmodal benefits to vocal emotion perception in cochlear implant users. iScience 25(12), 105711 (2022). https://doi.org/10.1016/j.isci.2022.105711

  46. Ekman, P., Friesen, W.V.: Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto, CA (1978). https://doi.org/10.1037/t27734-000

  47. Erard, M.: Why sign-language gloves don’t help deaf people. The Atlantic. https://www.theatlantic.com/technology/archive/2017/11/why-sign-language-gloves-dont-help-deaf-people/545441/ (2017)

  48. Esteve-Gibert, N., Prieto, P.: Prosodic structure shapes the temporal realization of intonation and manual gesture movements. J. Speech Lang. Hear. Res. 56(3), 850–864 (2013). https://doi.org/10.1044/1092-4388(2012/12-0049)

  49. Fernandez-Lopez, A., Sukno, F.M.: Survey on automatic lip-reading in the era of deep learning. Image Vis. Comput. 78, 53–72 (2018). https://doi.org/10.1016/j.imavis.2018.07.002

  50. Ferstl, Y., Neff, M., McDonnell, R.: Understanding the predictability of gesture parameters from speech and their perceptual importance. In: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, pp. 1–8 (2020). https://doi.org/10.1145/3383652.3423882

  51. Filippeschi, A., Schmitz, N., Miezal, M., Bleser, G., Ruffaldi, E., Stricker, D.: Survey of motion tracking methods based on inertial sensors: a focus on upper limb human motion. Sensors 17(6), 1257 (2017). https://doi.org/10.3390/s17061257

  52. Frühholz, S., Schweinberger, S.R.: Nonverbal auditory communication - evidence for integrated neural systems for voice signal production and perception. Prog. Neurobiol. 199, 101948 (2021). https://doi.org/10.1016/j.pneurobio.2020.101948

  53. Geng, J., Huang, D., De la Torre, F.: DensePose from WiFi. arXiv preprint arXiv:2301.00250 (2022)

  54. Gerloff, C., Konrad, K., Kruppa, J., Schulte-Rüther, M., Reindl, V.: Autism Spectrum Disorder Classification Based on Interpersonal Neural Synchrony: Can Classification be Improved by Dyadic Neural Biomarkers Using Unsupervised Graph Representation Learning? In: Abdulkadir, A., et al. (eds.) Machine Learning in Clinical Neuroimaging, vol. 13596, pp. 147–157. Springer Nature Switzerland, Cham (2022). https://doi.org/10.1007/978-3-031-17899-3_15

  55. Ginosar, S., Bar, A., Kohavi, G., Chan, C., Owens, A., Malik, J.: Learning individual styles of conversational gesture. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3497–3506 (2019)

  56. Ginzburg, J., Poesio, M.: Grammar is a system that characterizes talk in interaction. Front. Psychol. 7, 1938 (2016). https://doi.org/10.3389/fpsyg.2016.01938

  57. Goodman, N.: Languages of Art: An Approach to a Theory of Symbols, 2nd edn. Hackett Publishing Company Inc, Indianapolis (1976)

  58. Goodwin, C.: Pointing as situated practice. In: Kita, S. (ed.) Pointing: Where Language, Culture, and Cognition Meet, chap. 2, pp. 217–241. Lawrence Erlbaum Associates Inc, Mahwah, New Jersey (2003). https://doi.org/10.4324/9781410607744

  59. Gregori, A., Kügler, F.: Multimodal marking of focus: Articulatory and visual hyperarticulation (submitted)

  60. Gussenhoven, C.: The Phonology of Tone and Intonation. Cambridge University Press, Cambridge (2004). https://doi.org/10.1017/CBO9780511616983

  61. Gwet, K.: Handbook of Inter-Rater Reliability. STATAXIS Publishing Company, Gaithersburg, MD (2001)

  62. Hammadi, Y., Grondin, F., Ferland, F., Lebel, K.: Evaluation of various state of the art head pose estimation algorithms for clinical scenarios. Sensors 22(18), 6850 (2022). https://doi.org/10.3390/s22186850

  63. Hanke, T.: HamNoSys - representing sign language data in language resources and language processing contexts. In: LREC. vol. 4, pp. 1–6 (2004)

  64. Hartz, A., Guth, B., Jording, M., Vogeley, K., Schulte-Rüther, M.: Temporal behavioral parameters of on-going gaze encounters in a virtual environment. Front. Psychol. 12 (2021). https://doi.org/10.3389/fpsyg.2021.673982

  65. Herrmann, A., Pendzich, N.K.: Nonmanual gestures in sign languages. In: Müller, C., Cienki, A., Fricke, E., Ladewig, S.H., McNeill, D., Bressem, J. (eds.) Handbook Body - Language - Communication, pp. 2147–2160. DeGruyter Mouton, Berlin, Boston (2014)

  66. Hobaiter, C., Byrne, R.W.: The meanings of chimpanzee gestures. Curr. Biol. 24, 1596–1600 (2014)

  67. Holler, J., Levinson, S.C.: Multimodal language processing in human communication. Trends Cogn. Sci. 23(8), 639–652 (2019). https://doi.org/10.1016/j.tics.2019.05.006

  68. Hosemann, J., Herrmann, A., Steinbach, M., Bornkessel-Schlesewsky, I., Schlesewsky, M.: Lexical prediction via forward models: N400 evidence from German sign language. Neuropsychologia 51(11), 2224–2237 (2013). https://doi.org/10.1016/j.neuropsychologia.2013.07.013

  69. Höhle, T.N.: Über Komposition und Derivation: zur Konstituentenstruktur von Wortbildungsprodukten im Deutschen. Z. Sprachwiss. 1(1), 76–112 (1982). https://doi.org/10.1515/zfsw.1982.1.1.76

  70. Ide, N., Pustejovsky, J. (eds.): Handbook of Linguistic Annotation. Springer, Netherlands, Dordrecht (2017). https://doi.org/10.1007/978-94-024-0881-2_1

  71. Ienaga, N., Cravotta, A., Terayama, K., Scotney, B.W., Saito, H., Busà, M.G.: Semi-automation of gesture annotation by machine learning and human collaboration. Language Resources and Evaluation, pp. 1–28 (2022). https://doi.org/10.1007/s10579-022-09586-4

  72. Jaimes, A., Sebe, N.: Multimodal human-computer interaction: a survey. Comput. Vis. Image Underst. 108(1), 116–134 (2007). https://doi.org/10.1016/j.cviu.2006.10.019

  73. Jiang, Z., Moryossef, A., Müller, M., Ebling, S.: Machine translation between spoken languages and signed languages represented in SignWriting. arXiv preprint arXiv:2210.05404 (2022). https://doi.org/10.48550/arXiv.2210.05404

  74. Jun, S.A.: The ToBI transcription system: conventions, strengths, and challenges. In: Barnes, J., Shattuck-Hufnagel, S. (eds.) Prosodic Theory and Practice, pp. 151–181. MIT Press, Cambridge (2022)

  75. Kano, F., Tomonaga, M.: How chimpanzees look at pictures: a comparative eye-tracking study. Proc. Royal Society B: Biol. Sci. 276(1664), 1949–1955 (2009)

  76. Kelly, S., Healey, M., Özyürek, A., Holler, J.: The processing of speech, gesture, and action during language comprehension. Psychonom. Bull. Rev. 22(2), 517–523 (2014). https://doi.org/10.3758/s13423-014-0681-7

  77. Kempson, R., Cann, R., Gregoromichelaki, E., Chatzikyriakidis, S.: Language as mechanisms for interaction. Theor. Linguist. 42(3–4), 203–276 (2016). https://doi.org/10.1515/tl-2016-0011

  78. Kendon, A.: Some relationships between body motion and speech. An analysis of an example. In: Siegman, A.W., Pope, B. (eds.) Studies in Dyadic Communication, chap. 9, pp. 177–210. Pergamon Press, Elmsford, NY (1972)

  79. Kendon, A.: Gesticulation and speech: Two aspects of the process of utterance. In: Key, M.R. (ed.) The Relationship of Verbal and Nonverbal Communication, pp. 207–227. No. 25 in Contributions to the Sociology of Language, Mouton, The Hague (1980)

  80. Kendon, A.: Gesture: Visible Action as Utterance. Cambridge University Press, Cambridge, MA (2004). https://doi.org/10.1017/CBO9780511807572

  81. Khalil, R.A., Jones, E., Babar, M.I., Jan, T., Zafar, M.H., Alhussain, T.: Speech emotion recognition using deep learning techniques: a review. IEEE Access 7, 117327–117345 (2019). https://doi.org/10.1109/ACCESS.2019.2936124

  82. Khasbage, Y., et al.: The Red Hen Anonymizer and the Red Hen Protocol for de-identifying audiovisual recordings. Linguist. Vanguard (0) (2022). https://doi.org/10.1515/lingvan-2022-0017

  83. Kipp, M.: Anvil - a generic annotation tool for multimodal dialogue. In: Seventh European Conference on Speech Communication and Technology (Eurospeech 2001) (2001). https://doi.org/10.21437/Eurospeech.2001-354

  84. Kipp, M., Neff, M., Albrecht, I.: An annotation scheme for conversational gestures: how to economically capture timing and form. J. Lang. Resour. Eval. - Special Issue Multimodal Corpora 41(3–4), 325–339 (2007). https://doi.org/10.1007/s10579-007-9053-5

  85. Kowallik, A.E., Schweinberger, S.R.: Sensor-based technology for social information processing in autism: a review. Sensors 19(21), 4787 (2019). https://doi.org/10.3390/s19214787

  86. Krauss, R.M., Hadar, U.: The role of speech-related arm/hand gestures in word retrieval. In: Campbell, R., Messing, L.S. (eds.) Gesture, speech, and sign, pp. 93–116. Oxford University Press, Oxford (1999). https://doi.org/10.1093/acprof:oso/9780198524519.003.0006

  87. Krippendorff, K.: Content Analysis: An Introduction to Its Methodology, 4th edn. SAGE Publications, Thousand Oaks, CA (2018)

  88. Kruppa, J.A., et al.: Brain and motor synchrony in children and adolescents with ASD - a fNIRS hyperscanning study. Soc. Cogn. Affect. Neurosci. 16(1–2), 103–116 (2020). https://doi.org/10.1093/scan/nsaa092

  89. Kubina, P., Abramov, O., Lücking, A.: Barrier-free communication. In: Mehler, A., Romary, L. (eds.) Handbook of Technical Communication, chap. 19, pp. 645–706. No. 8 in Handbooks of Applied Linguistics, De Gruyter Mouton, Berlin and Boston (2012)

  90. Kuhnke, P., Beaupain, M.C., Arola, J., Kiefer, M., Hartwigsen, G.: Meta-analytic evidence for a novel hierarchical model of conceptual processing. Neurosci. Biobehav. Rev. 144, 104994 (2023). https://doi.org/10.1016/j.neubiorev.2022.104994

  91. Köpüklü, O., Gunduz, A., Kose, N., Rigoll, G.: Real-time hand gesture detection and classification using convolutional neural networks. In: Proceedings of the 14th IEEE International Conference on Automatic Face & Gesture Recognition, pp. 1–8. FG 2019 (2019). https://doi.org/10.1109/FG.2019.8756576

  92. Ladd, D.R.: Intonational Phonology, 2nd edn. Cambridge University Press, Cambridge (2012). https://doi.org/10.1017/CBO9780511808814

  93. Ladefoged, P.: The revised international phonetic alphabet. Language 66(3), 550–552 (1990). https://doi.org/10.2307/414611

  94. Lascarides, A., Stone, M.: Discourse coherence and gesture interpretation. Gesture 9(2), 147–180 (2009). https://doi.org/10.1075/gest.9.2.01las

  95. Latash, M.L.: Synergy. Oxford University Press (2008). https://doi.org/10.1093/acprof:oso/9780195333169.001.0001

  96. Lausberg, H., Sloetjes, H.: Coding gestural behavior with the neuroges-elan system. Behav. Res. Methods 41(3), 841–849 (2009). https://doi.org/10.3758/BRM.41.3.841

  97. Levelt, W.J.M.: Monitoring and self-repair in speech. Cognition 14(1), 41–104 (1983). https://doi.org/10.1016/0010-0277(83)90026-4

  98. Li, S., Deng, W.: Deep facial expression recognition: a survey. IEEE Trans. Affect. Comput. 13(3), 1195–1215 (2020). https://doi.org/10.1109/TAFFC.2020.2981446

  99. Liebal, K., Oña, L.: Different approaches to meaning in primate gestural and vocal communication. Front. Psychol. 9, 478 (2018)

  100. Liebherr, M., et al.: EEG and behavioral correlates of attentional processing while walking and navigating naturalistic environments. Sci. Rep. 11(1), 1–13 (2021). https://doi.org/10.1038/s41598-021-01772-8

  101. Liszkowski, U., Brown, P., Callaghan, T., Takada, A., De Vos, C.: A prelinguistic gestural universal of human communication. Cogn. Sci. 36(4), 698–713 (2012). https://doi.org/10.1111/j.1551-6709.2011.01228.x

  102. Loehr, D.P.: Temporal, structural, and pragmatic synchrony between intonation and gesture. Lab. Phonol.: J. Assoc. Lab. Phonol. 3(1), 71–89 (2012). https://doi.org/10.1515/lp-2012-0006

  103. Lopez, A., Liesenfeld, A., Dingemanse, M.: Evaluation of automatic speech recognition for conversational speech in Dutch, English, and German: What goes missing? In: Proceedings of the 18th Conference on Natural Language Processing, pp. 135–143. KONVENS 2022 (2022)

  104. Lozano-Goupil, J., Raffard, S., Capdevielle, D., Aigoin, E., Marin, L.: Gesture-speech synchrony in schizophrenia: a pilot study using a kinematic-acoustic analysis. Neuropsychologia 174, 108347 (2022). https://doi.org/10.1016/j.neuropsychologia.2022.108347

  105. Lücking, A.: Gesture. In: Müller, S., Abeillé, A., Borsley, R.D., Koenig, J.P. (eds.) Head Driven Phrase Structure Grammar: The handbook, chap. 27, pp. 1201–1250. No. 9 in Empirically Oriented Theoretical Morphology and Syntax, Language Science Press, Berlin (2021). https://doi.org/10.5281/zenodo.5543318

  106. Lücking, A., Bergman, K., Hahn, F., Kopp, S., Rieser, H.: Data-based analysis of speech and gesture: the Bielefeld speech and gesture alignment corpus (SaGA) and its applications. J. Multimodal User Interfaces 7(1), 5–18 (2013)

  107. Lücking, A., Mehler, A., Menke, P.: Taking fingerprints of speech-and-gesture ensembles: Approaching empirical evidence of intrapersonal alignmnent in multimodal communication. In: Proceedings of the 12th Workshop on the Semantics and Pragmatics of Dialogue, pp. 157–164. LonDial’08, King’s College London (2008)

  108. Lücking, A., Ptock, S., Bergmann, K.: Assessing agreement on segmentations by means of Staccato, the Segmentation Agreement Calculator according to Thomann. In: Efthimiou, E., Kouroupetroglou, G., Fotina, S.E. (eds.) Gesture and Sign Language in Human-Computer Interaction and Embodied Communication, pp. 129–138. No. 7206 in LNAI, Springer, Berlin and Heidelberg (2012). https://doi.org/10.1007/978-3-642-34182-3_12

  109. MacWhinney, B.: The CHILDES Project: Tools for Analyzing Talk, 3rd edn. Lawrence Erlbaum Associates, Mahwah, NJ (2000)

  110. Magnee, M., Stekelenburg, J.J., Kemner, C., de Gelder, B.: Similar facial electromyographic responses to faces, voices, and body expressions. NeuroReport 18(4), 369–372 (2007). https://doi.org/10.1097/WNR.0b013e32801776e6

  111. Marschik, P.B., et al.: Open video data sharing in developmental and behavioural science. arXiv preprint arXiv:2207.11020 (2022). https://doi.org/10.48550/arXiv.2207.11020

  112. Mathis, A., et al.: DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nat. Neurosci. 21(9), 1281–1289 (2018). https://doi.org/10.1038/s41593-018-0209-y

  113. McNeill, D.: Hand and Mind - What Gestures Reveal about Thought. Chicago University Press, Chicago (1992). https://doi.org/10.2307/1576015

  114. McNeill, D.: Gesture: A psycholinguistic approach. In: Brown, K. (ed.) The encyclopedia of language and linguistics, pp. 58–66. Elsevier (2006)

  115. Mehler, A., Lücking, A.: Pathways of alignment between gesture and speech: Assessing information transmission in multimodal ensembles. In: Giorgolo, G., Alahverdzhieva, K. (eds.) Proceedings of the International Workshop on Formal and Computational Approaches to Multimodal Communication under the auspices of ESSLLI 2012, Opole, Poland, 6–10 August (2012)

  116. Mlakar, I., Verdonik, D., Majhenič, S., Rojc, M.: Understanding conversational interaction in multiparty conversations: the EVA Corpus. Lang. Resour. Eval. (2022). https://doi.org/10.1007/s10579-022-09627-y

  117. Monarch, R.M.: Human-in-the-Loop Machine Learning: Active learning and annotation for human-centered AI. Simon and Schuster (2021)

  118. Mondada, L.: The local constitution of multimodal resources for social interaction. J. Pragmat. 65, 137–156 (2014). https://doi.org/10.1016/j.pragma.2014.04.004

  119. Mondada, L.: Challenges of multimodality: language and the body in social interaction. J. Socioling. 20(3), 336–366 (2016). https://doi.org/10.1111/josl.1_12177

  120. Montague, P.: Hyperscanning: simultaneous fMRI during linked social interactions. Neuroimage 16(4), 1159–1164 (2002). https://doi.org/10.1006/nimg.2002.1150

  121. Morgenstern, A., Caët, S.: Signes en famille [corpus] (2021)

  122. Munea, T.L., Jembre, Y.Z., Weldegebriel, H.T., Chen, L., Huang, C., Yang, C.: The progress of human pose estimation: a survey and taxonomy of models applied in 2d human pose estimation. IEEE Access 8, 133330–133348 (2020). https://doi.org/10.1109/ACCESS.2020.3010248

  123. Narayanan, S., et al.: Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research (TC). J. Acoust. Society Am. 136, 1307 (2014). https://doi.org/10.1121/1.4890284

  124. Nenna, F., Do, C.T., Protzak, J., Gramann, K.: Alteration of brain dynamics during dual-task overground walking. Eur. J. Neurosci. 54(12), 8158–8174 (2021). https://doi.org/10.1111/ejn.14956

  125. Ng, E., Ginosar, S., Darrell, T., Joo, H.: Body2hands: Learning to infer 3d hands from conversational gesture body dynamics. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11865–11874 (2021)

  126. Nguyen, T., Schleihauf, H., Kayhan, E., Matthes, D., Vrtička, P., Hoehl, S.: The effects of interaction quality on neural synchrony during mother-child problem solving. Cortex 124, 235–249 (2020). https://doi.org/10.1016/j.cortex.2019.11.020

  127. Noah, J.A., et al.: fMRI Validation of fNIRS Measurements During a Naturalistic Task. J. Visualized Experiments 100, 52116 (Jun 2015). https://doi.org/10.3791/52116

  128. Nyatsanga, S., Kucherenko, T., Ahuja, C., Henter, G.E., Neff, M.: A comprehensive review of data-driven co-speech gesture generation. arXiv preprint arXiv:2301.05339 (2023). https://doi.org/10.48550/arXiv.2301.05339

  129. Núñez, R., Allen, M., Gao, R., Miller Rigoli, C., Relaford-Doyle, J., Semenuks, A.: What happened to cognitive science? Nat. Hum. Behav. 3(8), 782–791 (2019). https://doi.org/10.1038/s41562-019-0626-2

  130. Offrede, T., Fuchs, S., Mooshammer, C.: Multi-speaker experimental designs: methodological considerations. Lang. Linguist. Compass 15(12), e12443 (2021). https://doi.org/10.1111/lnc3.12443

  131. Oudah, M., Al-Naji, A., Chahl, J.: Hand gesture recognition based on computer vision: a review of techniques. J. Imag. 6(8), 73 (2020). https://doi.org/10.3390/jimaging6080073

  132. Oviatt, S.: Ten myths of multimodal interaction. Commun. ACM 42(11), 74–81 (1999). https://doi.org/10.1145/319382.319398

  133. Owoyele, B., Trujillo, J., De Melo, G., Pouw, W.: Masked-Piper: masking personal identities in visual recordings while preserving multimodal information. SoftwareX 20, 101236 (2022). https://doi.org/10.1016/j.softx.2022.101236

  134. PaddlePaddle: PaddleDetection, object detection and instance segmentation toolkit based on PaddlePaddle. https://github.com/PaddlePaddle/PaddleDetection (2019)

  135. Paggio, P., Navarretta, C.: Integration and representation issues in the annotation of multimodal data. In: Navarretta, C., Paggio, P., Allwood, J., Alsén, E., Katagiri, Y. (eds.) Proceedings of the NODALIDA 2009 workshop: Multimodal Communication - from Human Behaviour to Computational Models, pp. 25–31. Northern European Association for Language Technology (2009)

  136. Pan, X.N., Hamilton, A.F.D.: Why and how to use virtual reality to study human social interaction: the challenges of exploring a new research landscape. Br. J. Psychol. 109(3), 395–417 (2018). https://doi.org/10.1111/bjop.12290

  137. Pan, Y., Cheng, X., Zhang, Z., Li, X., Hu, Y.: Cooperation in lovers: an fNIRS-based hyperscanning study: cooperation in lovers. Hum. Brain Mapp. 38(2), 831–841 (2017). https://doi.org/10.1002/hbm.23421

  138. Paquot, M., Gries, S.T.: A practical handbook of corpus linguistics. Springer Nature (2021)

  139. Parisi, G.I., Kemker, R., Part, J.L., Kanan, C., Wermter, S.: Continual lifelong learning with neural networks: a review. Neural Netw. 113, 54–71 (2019). https://doi.org/10.1016/j.neunet.2019.01.012

  140. Parr, L., Waller, B., Burrows, A., Gothard, K., Vick, S.J.: Brief communication: MaqFACS: a muscle-based facial movement coding system for the rhesus macaque. Am. J. Phys. Anthropol. 143(4), 625–630 (2010)

  141. Peer, A., Ullich, P., Ponto, K.: Vive tracking alignment and correction made easy. In: 2018 IEEE conference on virtual reality and 3D user interfaces (VR), pp. 653–654. IEEE (2018). https://doi.org/10.1109/VR.2018.8446435

  142. Peikert, A., Brandmaier, A.M.: A Reproducible Data Analysis Workflow With R Markdown, Git, Make, and Docker. Quantitative and Computational Methods in Behavioral Sciences, pp. 1–27 (2021). https://doi.org/10.5964/qcmb.3763

  143. Perniss, P.: Why we should study multimodal language. Front. Psychol. 9, 1109 (2018). https://doi.org/10.3389/fpsyg.2018.01109

  144. Pezzulo, G., Donnarumma, F., Dindo, H., D’Ausilio, A., Konvalinka, I., Castelfranchi, C.: The body talks: sensorimotor communication and its brain and kinematic signatures. Phys. Life Rev. 28, 1–21 (2019). https://doi.org/10.1016/j.plrev.2018.06.014

  145. Pickering, M.J., Garrod, S.: An integrated theory of language production and comprehension. Behav. Brain Sci. 4, 329–347 (2013). https://doi.org/10.1017/s0140525x12001495

  146. Pierrehumbert, J.B.: The phonology and phonetics of English intonation. Ph.D. thesis, Massachusetts Institute of Technology (1980)

  147. Pinti, P., et al.: The present and future use of functional near-infrared spectroscopy (fNIRS) for cognitive neuroscience. Ann. N. Y. Acad. Sci. 1464(1), 5–29 (2020). https://doi.org/10.1111/nyas.13948

  148. Posner, R., Robering, K., Sebeok, T.A., Wiegand, H.E. (eds.): Semiotik : ein Handbuch zu den zeichentheoretischen Grundlagen von Natur und Kultur = Semiotics. No. 13 in Handbücher zur Sprach- und Kommunikationswissenschaft, de Gruyter, Berlin (1997)

  149. Pouw, W., Dingemanse, M., Motamedi, Y., Özyürek, A.: A systematic investigation of gesture kinematics in evolving manual languages in the lab. Cogn. Sci. 45(7), e13014 (2021). https://doi.org/10.1111/cogs.13014

  150. Pouw, W., Dixon, J.A.: Gesture networks: introducing dynamic time warping and network analysis for the kinematic study of gesture ensembles. Discourse Process. 57(4), 301–319 (2020). https://doi.org/10.1080/0163853X.2019.1678967

  151. Pouw, W., Fuchs, S.: Origins of vocal-entangled gesture. Neuroscience & Biobehavioral Reviews, p. 104836 (2022). https://doi.org/10.1016/j.neubiorev.2022.104836

  152. Power, S.D., Falk, T.H., Chau, T.: Classification of prefrontal activity due to mental arithmetic and music imagery using hidden Markov models and frequency domain near-infrared spectroscopy. J. Neural Eng. 7(2), 026002 (2010). https://doi.org/10.1088/1741-2560/7/2/026002

  153. Prieto, P.: Intonational meaning. WIRES. Cogn. Sci. 6(4), 371–381 (2015). https://doi.org/10.1002/wcs.1352

  154. Prillwitz, S., Hanke, T., König, S., Konrad, R., Langer, G., Schwarz, A.: DGS corpus project - development of a corpus based electronic dictionary German Sign Language/German. In: sign-lang@LREC 2008, pp. 159–164. European Language Resources Association (ELRA) (2008)

  155. Quer, J., Pfau, R., Herrmann, A.: The Routledge Handbook of Theoretical and Experimental Sign Language Research. Routledge (2021)

  156. Rachow, M., Karnowski, T., O’Toole, A.J.: Identity masking effectiveness and gesture recognition: effects of eye enhancement in seeing through the mask. arXiv preprint arXiv:2301.08408 (2023). https://doi.org/10.48550/arXiv.2301.08408

  157. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763. PMLR (2021)

  158. Radford, A., Kim, J.W., Xu, T., Brockman, G., McLeavey, C., Sutskever, I.: Robust speech recognition via large-scale weak supervision. arXiv preprint arXiv:2212.04356 (2022). https://doi.org/10.48550/arXiv.2212.04356

  159. Ramscar, M., Port, R.F.: How spoken languages work in the absence of an inventory of discrete units. Lang. Sci. 53, 58–74 (2016). https://doi.org/10.1016/j.langsci.2015.08.002

  160. Ren, Y., Wang, Z., Wang, Y., Tan, S., Chen, Y., Yang, J.: GoPose: 3D human pose estimation using WiFi. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 6(2) (2022). https://doi.org/10.1145/3534605

  161. Richard, A., Zollhöfer, M., Wen, Y., de la Torre, F., Sheikh, Y.: Meshtalk: 3d face animation from speech using cross-modality disentanglement. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1173–1182 (October 2021)

  162. Rieser, H., Lawler, I.: Multi-modal meaning - an empirically-founded process algebra approach. Semantics Pragmatics 13(8), 1–55 (2020). https://doi.org/10.3765/sp.13.8

  163. Ripperda, J., Drijvers, L., Holler, J.: Speeding up the detection of non-iconic and iconic gestures (SPUDNIG): a toolkit for the automatic detection of hand movements and gestures in video data. Behav. Res. Methods 52(4), 1783–1794 (2020). https://doi.org/10.3758/s13428-020-01350-2

  164. Rohrer, P.: A temporal and pragmatic analysis of gesture-speech association: A corpus-based approach using the novel MultiModal MultiDimensional (M3D) labeling system. Ph.D. thesis (2022)

  165. Rohrer, P.L., et al.: The multimodal multidimensional (m3d) labeling system (Jan 2023). https://doi.org/10.17605/OSF.IO/ANKDX

  166. Sassenhagen, J.: How to analyse electrophysiological responses to naturalistic language with time-resolved multiple regression. Lang., Cogn. Neurosci. 34(4), 474–490 (2019). https://doi.org/10.1080/23273798.2018.1502458

  167. Schegloff, E.A.: On some gestures’ relation to talk. In: Atkinson, J.M., Heritage, J. (eds.) Structures of Social Action. Studies in Conversational Analysis, chap. 12, pp. 266–296. Studies in Emotion and Social Interaction, Cambridge University Press, Cambridge, MA (1984)

  168. Schmidt, T., Wörner, K.: EXMARaLDA - creating, analysing and sharing spoken language corpora for pragmatic research. Pragmatics 19(4), 565–582 (2009)

  169. Scholkmann, F., et al.: A review on continuous wave functional near-infrared spectroscopy and imaging instrumentation and methodology. NeuroImage 85, 6–27 (2014). https://doi.org/10.1016/j.neuroimage.2013.05.004

  170. Schuller, B.W.: Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends. Commun. ACM 61(5), 90–99 (2018). https://doi.org/10.1145/3129340

  171. Schulte-Rüther, M., et al.: Using machine learning to improve diagnostic assessment of ASD in the light of specific differential and co-occurring diagnoses. J. Child Psychol. Psychiatry 64(1), 16–26 (2023). https://doi.org/10.1111/jcpp.13650

  172. Schulte-Rüther, M., et al.: Intact mirror mechanisms for automatic facial emotions in children and adolescents with autism spectrum disorder. Autism Res. 10(2), 298–310 (2017). https://doi.org/10.1002/aur.1654

  173. Selting, M., Auer, P., et al.: Gesprächsanalytisches Transkriptionssystem 2 (GAT 2). Gesprächsforschung - Online-Zeitschrift zur verbalen Interaktion 10, 353–402 (2009). https://www.gespraechsforschung-ozs.de

  174. Shattuck-Hufnagel, S., Turk, A.E.: A prosody tutorial for investigators of auditory sentence processing. J. Psycholinguist. Res. 25, 193–247 (1996)

  175. Shattuck-Hufnagel, S., Yasinnik, Y., Veilleux, N., Renwick, M.: A method for studying the time-alignment of gestures and prosody in American English: ‘Hits’ and pitch accents in academic-lecture-style speech. In: Esposito, A., Bratanic, M., Keller, E., Marinaro, M. (eds.) Fundamentals of Verbal And Nonverbal Communication And The Biometric Issue, pp. 34–44. IOS Press, Amsterdam (2007)

  176. Shoemark, P., Liza, F.F., Nguyen, D., Hale, S., McGillivray, B.: Room to Glo: A systematic comparison of semantic change detection approaches with word embeddings. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pp. 66–76. EMNLP-IJCNLP, Association for Computational Linguistics, Hong Kong, China (2019). https://doi.org/10.18653/v1/D19-1007

  177. Sitaram, R., et al.: Temporal classification of multichannel near-infrared spectroscopy signals of motor imagery for developing a brain-computer interface. Neuroimage 34(4), 1416–1427 (2007). https://doi.org/10.1016/j.neuroimage.2006.11.005

  178. Streeck, J.: Gesture as communication I: its coordination with gaze and speech. Commun. Monogr. 60(4), 275–299 (1993)

  179. Struckmeier, V.: Attribute im Deutschen: Zu ihren Eigenschaften und ihrer Position im grammatischen System. No. 65 in studia grammatica, Akademie Verlag, Berlin (2007)

  180. Thomann, B.: Observation and judgment in psychology: assessing agreement among markings of behavioral events. Behav. Res. Methods Instrum. Comput. 33(3), 248–339 (2001)

  181. Tiku, N.: The Google engineer who thinks the company’s AI has come to life. The Washington Post (2022)

  182. Tkachman, O., Hall, K.C., Xavier, A., Gick, B.: Sign language phonetic annotation meets phonological corpustools: Towards a sign language toolset for phonetic notation and phonological analysis. In: Proceedings of the Annual Meetings on Phonology, vol. 3 (2016)

  183. Torricelli, F., Tomassini, A., Pezzulo, G., Pozzo, T., Fadiga, L., D’Ausilio, A.: Motor invariants in action execution and perception. Physics of Life Reviews (2022)

  184. Trettenbrein, P.C., Pendzich, N.-K., Cramer, J.-M., Steinbach, M., Zaccarella, E.: Psycholinguistic norms for more than 300 lexical signs in German Sign Language (DGS). Behav. Res. Methods 53(5), 1817–1832 (2020). https://doi.org/10.3758/s13428-020-01524-y

  185. Trettenbrein, P.C., Papitto, G., Friederici, A.D., Zaccarella, E.: Functional neuroanatomy of language without speech: an ale meta-analysis of sign language. Hum. Brain Mapp. 42(3), 699–712 (2021). https://doi.org/10.1002/hbm.25254

  186. Trettenbrein, P.C., Zaccarella, E.: Controlling video stimuli in sign language and gesture research: the OpenPoseR package for analyzing OpenPose motion-tracking data in R. Front. Psychol. 12 (2021). https://doi.org/10.3389/fpsyg.2021.628728

  187. Trujillo, J.P., Holler, J.: Interactionally embedded gestalt principles of multimodal human communication. Perspect. Psychol. Sci. (2023). https://doi.org/10.1177/17456916221141422

  188. Trujillo, J.P., Simanova, I., Bekkering, H., Özyürek, A.: Communicative intent modulates production and comprehension of actions and gestures: A Kinect study. Cognition 180, 38–51 (2018)

  189. Uddén, J.: Supramodal Sentence Processing in the Human Brain: fMRI Evidence for the Influence of Syntactic Complexity in More Than 200 Participants. Neurobiol. Lang. 3(4), 575–598 (2022). https://doi.org/10.1162/nol_a_00076

  190. Uljarevic, M., Hamilton, A.: Recognition of emotions in autism: a formal meta-analysis. J. Autism Dev. Disord. 43(7), 1517–1526 (2013). https://doi.org/10.1007/s10803-012-1695-5

  191. Valtakari, N.V., Hooge, I.T.C., Viktorsson, C., Nyström, P., Falck-Ytter, T., Hessels, R.S.: Eye tracking in human interaction: possibilities and limitations. Behav. Res. Methods 53(4), 1592–1608 (2021). https://doi.org/10.3758/s13428-020-01517-x

  192. Vick, S.J., Waller, B.M., Parr, L.A., Smith Pasqualini, M.C., Bard, K.A.: A cross-species comparison of facial morphology and movement in humans and chimpanzees using the facial action coding system (FACS). J. Nonverbal Behav. 31(1), 1–20 (2007)

  193. Vilhjálmsson, H., et al.: The behavior markup language: Recent developments and challenges. In: Pelachaud, C., Martin, J.C., André, E., Chollet, G., Karpouzis, K., Pelé, D. (eds.) Intelligent Virtual Agents. pp. 99–111. Springer, Berlin and Heidelberg (2007). https://doi.org/10.1007/978-3-540-74997-4_10

  194. Waller, B.M., Lembeck, M., Kuchenbuch, P., Burrows, A.M., Liebal, K.: GibbonFACS: a muscle-based facial movement coding system for hylobatids. Int. J. Primatol. 33(4), 809–821 (2012)

  195. Weiss, K., Khoshgoftaar, T.M., Wang, D.D.: A survey of transfer learning. J. Big Data 3(1), 1–40 (2016). https://doi.org/10.1186/s40537-016-0043-6

  196. Winkler, A., Won, J., Ye, Y.: QuestSim: human motion tracking from sparse sensors with simulated avatars. In: SIGGRAPH Asia 2022 Conference Papers, pp. 1–8 (2022). https://doi.org/10.1145/3550469.3555411

  197. Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., Sloetjes, H.: Elan: A professional framework for multimodality research. In: 5th International Conference on Language Resources and Evaluation (LREC 2006), pp. 1556–1559 (2006), https://hdl.handle.net/11858/00-001M-0000-0013-1E7E-4

  198. Wu, X., Xiao, L., Sun, Y., Zhang, J., Ma, T., He, L.: A survey of human-in-the-loop for machine learning. Futur. Gener. Comput. Syst. (2022). https://doi.org/10.1016/j.future.2022.05.014

  199. Youmshajekian, L.: Springer Nature retracts chapter on sign language that deaf scholars called “extremely offensive”. Retraction Watch (2023). https://retractionwatch.com/2023/01/23/springer-nature-retracts-chapter-on-sign-language-deaf-scholars-called-extremely-offensive/

  200. Young, A.W., Frühholz, S., Schweinberger, S.R.: Face and voice perception: understanding commonalities and differences. Trends Cogn. Sci. 24(5), 398–410 (2020). https://doi.org/10.1016/j.tics.2020.02.001

  201. Yu, C., Ballard, D.H.: A multimodal learning interface for grounding spoken language in sensory perceptions. ACM Trans. Appl. Percept. 1(1), 57–80 (2004). https://doi.org/10.1145/1008722.1008727

  202. Yu, F., Seff, A., Zhang, Y., Song, S., Funkhouser, T., Xiao, J.: LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015)

  203. Yunus, F., Clavel, C., Pelachaud, C.: Sequence-to-sequence predictive model: From prosody to communicative gestures. In: Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. Human Body, Motion and Behavior: 12th International Conference, DHM 2021, Held as Part of the 23rd HCI International Conference, HCII 2021, Virtual Event, July 24–29, 2021, Proceedings, Part I, pp. 355–374. Springer (2021). https://doi.org/10.1007/978-3-030-77817-0_25

  204. Zeng, Q., Zheng, G., Liu, Q.: Pe-dls: a novel method for performing real-time full-body motion reconstruction in vr based on vive trackers. Virtual Reality, pp. 1–17 (2022). https://doi.org/10.1007/s10055-022-00635-5

  205. Zhang, H.B., et al.: A comprehensive survey of vision-based human action recognition methods. Sensors 19(5), 1005 (2019). https://doi.org/10.3390/s19051005

  206. Zhou, H., Hu, H.: Human motion tracking for rehabilitation-a survey. Biomed. Signal Process. Control 3(1), 1–18 (2008). https://doi.org/10.1016/j.bspc.2007.09.001

Author information

Correspondence to Alina Gregori.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gregori, A. et al. (2023). A Roadmap for Technological Innovation in Multimodal Communication Research. In: Duffy, V.G. (eds) Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management. HCII 2023. Lecture Notes in Computer Science, vol 14029. Springer, Cham. https://doi.org/10.1007/978-3-031-35748-0_30

  • DOI: https://doi.org/10.1007/978-3-031-35748-0_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35747-3

  • Online ISBN: 978-3-031-35748-0

  • eBook Packages: Computer Science (R0)
