Journal on Multimodal User Interfaces, Volume 12, Issue 4, pp 271–272

Speech communication integrated with other modalities

  • Alexey Karpov
  • Iosif Mporas


This brief paper is an editorial for the special issue on “Speech communication integrated with other modalities”. The special issue contains extended versions of selected topical papers from the 19th International Conference on Speech and Computer (SPECOM 2017), held on 12–16 September 2017 in Hatfield, UK. Five extended articles were selected for this special issue, all of which deal with speech-based human–computer communication combined with visual, textual and/or other interaction modalities. The first three papers study various aspects of multimodal human–computer interaction; in the remaining two, the authors study the video components of audio-visual speech recognition systems. In this editorial, we present an overview of the accepted articles and the selection process.


Speech communication, Multimodal interaction, Audio-visual speech, SPECOM



The guest editors are grateful to the Editor-in-Chief, Prof. Jean-Claude Martin, for his cooperation and support of this Special Issue, as well as to all the outstanding reviewers who provided detailed and insightful reviews of the extended papers submitted for this Special Issue (in alphabetical order): Gerard Bailly, Marie-Luce Bourguet, Nick Campbell, Nikos Fakotakis, Kristiina Jokinen, Oliver Jokisch, Irina Kipyatkova, Wolfgang Minker, and Milos Zelezny.


  1. Karpov A, Potapova R, Mporas I (eds) (2017) Proceedings of the 19th international conference on speech and computer SPECOM 2017, Hatfield, UK, vol 10458. Springer LNCS.
  2. Schuller B, Zhang Y, Weninger F (2018) Three recent trends in paralinguistics on the way to omniscient machine intelligence. J Multimodal User Interfaces.
  3. Schuller BW (2017) Big data, deep learning—at the edge of X-ray speaker analysis. In: Speech and computer. SPECOM 2017. Lecture notes in computer science, vol 10458. Springer, Cham.
  4. Salim FA, Haider F, Conlan O et al (2018) An approach for exploring a video via multimodal feature extraction and user interactions. J Multimodal User Interfaces.
  5. Gilmartin E, Cowan B, Vogel C, Campbell N (2018) Explorations in multiparty casual social talk and its relevance for social human machine dialogue. J Multimodal User Interfaces.
  6. Paleček K (2018) Experimenting with lipreading for large vocabulary continuous speech recognition. J Multimodal User Interfaces.
  7. Ivanko D, Karpov A, Fedotov D et al (2018) Multimodal speech recognition: increasing accuracy using high speed video data. J Multimodal User Interfaces.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russian Federation
  2. School of Engineering and Technology, University of Hertfordshire, Hatfield, UK
