A Universal Assistive Technology with Multimodal Input and Multimedia Output Interfaces

  • Alexey Karpov
  • Andrey Ronzhin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8513)


In this paper, we present a universal assistive technology with multimodal input and multimedia output interfaces. We describe its conceptual model and its software-hardware architecture, including the levels and components of that architecture. The architecture comprises five main interconnected levels: computer hardware, system software, application software for digital signal processing, application software for human-computer interfaces, and software for assistive information technologies. The universal assistive technology offers people with disabilities several multimodal systems and interfaces: an audio-visual Russian speech recognition system (AVSR), a “Talking head” synthesis system (text-to-audiovisual speech), a “Signing avatar” synthesis system (sign language visual synthesis), the ICANDO multimodal system (a hands-free PC control system), and a control system for an assistive smart space.
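The five-level architecture and the systems listed in the abstract can be pictured as a small data model. The following Python sketch is only an illustration: the class and function names (`Level`, `Component`, `components_at`) are hypothetical, and the assignment of each system to a level is an assumption, since the abstract names the levels and the systems but does not state which system sits on which level.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """The five interconnected levels named in the abstract, bottom to top."""
    COMPUTER_HARDWARE = 1
    SYSTEM_SOFTWARE = 2
    DSP_APPLICATION_SOFTWARE = 3
    HCI_APPLICATION_SOFTWARE = 4
    ASSISTIVE_IT_SOFTWARE = 5


@dataclass(frozen=True)
class Component:
    """One system offered by the assistive technology (hypothetical record type)."""
    name: str
    description: str
    level: Level  # ASSUMPTION: level placement is not given in the abstract


# Names and descriptions follow the abstract; level placement is illustrative only.
COMPONENTS = [
    Component("AVSR", "audio-visual Russian speech recognition",
              Level.HCI_APPLICATION_SOFTWARE),
    Component("Talking head", "text-to-audiovisual speech synthesis",
              Level.HCI_APPLICATION_SOFTWARE),
    Component("Signing avatar", "sign language visual synthesis",
              Level.HCI_APPLICATION_SOFTWARE),
    Component("ICANDO", "hands-free PC control",
              Level.ASSISTIVE_IT_SOFTWARE),
    Component("Smart space control", "control of an assistive smart space",
              Level.ASSISTIVE_IT_SOFTWARE),
]


def components_at(level: Level) -> list[str]:
    """Return the names of all components placed on the given level."""
    return [c.name for c in COMPONENTS if c.level == level]


if __name__ == "__main__":
    # Print the levels from top to bottom with the components placed on each.
    for level in sorted(Level, reverse=True):
        names = ", ".join(components_at(level)) or "(no listed systems)"
        print(f"{int(level)}. {level.name}: {names}")
```

Ordering the levels with an `IntEnum` keeps the bottom-to-top stacking explicit, so a lower level can be compared with a higher one directly.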


Keywords: Assistive Technology; Multimodal User Interfaces; Multimedia; Universal Access; Audio-Visual Speech; Assistive Applications





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Alexey Karpov (1, 2)
  • Andrey Ronzhin (2)
  1. University ITMO, St. Petersburg, Russia
  2. St. Petersburg Institute for Informatics and Automation of RAS (SPIIRAS), Russia
