
Multimodal user interaction with in-car equipment in real conditions based on touch and speech modes in the Persian language

Published in Multimedia Tools and Applications (special issue 1224: New Frontiers in Multimedia-based and Multimodal HCI)

Abstract

Nowadays, communication with in-car equipment is performed via a large number of buttons or a touch screen. This increases the demand on the driver's visual attention and reduces concentration while driving. Speech-based interaction has been introduced in recent years as a way to reduce driver distraction. However, this input mode faces several technical challenges, such as the need to memorize voice commands and the difficulty of canceling them. This paper presents a multimodal user interface design based on touch and speech modes for controlling five in-car devices (radio, CD or music player, fan, heater, and driver-side window). The research also collects a dataset of in-car voice commands in the Persian language under real conditions (in a real car and in the presence of background noise), first to create a dataset of Persian voice commands (given the lack of research in this area in Persian-speaking countries) and second to address the challenges mentioned above. To evaluate the proposed user interface, 15 participants performed ten different tasks in the speech and touch modes, with and without driving simulation. The evaluation results indicated that, compared to the touch input mode, the speech input mode with and without driving simulation required on average fewer clicks to perform tasks (0.2 and 0.6), shorter task completion times (7.37 and 3.3 seconds), shorter intervals between clicks (8.2 and 5 seconds), and a lower driver distraction rate (25.08%). Moreover, offering two different input modes in the in-vehicle user interface increases accessibility.
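The paper evaluates the interface rather than publishing an implementation, but the architecture it describes (two input modes driving the same set of five device actions) is straightforward to sketch. The following minimal Python example, with entirely hypothetical names (InCarUI, handle_touch, handle_speech), illustrates how touch and speech inputs could be routed through a shared action handler. It is an assumption-laden sketch, not the authors' system: the real interface recognizes Persian voice commands with a speech recognizer, whereas here a pre-transcribed English string stands in for the recognizer output.

```python
# Minimal sketch (not from the paper) of how a two-mode in-car UI
# might route touch and speech inputs to the same device actions.
# All class and function names here are hypothetical illustrations.

from dataclasses import dataclass, field

# The five devices controlled in the study.
DEVICES = {"radio", "music_player", "fan", "heater", "driver_window"}

@dataclass
class InCarUI:
    # Current on/off state of each device.
    states: dict = field(default_factory=lambda: {d: False for d in DEVICES})

    def execute(self, device: str, action: str) -> str:
        """Single action handler shared by both input modes."""
        if device not in DEVICES:
            return f"unknown device: {device}"
        self.states[device] = (action == "on")
        return f"{device} turned {action}"

    def handle_touch(self, button_id: str) -> str:
        """Touch mode: a button press maps directly to a device action."""
        device, action = button_id.rsplit("_", 1)
        return self.execute(device, action)

    def handle_speech(self, transcript: str) -> str:
        """Speech mode: parse a recognized command transcript (standing in
        for the recognizer's output) into the same device/action pair."""
        words = transcript.lower().split()
        action = "on" if "on" in words else "off"
        for device in DEVICES:
            if device.replace("_", " ") in transcript.lower():
                return self.execute(device, action)
        return "command not understood"

ui = InCarUI()
print(ui.handle_touch("fan_on"))                # -> fan turned on
print(ui.handle_speech("turn the heater off"))  # -> heater turned off
```

Routing both modes through one execute() method is the design point the abstract implies: either modality can complete any task, so a driver who forgets a voice command can fall back to touch, which is also what makes the per-mode click and timing comparisons in the evaluation meaningful.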




Data availability

The datasets generated during the current study are not publicly available. However, further information about the data and the conditions for access is available from the corresponding author upon reasonable request.


Author information

Corresponding author

Correspondence to Shima Tabibian.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nazari, F., Tabibian, S. & Homayounvala, E. Multimodal user interaction with in-car equipment in real conditions based on touch and speech modes in the Persian language. Multimed Tools Appl 82, 12995–13023 (2023). https://doi.org/10.1007/s11042-022-13784-1

