
New wireless connection between user and VE using speech processing

  • Original Article
  • Published in: Virtual Reality

An Erratum to this article was published on 12 March 2015

Abstract

This paper presents a novel speak-to-VR virtual reality peripheral network (VRPN) server based on speech processing. The server uses a microphone array as the speech source and streams recognition results over a Wi-Fi network. The proposed VRPN server provides a handy, portable and wireless human–machine interface that can facilitate interaction in a variety of interfaces and application domains, including HMD- and CAVE-based virtual reality systems, flight and driving simulators, and many others. The server is built on a speech processing software development kit and the VRPN library in C++. Speak-to-VR works well even in the presence of background noise or the voices of other users in the vicinity. The speech recognition algorithm is not sensitive to the user's accent because it is trained while it operates: recognition parameters are trained by a hidden Markov model in real time. The advantages and disadvantages of the speak-to-VR server are studied under different configurations. The efficiency and precision of the server in a real application are then validated via a formal user study with ten participants. Two experimental setups are implemented on a CAVE system, using either a Kinect for Xbox 360 or an array microphone as the input device. Each participant is asked to navigate in a virtual environment and manipulate an object. Analysis of the experimental data shows promising results and motivates additional research.




Author information

Correspondence to M. Ali Mirzaei.


About this article


Cite this article

Mirzaei, M.A., Merienne, F. & Oliver, J.H. New wireless connection between user and VE using speech processing. Virtual Reality 18, 235–243 (2014). https://doi.org/10.1007/s10055-014-0248-y


Keywords

Navigation