Abstract
This paper presents a novel speak-to-VR virtual-reality peripheral network (VRPN) server based on speech processing. The server uses a microphone array as its speech source and streams the processing results over a Wi-Fi network. The proposed VRPN server provides a handy, portable and wireless human-machine interface that can facilitate interaction across a variety of interfaces and application domains, including HMD- and CAVE-based virtual reality systems, flight and driving simulators, and many others. The VRPN server is built on a speech processing software development kit and the VRPN library in C++. Speak-to-VR VRPN works well even in the presence of background noise or the voices of other users in the vicinity. The speech processing algorithm is not sensitive to the user’s accent because it is trained while it is operating: the speech recognition parameters are trained with a hidden Markov model in real time. The advantages and disadvantages of the speak-to-VR server are studied under different configurations. The efficiency and precision of the speak-to-VR server in a real application are then validated through a formal user study with ten participants. Two experimental test setups are implemented on a CAVE system, using either an Xbox Kinect or a microphone array as the input device. Each participant is asked to navigate in a virtual environment and manipulate an object. Analysis of the experimental data shows promising results and motivates further research.
Cite this article
Mirzaei, M.A., Merienne, F. & Oliver, J.H. New wireless connection between user and VE using speech processing. Virtual Reality 18, 235–243 (2014). https://doi.org/10.1007/s10055-014-0248-y
DOI: https://doi.org/10.1007/s10055-014-0248-y