Abstract
VoIP has become a core means of communication on the Internet. One of its key applications is the automated interactive voice response (IVR) system, which interacts with the user automatically and depends on speech recognition. This paper studies the effect of codec bit rate and network packet loss on Thai speech recognition over an IP network. We encoded speech samples of male, female, and artificial voices at various bit rates with the Speex codec. The samples were sent via RTP through an IP network with simulated packet loss. Speech quality was measured by PESQ and compared with the word error rate of the speech recognizer. The results show that both the codec bit rate and the level of packet loss have a significant impact on the performance of speech recognition over IP.
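To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes word error rate as the word-level edit distance divided by the reference length, and simulates packet loss by dropping each packet independently with a fixed probability. The function names, the independent (Bernoulli) loss model, and the assumption that Thai text has already been segmented into word tokens are all illustrative choices that the abstract does not specify.

```python
import random
from typing import List, Optional, Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Return (substitutions + deletions + insertions) / len(reference).

    Both arguments are sequences of word tokens. Thai is written without
    spaces between words, so the tokens are assumed to be pre-segmented.
    """
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(reference)


def drop_packets(
    packets: Sequence[bytes], loss_rate: float, seed: Optional[int] = None
) -> List[Optional[bytes]]:
    """Replace each packet with None independently with probability loss_rate.

    This is a hypothetical Bernoulli loss model; real networks often lose
    packets in bursts, which this sketch does not capture.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    rng = random.Random(seed)
    return [None if rng.random() < loss_rate else p for p in packets]
```

In an experiment of the kind described, a function like `drop_packets` would stand in for the network impairment stage, and `word_error_rate` would score the recognizer output against the reference transcript at each bit rate and loss level.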
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Triyason, T., Kanthamanon, P. (2013). Effect of Codec Bit Rate and Packet Loss on Thai Speech Recognition over IP. In: Papasratorn, B., Charoenkitkarn, N., Vanijja, V., Chongsuphajaisiddhi, V. (eds) Advances in Information Technology. IAIT 2013. Communications in Computer and Information Science, vol 409. Springer, Cham. https://doi.org/10.1007/978-3-319-03783-7_21
DOI: https://doi.org/10.1007/978-3-319-03783-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03782-0
Online ISBN: 978-3-319-03783-7
eBook Packages: Computer Science (R0)