Abstract
In general, speech recognition is the process of converting a spoken utterance into a machine-understandable representation. It consists of two stages: i) removal of background noise, which arises in stressful, noisy environments, and ii) word-by-word phoneme separation, which also involves phoneme recognition. In real-time situations, the captured sound signal contains both the target speech and background noise.
This paper critically evaluates currently available signal-analysis techniques and phoneme-modelling approaches as applied to isolated, context-independent phoneme recognition. The proposed methodology introduces a technique for recovering the pure speech signal in a noisy environment (i.e., without background noise) and for isolating phonemes word by word using a clustering approach. With the proposed methodology, high accuracy in background-noise isolation (obtaining a clean speech signal) and in phoneme isolation from the clean speech signal has been achieved; these results can be qualitatively compared with previous research on continuous phoneme recognition. Performance evaluation also shows improved speech recognition in stressful noise conditions and a better-quality phoneme-separation process.
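The two-stage pipeline the abstract describes can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's actual algorithm: spectral subtraction stands in for the noise-removal stage, plain k-means (the clustering family the methodology draws on) stands in for the phoneme-separation stage, and synthetic tone frames stand in for real phoneme frames.

```python
import numpy as np

def remove_noise(frames, noise_frames):
    """Stage i) sketch: spectral subtraction. Subtract the mean noise
    magnitude spectrum from each frame's magnitude spectrum (floored at 0),
    then resynthesise with the original phase."""
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)
    spectra = np.fft.rfft(frames, axis=1)
    mag = np.clip(np.abs(spectra) - noise_mag, 0.0, None)
    phase = np.angle(spectra)
    return np.fft.irfft(mag * np.exp(1j * phase), n=frames.shape[1], axis=1)

def kmeans(X, k=2, iters=20):
    """Stage ii) sketch: plain k-means over frame features, with
    deterministic farthest-point initialisation."""
    centroids = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

# Toy data: two groups of 20 frames, each a pure tone (400 Hz vs 1600 Hz)
# standing in for two phoneme classes, corrupted by additive noise.
sr, n = 8000, 256
t = np.arange(n) / sr
rng = np.random.default_rng(1)
tone_a = np.stack([np.sin(2 * np.pi * 400 * t + p)
                   for p in rng.uniform(0, 2 * np.pi, 20)])
tone_b = np.stack([np.sin(2 * np.pi * 1600 * t + p)
                   for p in rng.uniform(0, 2 * np.pi, 20)])
noisy = np.vstack([tone_a, tone_b]) + 0.3 * rng.standard_normal((40, n))
noise_only = 0.3 * rng.standard_normal((40, n))  # noise-only reference frames

clean = remove_noise(noisy, noise_only)                 # stage i)
feats = np.log1p(np.abs(np.fft.rfft(clean, axis=1)))    # log-magnitude features
labels = kmeans(feats, k=2)                             # stage ii)
```

Under these assumptions, frames of the same tone end up in the same cluster, mirroring how well-separated phoneme frames would be grouped after the noise-isolation stage.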
References
Furuichi, C., Aizawa, K., Inoue, K.: Speech recognition using stochastic phonemic segment model based on phoneme segmentation, Faculty of Engineering. Toin University of Yokohama, 1614 Kurogane, Midori, Yokohama, Japan
Engelbrecht, H.A., du Preez, J.A.: The Interplay of Signal Analysis and Phoneme Modelling Techniques on Phoneme Recognition. Telecommunications and Digital Signal Processing Group, Department of Electronic Engineering. University of Stellenbosch, South Africa
Feng, L., Hansen, L.K.: Phonemes as Short Time Cognitive Components. In: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006, May 14-19, vol. 5 (2006)
Shirai, K., Hosaka, N., Kitagawa, E.: Speaker Adaptive Phoneme Recognition by Multi-level Clustering Based on Mutual Information Criterion, Department of Electrical Engineering. Waseda University, 3-4-1 Ohkubo, Shinjyuku - ku, Tokyo 169, Japan
Hansen, John, H.L., Cairns, D.A.: Source Generator Based Real-time Recognition of Speech in Noisy stressful and Lombard Effect Environments, Robust speech processing laboratory, Department of Electrical Engineering. Duke University, Durham, North Caroline, USA
Frahling, G.A., Sohler, C.: A fast k-means implementation using coresets. In: Proceedings of the Twenty-Second Annual Symposium on Computational Geometry, SCG 2006, Sedona, Arizona, USA, June 5-7, pp. 135–143. ACM, NewYork (2006), http://doi.acm.org/10.1145/1137856.1137879
Johnstone, A., Altmann, G.: Automated speech recognition: a framework for research. In: Proceedings of the Second Conference on European Chapter of the Association For Computational Linguistics, European Chapter Meeting of the ACL, Geneva, Switzerland, March 27-29, pp. 239–243. Association for Computational Linguistics, Morristown (1985), http://dx.doi.org/10.3115/976931.976966
Hincks, R.: Using Speech Recognition to Evaluate skills in spoken English, Department of Speech, Music and Hearing, KTH
Kashima, H., Hu, J., Ray, B., Singh, M.: K-means clustering of proportional data using L1 distance. In: 19th International Conference on Pattern Recognition, ICPR 2008, December 8-11, pp. 1–4 (2008)
Digalakis, V., Ostendorf, M., Rohlicek, J.R.: Improvements in the stochastic segment model for Phoneme recognition. In: Proceedings of the Workshop on Speech and Natural Language, Human Language Technology Conference, Cape Cod, Massachusetts, October 15 - 18, pp. 332–338. Association for Computational Linguistics, Morristown (1989), http://dx.doi.org/10.3115/1075434.1075491
Hincks, R.: Speech technologies for pronunciation feedback and evaluation. ReCALL 15(1), 3–20 (2003), http://dx.doi.org/10.1017/S0958344003000211
De Liang, W., Brown, G.J.: Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. Wiley-IEEE Press (2006), ISBN: 978-0-471-74109-1
Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis, PG 145-164. In: Adaptive and Learning Systems for Signal Processing, Communications, and Control, Nerural Networks Research Center, Helsinki. University of Technology, Finland (2002), http://dx.doi.org/10.1002/0471221317.ch7
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Tak, G.K., Bhargava, V. (2010). Clustering Approach in Speech Phoneme Recognition Based on Statistical Analysis. In: Meghanathan, N., Boumerdassi, S., Chaki, N., Nagamalai, D. (eds) Recent Trends in Network Security and Applications. CNSA 2010. Communications in Computer and Information Science, vol 89. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14478-3_48
Print ISBN: 978-3-642-14477-6
Online ISBN: 978-3-642-14478-3