Abstract
This paper presents a technique for automatically segmenting spoken word signals in order to identify their letters for transcription into text. The method relies on the signal patterns of each letter as it occurs in different words. Voice signals are obtained by recording pronunciations of 1,000 words taken from a standard dictionary. The collected signals are pre-processed to reduce noise using a heuristically determined threshold value. Each signal is then segmented based on Amplitude Variation (AV) into portions, each corresponding to a letter of the word. Signal Peak Value (SPV) is the feature used for recognizing the letters. The accuracy of the method is estimated using the Bagging, Bayes Net, J48, Naive Bayes, PART and SVM classifiers available in Weka. The best and the average classification accuracies obtained are 95.15 % (given by the J48 classifier) and 86.92 %, respectively, which are acceptable.
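The abstract names the three processing steps (threshold-based noise reduction, amplitude-based segmentation, peak-value features) but not their details. The following is a minimal sketch of how such a pipeline might look, not the authors' implementation. It assumes that samples below the threshold are zeroed, that a segment is a run of fixed-length frames containing any non-zero sample, and that the feature is the largest absolute amplitude in each segment. The function names, the frame length and the threshold are all hypothetical.

```python
import numpy as np


def suppress_noise(signal, threshold):
    """Zero out samples whose magnitude is below the threshold (assumed rule)."""
    cleaned = np.asarray(signal, dtype=float).copy()
    cleaned[np.abs(cleaned) < threshold] = 0.0
    return cleaned


def segment_by_amplitude(signal, frame_len):
    """Return (start, end) sample ranges of runs of non-silent frames.

    A frame counts as active when any of its samples is non-zero.
    Trailing samples that do not fill a whole frame are ignored.
    """
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[: n_frames * frame_len]).reshape(n_frames, frame_len)
    active = np.abs(frames).max(axis=1) > 0

    segments = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i  # a new active run begins
        elif not is_active and start is not None:
            segments.append((start * frame_len, i * frame_len))
            start = None  # the run ended at the previous frame
    if start is not None:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments


def peak_values(signal, segments):
    """Largest absolute amplitude inside each segment (assumed SPV feature)."""
    return [float(np.abs(signal[s:e]).max()) for s, e in segments]


# Synthetic example: low-level noise with two louder bursts.
demo = np.full(400, 0.01)
demo[100:200] = 0.8
demo[300:400] = 0.5

cleaned = suppress_noise(demo, threshold=0.05)
segments = segment_by_amplitude(cleaned, frame_len=50)
features = peak_values(cleaned, segments)
```

On real recordings, the per-segment features would then be passed to a classifier; the abstract reports doing this with Weka, which this sketch does not cover.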
Copyright information
© 2015 Springer India
About this paper
Cite this paper
Roy, A., Phadikar, S. (2015). Automatic Segmentation of Spoken Word Signals into Letters Based on Amplitude Variation for Speech to Text Transcription. In: Mandal, J., Satapathy, S., Kumar Sanyal, M., Sarkar, P., Mukhopadhyay, A. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 340. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2247-7_63
DOI: https://doi.org/10.1007/978-81-322-2247-7_63
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2246-0
Online ISBN: 978-81-322-2247-7
eBook Packages: Engineering, Engineering (R0)