Automatic Segmentation of Spoken Word Signals into Letters Based on Amplitude Variation for Speech to Text Transcription

Roy, Anik; Phadikar, Santanu

doi:10.1007/978-81-322-2247-7_63

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 340)

Abstract

This paper presents a technique for automatically segmenting spoken-word signals into letters so that speech can be transcribed into text. The method relies on the signal patterns of each letter as it occurs in different words. Voice signals were obtained by recording pronunciations of 1,000 words available in a standard dictionary. The collected signals are first pre-processed to reduce noise using a heuristically determined threshold value. Each signal is then segmented, based on Amplitude Variation (AV), into portions that each correspond to one letter of the word. Signal Peak Value (SPV) is the feature used for recognizing the letters. The accuracy of the method is estimated using the Bagging, Bayes Net, J48, Naive Bayes, PART and SVM classifiers available in Weka. The best and average classification accuracies obtained are 95.15% (given by the J48 classifier) and 86.92%, respectively, which the authors consider acceptable.
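The pipeline outlined in the abstract (threshold-based noise reduction, amplitude-based segmentation, one peak value per segment) could be sketched roughly as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the function names, the `threshold` and `min_gap` parameters, and the rule that a segment ends after a run of silent samples are all assumptions introduced here.

```python
from typing import List, Optional, Tuple


def suppress_noise(signal: List[float], threshold: float) -> List[float]:
    """Zero out samples whose absolute amplitude is below `threshold`.

    The abstract only says the threshold is chosen heuristically;
    the value is left to the caller.
    """
    return [s if abs(s) >= threshold else 0.0 for s in signal]


def segment_by_amplitude(signal: List[float], min_gap: int) -> List[Tuple[int, int]]:
    """Split a noise-suppressed signal into (start, end) ranges, end exclusive.

    Assumption: a segment ends once `min_gap` consecutive zero samples
    follow its last non-zero sample.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_active = -1
    for i, sample in enumerate(signal):
        if sample != 0.0:
            if start is None:
                start = i          # first active sample of a new segment
            last_active = i
        elif start is not None and i - last_active >= min_gap:
            segments.append((start, last_active + 1))
            start = None
    if start is not None:          # close a segment that runs to the end
        segments.append((start, last_active + 1))
    return segments


def peak_value(signal: List[float], segment: Tuple[int, int]) -> float:
    """Return the largest absolute amplitude inside one segment."""
    start, end = segment
    return max(abs(s) for s in signal[start:end])


# Toy signal standing in for a recorded word (made-up values).
raw = [0.01, 0.5, -0.8, 0.02, 0.01, 0.0, 0.3, 0.6, -0.02, 0.01]
cleaned = suppress_noise(raw, threshold=0.05)
segments = segment_by_amplitude(cleaned, min_gap=2)
peaks = [peak_value(cleaned, seg) for seg in segments]
```

In a setup like the one the abstract describes, the per-segment peak values would then be passed as features to a classifier such as J48; that step is omitted here.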

Author information

Authors and Affiliations

Department of Computer Science and Engineering, West Bengal University of Technology, BF-142, Salt Lake, Kolkata, 700064, West Bengal, India
Anik Roy & Santanu Phadikar

Corresponding author

Correspondence to Anik Roy.

Editor information

Editors and Affiliations

University of Kalyani, Kalyani, West Bengal, India
J. K. Mandal
Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences, Visakhapatnam, India
Suresh Chandra Satapathy
Dean, Faculty of Engineering, Technology, University of Kalyani, Kalyani, West Bengal, India
Manas Kumar Sanyal
Engineering and Technological Studies, University of Kalyani, Kalyani, West Bengal, India
Partha Pratim Sarkar
Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
Anirban Mukhopadhyay

About this paper

Cite this paper

Roy, A., Phadikar, S. (2015). Automatic Segmentation of Spoken Word Signals into Letters Based on Amplitude Variation for Speech to Text Transcription. In: Mandal, J., Satapathy, S., Kumar Sanyal, M., Sarkar, P., Mukhopadhyay, A. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 340. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2247-7_63

DOI: https://doi.org/10.1007/978-81-322-2247-7_63
Published: 21 January 2015
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2246-0
Online ISBN: 978-81-322-2247-7
eBook Packages: Engineering, Engineering (R0)
