
Automatic Segmentation of Spoken Word Signals into Letters Based on Amplitude Variation for Speech to Text Transcription

Conference paper, Information Systems Design and Intelligent Applications

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 340)

Abstract

This paper presents a technique for automatically segmenting spoken word signals into letters so that they can be transcribed into text. Signal patterns of each letter, as it occurs in different words, are used for this purpose. Voice signals were obtained by recording the pronunciations of 1,000 words from a standard dictionary. The collected signals are first pre-processed to reduce noise using a heuristically determined threshold. Each signal is then segmented on the basis of Amplitude Variation (AV) into portions, each corresponding to one letter of the word. Signal Peak Value (SPV) is the feature used to recognize the letters. The accuracy of the method is estimated with the Bagging, Bayes Net, J48, Naive Bayes, PART and SVM classifiers available in Weka. The best and average classification accuracies obtained are 95.15 % (J48 classifier) and 86.92 %, respectively, which are acceptable.
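The pre-processing, segmentation and feature-extraction steps summarized above can be illustrated with a minimal Python sketch. The abstract does not give the exact rules, so everything here is an assumption. The names `denoise`, `segment_by_amplitude_variation` and `signal_peak_values` are hypothetical. The threshold, frame length and variation values are illustrative only. "Amplitude variation" is modelled as a jump in the per-frame peak envelope, or as a silent frame. SPV is approximated as the largest local maxima of the rectified segment. The resulting fixed-length vectors are the kind of input that would then be passed to a classifier such as those available in Weka.

```python
# Hypothetical sketch: names, parameters and rules are illustrative
# assumptions, not the paper's implementation.
from typing import List, Optional, Sequence


def denoise(signal: Sequence[float], threshold: float) -> List[float]:
    """Zero every sample whose magnitude is below a heuristic noise threshold."""
    return [x if abs(x) >= threshold else 0.0 for x in signal]


def frame_envelope(signal: Sequence[float], frame_len: int) -> List[float]:
    """Peak absolute amplitude of each non-overlapping frame."""
    return [max(abs(x) for x in signal[i:i + frame_len])
            for i in range(0, len(signal), frame_len)]


def segment_by_amplitude_variation(signal: Sequence[float], frame_len: int,
                                   variation: float) -> List[List[float]]:
    """Split a denoised signal into candidate letter segments.

    Assumed rule: a segment ends at a silent frame (envelope == 0) or where
    the frame envelope changes by more than `variation` from the previous frame.
    """
    env = frame_envelope(signal, frame_len)
    segments: List[List[float]] = []
    current: List[float] = []
    prev: Optional[float] = None
    for k, level in enumerate(env):
        frame = list(signal[k * frame_len:(k + 1) * frame_len])
        if level == 0.0:
            if current:                      # silence closes the open segment
                segments.append(current)
                current = []
        elif current and prev is not None and abs(level - prev) > variation:
            segments.append(current)         # large amplitude change: new letter
            current = frame
        else:
            current.extend(frame)            # same letter continues (or starts)
        prev = level
    if current:
        segments.append(current)
    return segments


def signal_peak_values(segment: Sequence[float], n_peaks: int) -> List[float]:
    """Assumed SPV-style feature: the n largest local maxima of |x|,
    sorted descending and zero-padded to a fixed length. The segment is
    bordered with zeros so that a peak at either edge is still detected."""
    a = [0.0] + [abs(x) for x in segment] + [0.0]
    peaks = [a[i] for i in range(1, len(a) - 1)
             if a[i] > a[i - 1] and a[i] >= a[i + 1]]
    top = sorted(peaks, reverse=True)[:n_peaks]
    return top + [0.0] * (n_peaks - len(top))


if __name__ == "__main__":
    raw = [0.02, -0.01, 0.03, 0.0,    # background noise
           0.1, 0.5, -0.4, 0.2,
           0.3, -0.5, 0.4, 0.1,
           0.9, -0.8, 0.7, 0.2,       # sharp rise in amplitude
           0.01, 0.02, -0.03, 0.0,    # pause
           0.2, 0.6, 0.3, 0.1]
    clean = denoise(raw, threshold=0.05)
    segs = segment_by_amplitude_variation(clean, frame_len=4, variation=0.3)
    features = [signal_peak_values(s, n_peaks=2) for s in segs]
    print(len(segs), features)
```

On the toy signal in the `__main__` block, the threshold zeroes the leading noise frame and the pause. The jump in frame peak from 0.5 to 0.9 opens a new segment. This gives three segments with feature vectors `[[0.5, 0.5], [0.9, 0.0], [0.6, 0.0]]`. A real system would tune the threshold and variation values on recorded speech, in the way the paper determines its noise threshold heuristically.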


Author information

Corresponding author: Anik Roy


Copyright information

© 2015 Springer India

About this paper

Cite this paper

Roy, A., Phadikar, S. (2015). Automatic Segmentation of Spoken Word Signals into Letters Based on Amplitude Variation for Speech to Text Transcription. In: Mandal, J., Satapathy, S., Kumar Sanyal, M., Sarkar, P., Mukhopadhyay, A. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 340. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2247-7_63


  • DOI: https://doi.org/10.1007/978-81-322-2247-7_63


  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2246-0

  • Online ISBN: 978-81-322-2247-7

  • eBook Packages: Engineering, Engineering (R0)
