Automatic Segmentation of Spoken Word Signals into Letters Based on Amplitude Variation for Speech to Text Transcription

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 340)


This paper presents a technique for automatically segmenting spoken word signals in order to identify letters for transcription into text. Signal patterns for each letter, as it occurs in different words, are used for this purpose. Voice signals are obtained by recording pronunciations of 1,000 words taken from a standard dictionary. After the signals are collected, pre-processing is performed to reduce noise using a heuristically determined threshold value. The signals are then segmented based on Amplitude Variation (AV) into different portions, each corresponding to a letter of that particular word. Signal Peak Value (SPV) is the feature used for recognizing the letters. The accuracy of the method is estimated using the Bagging, Bayes Net, J48, Naive Bayes, PART and SVM classifiers available in Weka. The best and average classification accuracies obtained with this method are 95.15% (given by the J48 classifier) and 86.92%, respectively, which we consider acceptable.
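The pipeline described above (threshold-based noise reduction, segmentation on amplitude variation, and peak-value feature extraction) can be illustrated with a minimal sketch. The abstract does not give the exact rules, so everything here is an assumption made for illustration: the function names `denoise`, `segment_by_amplitude` and `peak_values`, the `threshold` and `min_gap` parameters, the rule of zeroing low-amplitude samples, and the rule of splitting where a run of silent samples occurs. It is not the authors' implementation.

```python
import numpy as np


def denoise(signal, threshold):
    """Assumed pre-processing: zero every sample whose magnitude is below threshold."""
    out = np.asarray(signal, dtype=float).copy()
    out[np.abs(out) < threshold] = 0.0
    return out


def segment_by_amplitude(signal, min_gap):
    """Assumed segmentation rule: return (start, end) index pairs of active regions.

    A new segment starts wherever two consecutive non-zero samples have
    indices that differ by more than min_gap.
    """
    active = np.flatnonzero(signal != 0)
    if active.size == 0:
        return []
    breaks = np.flatnonzero(np.diff(active) > min_gap)
    starts = np.concatenate(([active[0]], active[breaks + 1]))
    ends = np.concatenate((active[breaks], [active[-1]])) + 1  # end is exclusive
    return list(zip(starts.tolist(), ends.tolist()))


def peak_values(signal, segments):
    """Signal peak value per segment: the largest absolute amplitude in it."""
    return [float(np.max(np.abs(signal[s:e]))) for s, e in segments]
```

On a synthetic signal with two bursts separated by low-level noise, `denoise` followed by `segment_by_amplitude` yields one index pair per burst, and `peak_values` returns one feature per segment. Such features could then be passed to a classifier, as the abstract does with Weka.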


Amplitude variation (AV); Letter; Segmentation; Signal peak values (SPV); Speech to text



Copyright information

© Springer India 2015

Authors and Affiliations

  1. Department of Computer Science and Engineering, West Bengal University of Technology, Salt Lake, Kolkata, India
