• Mousmita Sarma
• Kandarpa Kumar Sarma
Part of the Studies in Computational Intelligence book series (SCI, volume 550)


Speech is a naturally occurring non-stationary signal that is essential for person-to-person communication and is also an important aspect of human–computer interaction (HCI). Several issues in the analysis and design of speech-based applications for HCI have received widespread attention. This chapter covers some of these issues as background and motivation for the work presented in the rest of the book.


Speech, Artificial neural network, Phoneme, Segmentation, RNN, SOM



Copyright information

© Springer India 2014

Authors and Affiliations

  1. Department of Electronics and Communication Engineering, Gauhati University, Guwahati, India
  2. Department of Electronics and Communication Technology, Gauhati University, Guwahati, India
