Consonant–Vowel Recognition in the Presence of Coding and Background Noise

  • K. Sreenivasa Rao
  • Anil Kumar Vuppala
Chapter
Part of the SpringerBriefs in Electrical and Computer Engineering book series (BRIEFSELECTRIC)

Abstract

In this chapter, an approach for improving the recognition performance of consonant–vowel (CV) units under clean, coded, and noisy conditions is presented. The proposed CV recognition method is carried out in two stages: in the first stage, the vowel category of the CV unit is recognized, and in the second stage, the consonant category is recognized. At each stage, complementary evidence from support vector machine (SVM) and hidden Markov model (HMM) classifiers is combined to enhance the recognition performance of CV units. In the proposed approach, the vowel onset point (VOP) is used as an anchor point for extracting features from the CV unit; therefore, the VOP detection methods presented in the previous chapter are used for this work. The performance of the proposed CV recognition method is demonstrated under coding and noisy conditions. Recognition studies are carried out using isolated CV units and CV units from the Telugu broadcast news database. Further, the performance of the CV recognition system under background noise is improved by using combined temporal and spectral processing-based preprocessing methods.
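The chapter develops these ideas in detail; purely as an illustration of the two-stage decision and the score-level combination of SVM and HMM evidence described above, a minimal Python sketch follows. Every name in it (extract_features_around_vop, svm_scores, hmm_scores, the equal-weight fusion, the class lists) is an assumption introduced for illustration, not the authors' implementation; the stand-in functions return placeholder values so the outline runs end to end.

    import numpy as np

    # --- Stand-ins (the chapter's actual features and classifiers differ) -----

    def extract_features_around_vop(signal, vop, frames_before=5, frames_after=10, dim=39):
        """Stand-in: fixed-dimension feature matrix anchored at the VOP."""
        rng = np.random.default_rng(0)
        return rng.standard_normal((frames_before + frames_after, dim))

    def svm_scores(features, classes):
        """Stand-in for per-class SVM evidence."""
        rng = np.random.default_rng(1)
        return rng.random(len(classes))

    def hmm_scores(features, classes):
        """Stand-in for per-class HMM evidence (e.g., normalized likelihoods)."""
        rng = np.random.default_rng(2)
        return rng.random(len(classes))

    # --- Two-stage recognition with score-level fusion ------------------------

    def combine_evidence(svm_ev, hmm_ev, weight=0.5):
        """One simple fusion rule: weighted sum of normalized SVM and HMM evidence."""
        svm_ev = np.asarray(svm_ev) / (np.sum(svm_ev) + 1e-12)
        hmm_ev = np.asarray(hmm_ev) / (np.sum(hmm_ev) + 1e-12)
        return weight * svm_ev + (1.0 - weight) * hmm_ev

    def recognize_cv(signal, vop, vowel_classes, consonant_classes):
        """Stage 1: vowel category of the CV unit; Stage 2: consonant category."""
        feats = extract_features_around_vop(signal, vop)

        vowel_ev = combine_evidence(svm_scores(feats, vowel_classes),
                                    hmm_scores(feats, vowel_classes))
        vowel = vowel_classes[int(np.argmax(vowel_ev))]

        cons_ev = combine_evidence(svm_scores(feats, consonant_classes),
                                   hmm_scores(feats, consonant_classes))
        consonant = consonant_classes[int(np.argmax(cons_ev))]

        return consonant + vowel

    if __name__ == "__main__":
        vowels = ["a", "i", "u", "e", "o"]
        consonants = ["k", "ch", "T", "t", "p"]
        print(recognize_cv(signal=None, vop=0,
                           vowel_classes=vowels, consonant_classes=consonants))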


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • K. Sreenivasa Rao (1)
  • Anil Kumar Vuppala (2)
  1. Indian Institute of Technology Kharagpur, India
  2. International Institute of Information Technology, Hyderabad, India