
Speaker Identification and Time Scale Modification Using VOPs

Speech Processing in Mobile Environments

Part of the book series: SpringerBriefs in Electrical and Computer Engineering (BRIEFSSPEECHTECH)

Abstract

In this chapter, the proposed two-stage vowel onset point (VOP) detection method is used to improve speaker identification (SI) performance on coded speech. The VOPs are used to locate the regions of speech that carry most of the speaker-specific information, and features extracted from these regions are used for speaker identification to improve recognition accuracy. The accurate VOPs obtained with the proposed method are also applied to nonuniform time scale modification. The proposed nonuniform time scale modification method produces high-quality speech when the speaking rate is varied. In this method, vowel regions are modified nonuniformly according to the vowel type, while consonant and transition regions are left unaltered irrespective of the speaking rate. The consonant, vowel, and transition regions themselves are determined from the VOPs.
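The VOP-based segmentation and duration control described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the chapter's implementation: the helper names `segment_by_vops` and `target_lengths`, the region widths (`consonant_ms`, `transition_ms`, `vowel_ms`), and the per-vowel scale factors are all hypothetical. VOP locations are assumed to be supplied as sample indices by some VOP detector, and mapping a vowel type to a scale factor is left to the caller. The code only computes region boundaries and target lengths; it does not perform the waveform modification itself, which would need a separate time scale modification technique.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    kind: str   # "consonant", "transition" or "vowel"
    start: int  # first sample index (inclusive)
    end: int    # last sample index (exclusive)


def segment_by_vops(vops: List[int], n_samples: int, fs: int,
                    consonant_ms: float = 40.0,
                    transition_ms: float = 30.0,
                    vowel_ms: float = 100.0) -> List[Region]:
    """Split speech into consonant, transition and vowel regions around VOPs.

    Hypothetical layout: consonant_ms before each VOP, transition_ms after it,
    then up to vowel_ms of vowel. Regions are clipped so they neither overlap
    the neighbouring syllable nor run past the next VOP or the signal end.
    """
    def to_samples(duration_ms: float) -> int:
        return int(round(duration_ms * fs / 1000.0))

    vops = sorted(vops)
    regions: List[Region] = []
    prev_end = 0
    for i, vop in enumerate(vops):
        next_vop = vops[i + 1] if i + 1 < len(vops) else n_samples
        c0 = max(prev_end, vop - to_samples(consonant_ms))
        t1 = min(vop + to_samples(transition_ms), next_vop)
        v1 = min(t1 + to_samples(vowel_ms), next_vop)
        for kind, start, end in (("consonant", c0, vop),
                                 ("transition", vop, t1),
                                 ("vowel", t1, v1)):
            if end > start:  # drop regions that were clipped to zero length
                regions.append(Region(kind, start, end))
        prev_end = v1
    return regions


def target_lengths(regions: List[Region], vowel_factors: List[float]) -> List[int]:
    """Target length (in samples) of each region after nonuniform scaling.

    Only vowel regions are scaled, using one factor per vowel region in time
    order; consonant and transition regions keep their original length.
    """
    n_vowels = sum(1 for r in regions if r.kind == "vowel")
    if len(vowel_factors) != n_vowels:
        raise ValueError(f"expected {n_vowels} vowel factors, got {len(vowel_factors)}")
    factors = iter(vowel_factors)
    lengths = []
    for r in regions:
        factor = next(factors) if r.kind == "vowel" else 1.0
        lengths.append(int(round((r.end - r.start) * factor)))
    return lengths


if __name__ == "__main__":
    fs = 8000  # assumed 8 kHz sampling rate
    regions = segment_by_vops([2000, 6000], n_samples=10000, fs=fs)
    for r, n in zip(regions, target_lengths(regions, [1.5, 1.2])):
        print(r.kind, r.start, r.end, "->", n)
```

For example, with VOPs at samples 2000 and 6000 of a 10000-sample signal at 8 kHz, scaling the two vowel regions by 1.5 and 1.2 lengthens only those regions (800 to 1200 and 800 to 960 samples), while every consonant and transition region keeps its original length. Clipping at the next VOP keeps the regions ordered and non-overlapping when syllables are closely spaced.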




Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Rao, K.S., Vuppala, A.K. (2014). Speaker Identification and Time Scale Modification Using VOPs. In: Speech Processing in Mobile Environments. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-03116-3_6


  • DOI: https://doi.org/10.1007/978-3-319-03116-3_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03115-6

  • Online ISBN: 978-3-319-03116-3

  • eBook Packages: Engineering, Engineering (R0)
