Fusion of a Novel Volterra-Wiener Filter Based Nonlinear Residual Phase and MFCC for Speaker Verification

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10458)


This paper investigates the complementary nature of the speaker-specific information present in the Volterra-Wiener filter residual (VWFR) phase of the speech signal in comparison with the information present in conventional Mel Frequency Cepstral Coefficients (MFCC) and Teager Energy Operator (TEO) phase. The feature set is derived from the residual phase extracted from the output of a nonlinear filter designed using the Volterra-Wiener series, which exploits higher-order linear as well as nonlinear relationships hidden in the sequence of speech samples. The proposed feature set is used to conduct Speaker Verification (SV) experiments on the NIST SRE 2002 database using a state-of-the-art GMM-UBM system. Score-level fusion of the proposed feature set with MFCC gives an equal error rate (EER) of 6.05%, compared with 8.9% for MFCC alone. TEO phase fused with MFCC yields an EER of 8.83%, indicating that the residual phase from the proposed nonlinear filtering approach contains complementary speaker-specific information.
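
The abstract reports results as EER after score-level fusion but does not state the fusion rule. As a rough illustration only, the sketch below assumes a simple weighted sum of per-trial scores from two systems and approximates the EER by sweeping a decision threshold. The function names, the weight `alpha`, and the toy scores are hypothetical and are not taken from the paper.

```python
def fuse_scores(scores_a, scores_b, alpha):
    """Weighted-sum fusion of two systems' scores for the same trials.

    alpha is a hypothetical fusion weight in [0, 1]; the paper's actual
    fusion rule and weight are not given in the abstract.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same trials")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(scores_a, scores_b)]


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping the threshold over observed scores.

    Returns the mean of the false acceptance and false rejection rates at
    the threshold where the two rates are closest.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(g < t for g in genuine) / len(genuine)
        # False acceptance: impostor trials scoring at or above it.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, 0.5 * (far + frr)
    return eer


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the paper.
    mfcc_scores = [2.1, 1.7, 0.4, -0.3, 0.9, -1.2]
    vwfr_scores = [1.5, 0.2, 1.1, -0.8, -0.1, -0.6]
    is_target = [True, True, True, False, False, False]

    fused = fuse_scores(mfcc_scores, vwfr_scores, alpha=0.6)
    genuine = [s for s, t in zip(fused, is_target) if t]
    impostor = [s for s, t in zip(fused, is_target) if not t]
    print("fused EER:", equal_error_rate(genuine, impostor))
```

In practice the two score streams are usually brought to a comparable scale before fusion, and the weight is tuned on held-out trials; both steps are omitted here for brevity.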


Volterra-Wiener filter residual (VWFR); Volterra-Wiener series; Nonlinear filter; GMM-UBM; MFCC; TEO phase



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Indian Institute of Science, Bengaluru, India
  2. Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India
