A method to compensate the influence of speech codec in speaker recognition
Recognizing a person by voice, or “speaker recognition”, is a biometric technique increasingly used in electronic commerce, electronic banking, and forensic investigation, among other fields. It relies on the discriminative information contained in a person's speech, and its main challenge is “session variability”: the variability between different speech samples of the same person used for training and evaluation. When speech is transmitted over the internet, for example, the coding and decoding process (the “codec”) discards part of this information and degrades speaker recognition effectiveness. Several methods have been proposed to mitigate this effect. This work studies how strongly some commonly used codec types affect this information and proposes a method to compensate for the session variability introduced by the codec. The influence of several codec types on sample quality was first evaluated with a set of synthesized speech samples. Experiments were then carried out with speech samples from international competitions, retransmitted over two different codecs, and the effect on speaker recognition effectiveness was measured. Finally, variability compensation was applied, improving recognition effectiveness, measured by the equal error rate, by 20.8% for the G.722 codec and 27.8% for the GSM 6.20 codec.
Keywords: Speaker recognition; Variability compensation; Linear discriminant analysis; i-Vector representation; Voice codec
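The abstract reports effectiveness as the equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER could be estimated from verification scores, not the evaluation code used in this work. The function names `equal_error_rate` and `relative_reduction` are hypothetical, and reading the reported improvement as a relative reduction in EER is an assumption, since the abstract does not say whether the figures are relative or absolute.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns the EER as a fraction in [0, 1].
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection rate: genuine-speaker trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: impostor trials scored at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = np.argmin(np.abs(frr - far))
    return (frr[idx] + far[idx]) / 2.0


def relative_reduction(eer_before, eer_after):
    """Relative EER reduction in percent (assumed meaning of 'improvement')."""
    return 100.0 * (eer_before - eer_after) / eer_before
```

Under this assumed reading, `relative_reduction` would be applied to the EER measured on coded speech before and after variability compensation.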