Reversible Speech De-identification Using Parametric Transformations and Watermarking

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10077)


This paper presents a system that de-identifies speech signals in order to hide and protect the speaker's identity. Using a flexible wideband harmonic model, it applies a relatively simple yet effective transformation to the pitch and to the frequency axis of the spectral envelope. It also embeds the parameters of the transformation in the signal by means of watermarking techniques, thus enabling re-identification. Our experiments show that, for adequate modification factors, its performance is satisfactory in terms of quality, degree of de-identification, and naturalness. The limitations imposed by the signal processing framework are also discussed.
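
As a rough illustration of the reversibility idea described above, the following sketch quantizes two transformation factors and hides them in one bit plane of 16-bit PCM samples, then reads them back. It is not the scheme used in the paper: the factor range (0.5 to 2.0), the 8-bit codes, the choice of bit plane 2, and all function names are assumptions made only for this example, and a real system would add synchronization and error correction.

```python
import numpy as np

# Hypothetical settings: factors are assumed to lie in [LO, HI]
# and are coded with NBITS bits each.
LO, HI, NBITS = 0.5, 2.0, 8

def quantize(factor):
    """Map a factor in [LO, HI] to an integer code of NBITS bits."""
    levels = (1 << NBITS) - 1
    clipped = min(max(factor, LO), HI)
    return int(round((clipped - LO) / (HI - LO) * levels))

def dequantize(code):
    """Map an integer code back to a factor in [LO, HI]."""
    levels = (1 << NBITS) - 1
    return LO + code / levels * (HI - LO)

def to_bits(code):
    """Most-significant-bit-first list of NBITS bits."""
    return [(code >> k) & 1 for k in reversed(range(NBITS))]

def from_bits(bits):
    """Inverse of to_bits."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def embed(samples, bits, plane=2):
    """Overwrite bit `plane` of the first len(bits) samples with the payload."""
    out = np.asarray(samples, dtype=np.int16).copy()
    raw = out.view(np.uint16)          # same memory, raw bit patterns
    payload = np.asarray(bits, dtype=np.uint16)
    n = payload.size
    mask = np.uint16(1 << plane)
    raw[:n] = (raw[:n] & ~mask) | (payload << np.uint16(plane))
    return out

def extract(samples, n, plane=2):
    """Read bit `plane` of the first n samples."""
    raw = np.asarray(samples, dtype=np.int16).view(np.uint16)
    return [int(b) for b in (raw[:n] >> np.uint16(plane)) & np.uint16(1)]

# Example: hide a pitch factor and a frequency-warping factor, then recover them.
rng = np.random.default_rng(0)
speech = rng.integers(-20000, 20000, size=64).astype(np.int16)  # stand-in signal
payload = to_bits(quantize(1.3)) + to_bits(quantize(0.9))
marked = embed(speech, payload)
recovered = extract(marked, len(payload))
pitch_factor = dequantize(from_bits(recovered[:NBITS]))
warp_factor = dequantize(from_bits(recovered[NBITS:]))
```

Under these assumptions, a receiver that recovers the two factors could apply their reciprocals to approximate the inverse transformation; whether that inversion is exact depends on the transformation and on the analysis/synthesis model, which this sketch does not cover.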


Keywords: Watermark, Harmonic Model, Signal Processing Framework, Intermediate Significant Bit (ISB), Frequency Warping



This work has been partially funded by the Spanish Ministry of Economy and Competitiveness (RESTORE project, TEC2015-67163-C2-1-R MINECO/FEDER,UE) and the Basque Government (ELKAROLA, KK-2015/00098).



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. Aholab, University of the Basque Country (UPV/EHU), Bilbao, Spain
  2. IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
