Intelligibility Assessment of the De-Identified Speech Obtained Using Phoneme Recognition and Speech Synthesis Systems

  • Tadej Justin
  • France Mihelič
  • Simon Dobrišek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8655)


The paper presents and evaluates a speaker de-identification technique that combines phoneme recognition with two speech synthesis techniques. The phoneme recognition system is built using HMM-based acoustical models of context-dependent diphone speech units, and two different speech synthesis systems (diphone TD-PSOLA-based and HMM-based) are employed for re-synthesizing the recognized sequences of speech units. Since the acoustical models of the two speech synthesis systems are assumed to be completely independent of the input speaker’s voice, the highest level of input speaker de-identification is ensured. The proposed de-identification system is language dependent but vocabulary and speaker independent, since it is based mainly on acoustical modelling of the selected diphone speech units. Because the computing methods are relatively simple, the whole de-identification procedure runs in real time.
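To illustrate what a sequence of diphone speech units looks like, the following minimal sketch turns a recognized phoneme sequence into the diphone units a synthesizer would be asked to concatenate. It is not taken from the paper: the function name `phonemes_to_diphones`, the boundary symbol `sil`, the `left-right` naming scheme, and the example phonemes are all assumptions made for illustration.

```python
from typing import List


def phonemes_to_diphones(phonemes: List[str], boundary: str = "sil") -> List[str]:
    """Pair each phoneme with its successor to form diphone labels.

    The sequence is padded with a boundary symbol on both sides so that
    the transitions into the first and out of the last phoneme are covered.
    The 'sil' symbol and the 'left-right' label format are hypothetical.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Hypothetical recognizer output for a short word.
recognized = ["d", "a", "n"]
print(phonemes_to_diphones(recognized))
# prints ['sil-d', 'd-a', 'a-n', 'n-sil']
```

Because only these symbolic labels are passed on to the synthesizer, nothing of the input speaker's voice characteristics reaches the output, which is the intuition behind the independence assumption stated above.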

The speech outputs are compared and assessed by testing the intelligibility of the re-synthesized speech from different points of view. The assessment results show that the evaluators’ transcriptions vary with the input speaker, the synthesis method applied and the evaluators’ capabilities. Despite the relatively high phoneme recognition error rate (approx. 19%), the re-synthesized speech is in many cases still fully intelligible.
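The abstract does not say how the phoneme recognition error rate was computed. A common convention, assumed here, is the Levenshtein (edit) distance between the reference and recognized phoneme sequences divided by the reference length. The sketch below follows that convention; the function names and the example sequences are hypothetical and do not come from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference symbols
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalised by the reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substitution in a five-phoneme reference.
reference = ["d", "o", "b", "e", "r"]
recognized = ["d", "o", "p", "e", "r"]
print(phoneme_error_rate(reference, recognized))
```

Under this definition, a rate of roughly 19% corresponds to about one phoneme error in every five reference phonemes, which gives a sense of how much distortion the listeners in the intelligibility tests had to tolerate.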


Keywords: voice de-identification, phoneme recognition, speech synthesis, diphone speech units, HMM modelling, intelligibility evaluation





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Tadej Justin (1)
  • France Mihelič (1)
  • Simon Dobrišek (1)
  1. Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia
