Intelligibility Assessment of the De-Identified Speech Obtained Using Phoneme Recognition and Speech Synthesis Systems

Justin, Tadej; Mihelič, France; Dobrišek, Simon

doi:10.1007/978-3-319-10816-2_64

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

The paper presents and evaluates a speaker de-identification technique using speech recognition and two speech synthesis techniques. The phoneme recognition system is built using HMM-based acoustical models of context-dependent diphone speech units, and two different speech synthesis systems (diphone TD-PSOLA-based and HMM-based) are employed for re-synthesizing the recognized sequences of speech units. Since the acoustical models of the two speech synthesis systems are assumed to be completely independent of the input speaker’s voice, the highest level of input speaker de-identification is ensured. The proposed de-identification system is language dependent but vocabulary and speaker independent, since it is based mainly on acoustical modelling of the selected diphone speech units. Because the computing methods are relatively simple, the whole de-identification procedure runs in real time.
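As a rough illustration of the diphone units mentioned above, the sketch below turns a recognized phoneme sequence into the diphone sequence a re-synthesis stage might consume. It is not the authors' implementation: the function name `phonemes_to_diphones`, the silence symbol `sil`, and the example phonemes are all assumptions made for this example, and the paper's actual unit inventory and recognizer output format are not shown in this preview.

```python
from typing import List, Tuple

# Hypothetical boundary symbol; the paper's actual symbol set is not given here.
SIL = "sil"


def phonemes_to_diphones(
    phonemes: List[str], boundary: str = SIL
) -> List[Tuple[str, str]]:
    """Pair each phoneme with its successor, padding both ends with silence.

    A diphone spans the transition from one phone to the next, so a
    sequence of n phonemes yields n + 1 diphones once the utterance
    boundaries are included.
    """
    padded = [boundary, *phonemes, boundary]
    return list(zip(padded, padded[1:]))


if __name__ == "__main__":
    # Illustrative phoneme sequence only, not data from the paper.
    for left, right in phonemes_to_diphones(["d", "a", "n"]):
        print(f"{left}-{right}")
```

Under these assumptions, only the symbolic sequence passes from the recognizer to the synthesizer, which is consistent with the abstract's point that the synthesis models are independent of the input speaker's voice.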

The speech outputs are compared and assessed by testing the intelligibility of the re-synthesized speech from different points of view. The assessment results show interesting variability in the evaluators’ transcriptions depending on the input speaker, the synthesis method applied and the evaluators’ capabilities. Despite the relatively high phoneme recognition error rate (approx. 19%), the re-synthesized speech is in many cases still fully intelligible.
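The abstract does not say how the phoneme recognition error rate was computed. One common definition divides the minimum number of substitutions, deletions and insertions needed to turn the recognized sequence into the reference by the reference length. The sketch below assumes that definition; the function names and the example sequences are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference symbols
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalised by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative sequences only: one substitution in five phonemes.
    reference = ["d", "o", "b", "e", "r"]
    recognized = ["d", "o", "p", "e", "r"]
    print(phoneme_error_rate(reference, recognized))
```

Under this definition, a rate near 19% would correspond to roughly one edit per five reference phonemes, which gives a sense of how much recognition error the listeners in the intelligibility tests had to tolerate.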

Author information

Authors and Affiliations

Faculty of Electrical Engineering, University of Ljubljana, 1000, Ljubljana, Tržaška 25, Slovenia
Tadej Justin, France Mihelič & Simon Dobrišek

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Botanická 6a, 60200, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Department of Information Technologies, Masaryk University, 602 00, Brno, Czech Republic
Aleš Horák, Ivan Kopeček & Karel Pala

Cite this paper

Justin, T., Mihelič, F., Dobrišek, S. (2014). Intelligibility Assessment of the De-Identified Speech Obtained Using Phoneme Recognition and Speech Synthesis Systems. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science (LNAI), vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_64

DOI: https://doi.org/10.1007/978-3-319-10816-2_64
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science, Computer Science (R0)
