Cross-Language Acoustic Modeling for Macedonian Speech Technology Applications
This paper presents a cross-language method for developing speech recognition and synthesis applications for the Macedonian language. A unified system for speech recognition and synthesis, trained on German language data, was used for acoustic model bootstrapping and adaptation. Both knowledge-based and data-driven approaches to mapping between source-language and target-language phonemes were used for the initial transcription and labeling of a small amount of recorded speech. Recognition experiments that applied the source-language acoustic model to the target-language dataset showed a significant degradation in recognition performance. Acceptable performance was achieved after maximum a posteriori (MAP) model adaptation with a limited amount of target-language data, making the system suitable for small- to medium-vocabulary speech recognition applications. The same unified system was then used to train a new, separate acoustic model for HMM-based synthesis. Qualitative analysis showed that, despite the low quality of the available recordings and the sub-optimal phoneme mapping, HMM-based synthesis produces perceptually good and intelligible synthetic speech.
Keywords: speech recognition, speech synthesis, cross-language bootstrapping
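The abstract does not say how the data-driven phoneme mapping was computed. One common approach, sketched here under that assumption, is to decode target-language (Macedonian) speech with the source-language (German) phoneme recognizer, align the output with the knowledge-based labels, and map each Macedonian phoneme to the German phoneme it is most often recognized as. When there are too few observations, the sketch falls back to the knowledge-based table. The seed table `KNOWLEDGE_MAP`, its SAMPA-like symbols, the function name `data_driven_map`, and the `min_count` threshold are all hypothetical illustrations, not the mapping used in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical knowledge-based seed mapping (Macedonian -> German, SAMPA-like).
# Illustrative only; this is NOT the paper's actual phoneme mapping.
KNOWLEDGE_MAP = {
    "a": "a",
    "e": "E",
    "S": "S",
    "Z": "S",  # assumed fallback: voiced /Z/ approximated by voiceless /S/
    "x": "x",
}


def data_driven_map(aligned_pairs, seed_map, min_count=5):
    """Refine a seed phoneme mapping from alignment evidence.

    aligned_pairs: iterable of (target_phoneme, recognized_source_phoneme)
        tuples, e.g. obtained by aligning the source-language recognizer's
        phoneme output with the knowledge-based target labels.
    seed_map: knowledge-based target -> source mapping used as a fallback.
    min_count: hypothetical minimum number of observations of the most
        frequent source phoneme before it overrides the seed entry.
    """
    # Count how often each target phoneme was recognized as each source phoneme.
    counts = defaultdict(Counter)
    for target, source in aligned_pairs:
        counts[target][source] += 1

    # Start from a copy so the caller's seed table is left unchanged.
    mapping = dict(seed_map)
    for target, counter in counts.items():
        best_source, n = counter.most_common(1)[0]
        if n >= min_count:
            mapping[target] = best_source
    return mapping


if __name__ == "__main__":
    pairs = [("e", "E")] * 4 + [("e", "e:")] * 7 + [("x", "k")] * 2
    print(data_driven_map(pairs, KNOWLEDGE_MAP))
```

In this example, Macedonian /e/ is seeded as German /E/ but is recognized as /e:/ in 7 of 11 aligned tokens, so the mapping switches to /e:/. Because /x/ is observed only twice, it keeps its seed entry.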
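The abstract reports that MAP adaptation with limited Macedonian data made the German-trained model usable, but it does not state which parameters were adapted or which prior weight was chosen. The sketch below assumes the standard mean-only MAP update for a single Gaussian, mu_hat = (tau * mu_0 + sum_t gamma_t * x_t) / (tau + sum_t gamma_t). Here mu_0 is the source-language (German) mean, x_t are the target-language feature vectors, gamma_t are the occupation probabilities of this Gaussian, and tau is the prior weight. The function name `map_adapt_mean` and the default `tau=10.0` are hypothetical.

```python
def map_adapt_mean(prior_mean, frames, gammas, tau=10.0):
    """Mean-only MAP re-estimation of one Gaussian (illustrative sketch).

    prior_mean: mean vector of the source-language (German) Gaussian, mu_0.
    frames: target-language feature vectors x_t (lists of equal length).
    gammas: occupation probability gamma_t of this Gaussian for each frame.
    tau: prior weight; hypothetical default, not a value from the paper.
    """
    if len(frames) != len(gammas):
        raise ValueError("frames and gammas must have the same length")

    dim = len(prior_mean)
    occupancy = sum(gammas)  # sum_t gamma_t
    weighted_sum = [0.0] * dim  # sum_t gamma_t * x_t
    for x, g in zip(frames, gammas):
        if len(x) != dim:
            raise ValueError("feature dimension does not match prior mean")
        for d in range(dim):
            weighted_sum[d] += g * x[d]

    # Interpolate between the prior mean and the target-data statistics.
    return [(tau * m + s) / (tau + occupancy) for m, s in zip(prior_mean, weighted_sum)]


if __name__ == "__main__":
    # Ten frames fully assigned to the Gaussian, with tau = 10.
    print(map_adapt_mean([0.0, 0.0], [[1.0, 2.0]] * 10, [1.0] * 10, tau=10.0))
```

With tau = 10 and ten frames fully assigned to the Gaussian, the adapted mean lies halfway between the German prior and the Macedonian sample mean. With no target data, it stays at the prior. As the occupancy grows, it approaches the maximum-likelihood mean of the target data. This behavior is why MAP adaptation degrades gracefully when only a limited amount of adaptation data is available.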