Improved Estimation of Articulatory Features Based on Acoustic Features with Temporal Context

  • Conference paper
  • First Online:
Text, Speech, and Dialogue (TSD 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Abstract

This paper deals with neural network-based estimation of articulatory features for Czech, intended for use in automatic phonetic segmentation or automatic speech recognition. In our current approach, multi-layer perceptron networks extract the articulatory features via a non-linear mapping from standard acoustic features computed from the speech signal. We analysed the suitability of various acoustic features and the optimum length of the temporal context at the network input. The temporal context is represented by a context window created from stacked feature vectors; window lengths ranging from 9 to 21 frames were analysed to identify the optimum. With mel log filter-bank features we obtained 90.5% frame-level accuracy on average across all articulatory feature classes; the highest classification rate, 95.3%, was achieved for the voicing class.
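The temporal context described above can be sketched as a simple frame-stacking step: each acoustic feature vector is concatenated with its neighbours inside a fixed-length window before being fed to the network. The sketch below is a minimal illustration, not the authors' implementation; the function name `stack_context` and the edge-replication padding are assumptions for the example.

```python
import numpy as np

def stack_context(features, context=11):
    """Stack each frame with its neighbours into a context window.

    features: (T, D) array of per-frame acoustic features
    context:  odd window length (the paper explores 9 to 21 frames)
    returns:  (T, context * D) array of stacked feature vectors
    """
    assert context % 2 == 1, "context window length should be odd"
    half = context // 2
    # Replicate the edge frames so every frame has a full window
    # (one possible boundary treatment; the paper does not specify one).
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    T = features.shape[0]
    return np.stack([padded[t:t + context].reshape(-1) for t in range(T)])

# e.g. 100 frames of 40-dim mel log filter-bank features, 11-frame window
X = np.random.randn(100, 40)
Xc = stack_context(X, context=11)   # shape (100, 440)
```

Each stacked vector then serves as the MLP input, so the network sees roughly 100-200 ms of signal per classification decision at a typical 10 ms frame shift.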




Author information


Correspondence to Petr Mizera.



Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mizera, P., Pollak, P. (2015). Improved Estimation of Articulatory Features Based on Acoustic Features with Temporal Context. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science(), vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_63

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)
