Improved Estimation of Articulatory Features Based on Acoustic Features with Temporal Context

  • Conference paper
  • First Online:
Text, Speech, and Dialogue (TSD 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Abstract

This paper deals with neural network-based estimation of articulatory features for Czech, intended for use in automatic phonetic segmentation or automatic speech recognition. In our current approach, multi-layer perceptron networks extract the articulatory features via a non-linear mapping from standard acoustic features computed from the speech signal. We analysed the suitability of various acoustic features and the optimum length of the temporal context at the network input. The temporal context is represented by a context window created from stacked feature vectors; window lengths ranging from 9 to 21 frames were analysed to identify the optimum. With mel log filter-bank features we obtained 90.5% frame-level accuracy on average across all articulatory feature classes; the highest classification rate, 95.3%, was achieved for the voicing class.
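The temporal context described above can be sketched as a simple frame-stacking step: each acoustic feature vector is concatenated with its neighbours inside a fixed-length window before being fed to the network. The sketch below is a minimal illustration, not the authors' implementation; the function name `stack_context` and the edge-replication padding are assumptions for the example.

```python
import numpy as np

def stack_context(features, context=11):
    """Stack each frame with its neighbours into a context window.

    features: (T, D) array of per-frame acoustic features
    context:  odd window length (the paper explores 9 to 21 frames)
    returns:  (T, context * D) array of stacked feature vectors
    """
    assert context % 2 == 1, "context window length should be odd"
    half = context // 2
    # Replicate the edge frames so every frame has a full window
    # (one possible boundary treatment; the paper does not specify one).
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    T = features.shape[0]
    return np.stack([padded[t:t + context].reshape(-1) for t in range(T)])

# e.g. 100 frames of 40-dim mel log filter-bank features, 11-frame window
X = np.random.randn(100, 40)
Xc = stack_context(X, context=11)   # shape (100, 440)
```

Each stacked vector then serves as the MLP input, so the network sees roughly 100-200 ms of signal per classification decision at a typical 10 ms frame shift.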




Author information


Correspondence to Petr Mizera.



Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mizera, P., Pollak, P. (2015). Improved Estimation of Articulatory Features Based on Acoustic Features with Temporal Context. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science(), vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_63

  • DOI: https://doi.org/10.1007/978-3-319-24033-6_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24032-9

  • Online ISBN: 978-3-319-24033-6

  • eBook Packages: Computer Science, Computer Science (R0)
