Abstract
In this work, the performance of a Multilingual Phone Recognition System (Multi-PRS) is improved using articulatory features (AFs). Four Indian languages – Kannada, Telugu, Bengali and Odia – are used for developing the Multi-PRS. The transcription is derived using the International Phonetic Alphabet (IPA). The Multi-PRS is trained using hidden Markov models and state-of-the-art deep neural networks (DNNs). AFs for five AF groups – place, manner, roundness, frontness and height – are predicted from Mel-frequency cepstral coefficients (MFCCs) using DNNs. Oracle AFs, derived from the ground-truth IPA transcriptions, are used to set an upper bound on the performance realizable by the predicted AFs, and the performances of predicted and oracle AFs are compared. In addition to the AFs, phone posteriors are explored to further boost the performance of the Multi-PRS. Multi-task learning is explored to improve the prediction accuracy of AFs and thereby reduce the Phone Error Rates (PERs) of Multi-PRSs. Fusion of AFs is done using two approaches: (i) lattice re-scoring and (ii) AFs as tandem features. We show that oracle AFs, fused with MFCCs at the feature level, offer a remarkably low PER of 10.4%, a 24.7% absolute reduction compared with the baseline Multi-PRS using MFCCs alone. The best-performing system using predicted AFs shows a 3.2% absolute (9.1% relative) reduction in PER compared with the baseline Multi-PRS. The best performance is obtained using the tandem approach for fusion of the various AFs and phone posteriors.
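The tandem approach mentioned above appends classifier posteriors to the acoustic features frame by frame before acoustic-model training. A minimal sketch of that fusion step is shown below; the function name, feature dimensions, and the log-compression step are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def tandem_features(mfcc: np.ndarray, af_posteriors: np.ndarray) -> np.ndarray:
    """Concatenate per-frame MFCCs with DNN-predicted AF posteriors.

    mfcc:          (num_frames, mfcc_dim) acoustic features
    af_posteriors: (num_frames, af_dim) stacked posteriors over the AF groups
    """
    assert mfcc.shape[0] == af_posteriors.shape[0], "frame counts must match"
    # Log-compress and floor the posteriors; a common tandem preprocessing
    # step that reduces their skewed dynamic range before GMM/DNN training.
    log_post = np.log(np.maximum(af_posteriors, 1e-10))
    return np.hstack([mfcc, log_post])

# Toy usage: 100 frames of 39-dim MFCCs fused with 20 stacked AF posteriors.
mfcc = np.random.randn(100, 39)
af = np.random.rand(100, 20)
feats = tandem_features(mfcc, af)
print(feats.shape)  # (100, 59)
```

In practice the fused features would typically be decorrelated and dimensionality-reduced (e.g. with PCA or an LDA transform) before being fed to the recognizer.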
Acknowledgements
We thank Prof. B Yegnanarayana, Prof. K Sri Rama Murthy and Prof. R Kumaraswamy for providing Telugu and Kannada datasets.
Cite this article
Manjunath, K.E., Jayagopi, D.B., Rao, K.S. et al. Articulatory-feature-based methods for performance improvement of Multilingual Phone Recognition Systems using Indian languages. Sādhanā 45, 190 (2020). https://doi.org/10.1007/s12046-020-01428-9