A Survey of Recent DNN Architectures on the TIMIT Phone Recognition Task

Michálek, Josef; Vaněk, Jan

doi:10.1007/978-3-030-00794-2_47

A Survey of Recent DNN Architectures on the TIMIT Phone Recognition Task

Conference paper
First Online: 08 September 2018

1527 Accesses
8 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11107))

Abstract

In this survey paper, we have evaluated several recent deep neural network (DNN) architectures on a TIMIT phone recognition task. We chose the TIMIT corpus due to its popularity and broad availability in the community. It also simulates a low-resource scenario that is helpful in minor languages. Also, we prefer the phone recognition task because it is much more sensitive to an acoustic model quality than a large vocabulary continuous speech recognition (LVCSR) task. In recent years, many DNN published papers reported results on TIMIT. However, the reported phone error rates (PERs) were often much higher than a PER of a simple feed-forward (FF) DNN. That was the main motivation of this paper: To provide a baseline DNNs with open-source scripts to easily replicate the baseline results for future papers with lowest possible PERs. According to our knowledge, the best-achieved PER of this survey is better than the best-published PER to date.

This paper was supported by the project no. P103/12/G084 of the Grant Agency of the Czech Republic and by the grant of the University of West Bohemia, project No. SGS-2016-039. Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum provided under the programme “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042), is greatly appreciated.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

A flexible framework of neural networks for deep learning. https://chainer.org
Garofolo, J.S., et al.: TIMIT Acoustic-Phonetic Continuous Speech Corpus. Linguistic Data Consortium LDC93S1 (1993)
Google Scholar
Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
Article MathSciNet Google Scholar
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Article Google Scholar
Kaldi speech recognition toolkit. https://github.com/kaldi-asr/kaldi
Larochelle, H., Bengio, Y., Louradour, J., Lamblin, P.: Exploring strategies for training deep neural networks. J. Mach. Learn. Res. 10(Jan), 1–40 (2009)
MATH Google Scholar
Lopes, C., Perdigao, F.: Phoneme Recognition on the TIMIT Database. Speech Technologies (2011). https://doi.org/10.5772/17600, http://www.intechopen.com/books/speech-technologies/phoneme-recognition-on-the-timit-database
Google Scholar
Mohamed, A., Dahl, G.E., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20(1), 14–22 (2012). https://doi.org/10.1109/TASL.2011.2109382
Article Google Scholar
Moon, T., Choi, H., Lee, H., Song, I.: RNNDROP: a novel dropout for RNNs in ASR. In: Proceedings of the ASRU (2015)
Google Scholar
Peddinti, V., Povey, D., Khudanpur, S.: A time delay neural network architecture for efficient modeling of long temporal contexts. In: Sixteenth Annual Conference of the International Speech Communication Association (2015)
Google Scholar
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
MathSciNet MATH Google Scholar
Tóth, L.: Convolutional deep rectifier neural nets for phone recognition. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 1722–1726, August 2013
Google Scholar
Tóth, L.: Convolutional deep maxout networks for phone recognition. In: Proceedings of the INTERSPEECH, pp. 1078–1082 (2014). https://doi.org/10.1186/s13636-015-0068-3
Vaněk, J., Zelinka, J., Soutner, D., Psutka, J.: A regularization post layer: an additional way how to make deep neural networks robust. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds.) SLSP 2017. LNCS (LNAI), vol. 10583, pp. 204–214. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68456-7_17
Chapter Google Scholar
Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K.J.: Phoneme recognition using time-delay neural networks. In: Readings in Speech Recognition, pp. 393–404. Elsevier (1990)
Google Scholar

Download references

Author information

Authors and Affiliations

University of West Bohemia, Univerzitní 8, 301 00, Pilsen, Czech Republic
Josef Michálek & Jan Vaněk

Authors

Josef Michálek
View author publications
You can also search for this author in PubMed Google Scholar
Jan Vaněk
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Josef Michálek .

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Aleš Horák
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Karel Pala

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Michálek, J., Vaněk, J. (2018). A Survey of Recent DNN Architectures on the TIMIT Phone Recognition Task. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science(), vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_47

Download citation

DOI: https://doi.org/10.1007/978-3-030-00794-2_47
Published: 08 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00793-5
Online ISBN: 978-3-030-00794-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics