Efficient data selection for ASR

Kleynhans, Neil Taylor; Barnard, Etienne

doi:10.1007/s10579-014-9285-0

Original Paper
Published: 14 October 2014

Language Resources and Evaluation, Volume 49, pages 327–353 (2015)

Abstract

Automatic speech recognition (ASR) technology has matured over the past few decades and has made a significant impact in a variety of fields, from assistive technologies to commercial products. However, ASR system development is a resource-intensive activity that requires language resources in the form of text-annotated audio recordings and pronunciation dictionaries. Unfortunately, many languages found in the developing world fall into the resource-scarce category, and this scarcity severely inhibits the deployment of ASR systems in the developing world. One approach to assist with resource-scarce ASR system development is to select “useful” training samples, which could reduce the resources needed to collect new corpora. In this work, we propose a new data selection framework which can be used to design a speech recognition corpus. We show that, for limited data sets, independent of language and bandwidth, the most effective strategy for data selection is frequency-matched selection, and that the widely used maximum entropy methods generally produce the least promising results. In our model, the frequency-matched selection method corresponds to a logarithmic relationship between accuracy and corpus size; we also investigated other model relationships and found that a hyperbolic relationship (as suggested by simple asymptotic arguments in learning theory) may lead to somewhat better performance under certain conditions.

Author information

Authors and Affiliations

Human Language Technologies Research Group, Meraka Institute, CSIR, Meiring Naude Road, Brummeria, Pretoria, South Africa
Neil Taylor Kleynhans
MuST Group, North-West University, Vaal Triangle Campus, 171 Hendrik van Eck Blvd, Vanderbijlpark, 1900, South Africa
Etienne Barnard

Corresponding author

Correspondence to Neil Taylor Kleynhans.

About this article

Cite this article

Kleynhans, N.T., Barnard, E. Efficient data selection for ASR. Lang Resources & Evaluation 49, 327–353 (2015). https://doi.org/10.1007/s10579-014-9285-0

Published: 14 October 2014
Issue Date: June 2015
DOI: https://doi.org/10.1007/s10579-014-9285-0
