Validation of phonetic transcriptions in the context of automatic speech recognition

Van Bael, Christophe; van den Heuvel, Henk; Strik, Helmer

doi:10.1007/s10579-007-9033-9

Published: 17 July 2007

Language Resources and Evaluation, volume 41, pages 129–146 (2007)


Abstract

Some of the speech databases and large spoken language corpora collected during the last fifteen years have been annotated, at least in part, with a broad phonetic transcription. Such phonetic transcriptions are often validated in terms of their resemblance to a handcrafted reference transcription. However, at least two methodological issues call this validation method into question. First, no reference transcription can fully represent the phonetic truth, which casts doubt on the status of such a transcription as the single reference for the quality of other phonetic transcriptions. Second, phonetic transcriptions are often generated to serve various purposes, none of which is taken into account when the transcriptions are compared to a reference transcription that was not made with the same purpose in mind. Since phonetic transcriptions are often used for the development of automatic speech recognition (ASR) systems, and since the relationship between ASR performance and a transcription’s resemblance to a reference transcription does not seem to be straightforward, we verified whether phonetic transcriptions that are to be used for ASR development can be justifiably validated in terms of their similarity to a purpose-independent reference transcription. To this end, we validated canonical representations and manually verified broad phonetic transcriptions of read speech and spontaneous telephone dialogues in two ways: in terms of their resemblance to a handcrafted reference transcription, and in terms of their suitability for ASR development. Whereas the manually verified phonetic transcriptions resembled the reference transcription much more closely than the canonical representations did, the use of both transcription types yielded similar recognition results. The difference between the outcomes of the two validation methods has two implications. First, ASR developers can save themselves the effort of collecting expensive reference transcriptions in order to validate phonetic transcriptions of speech databases or spoken language corpora. Second, phonetic transcriptions should preferably be validated in terms of the application they will serve, because a higher resemblance to a purpose-independent reference transcription does not guarantee that a transcription is better suited for ASR development.
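
To make the first validation method concrete, the following is a minimal Python sketch of how the resemblance between a transcription and a reference transcription could be quantified as a percentage disagreement. It assumes a plain Levenshtein alignment with unit costs, which is a simplification: the alignment named in Appendix 1 relies on an articulatory feature matrix that is not reproduced here. The function names and the example phone strings are hypothetical and serve only as illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i reference symbols
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def disagreement(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit operations as a percentage of the reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Invented example: a reduced pronunciation as the reference,
# compared with a fuller canonical form (not data from the article).
reference = ["n", "@", "t", "y", "l", "@", "k"]
canonical = ["n", "a", "t", "y", "r", "l", "@", "k"]
score = disagreement(reference, canonical)
```

Applied to word sequences instead of phone sequences, the same computation gives a word error rate (WER), the measure behind the second, application-oriented validation method.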

Abbreviations

ASR: Automatic speech recognition
CGN: Corpus Gesproken Nederlands (Spoken Dutch Corpus)
MPT: Manual phonetic transcription
RT: Reference transcription
WER: Word error rate

Acknowledgements

The work of Christophe Van Bael was funded by the Speech Technology Foundation (Stichting Spraaktechnologie, Utrecht, The Netherlands). The authors would like to thank Louis Pols, various colleagues at the Department of Language and Speech (now CLST) and three anonymous reviewers for their comments on previous versions of this paper.

Author information

Authors and Affiliations

Centre for Language and Speech Technology, Radboud University Nijmegen, P.O. Box 9103, Nijmegen, 6500 HD, The Netherlands
Christophe Van Bael, Henk van den Heuvel & Helmer Strik

Corresponding author

Correspondence to Christophe Van Bael.

Appendices

Appendix 1: Feature matrix used to align two phonetic transcriptions of speech (Align)

Table 5 (a) Articulatory feature values for consonants

Appendix 2: Mapping of the 46-phone CGN set to the 39-phone set

Class	Example	CGN symbol	Canonical/MPT symbol(s)
Plosives	put	p	p
	bad	b	b
	tak	t	t
	dak	d	d
	kat	k	k
	goal	g	k
Fricatives	fiets	f	f
	vat	v	v
	sap	s	s
	zat	z	z
	sjaal	S	S
	ravage	Z	z + j
	licht	x	x
	regen	G	G
	geheel	h	h
Sonorants	lang	N	N
	mat	m	m
	nat	n	n
	oranje	J	n + j
	lat	l	l
	rat	r	r
	wat	w	w
	jas	j	j
Short vowels	lip	I	I
	leg	E	E
	lat	A	A
	bom	O	O
	put	Y	Y
Long vowels	liep	i	i
	buur	y	y
	leeg	e	e
	deuk	2	@+
	laat	a	a
	boom	o	o
	boek	u	u
Schwa	gelijk	@	@
Diphthongs	wijs	E+	E+
	huis	Y+	Y+
	koud	A+	A+
Loan vowels	scène	E:	E
	freule	Y:	Y
	zone	O:	O
Nasalised vowels	vaccin	E∼	E
	croissant	A∼	A
	congé	O∼	O
	parfum	Y∼	Y
Long silence			sil
Optional short silence			sp
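
As an illustration, the mapping in this table could be applied to a transcription along the following lines. This is only a sketch: the dictionary lists just the rows in which the CGN symbol differs from the target symbol(s), it writes the nasalisation mark as an ASCII tilde, and the names `CGN_TO_REDUCED` and `map_phones` as well as the example phone string are hypothetical.

```python
from typing import Dict, List, Sequence

# Rows of the table where the CGN symbol is rewritten; every other
# CGN symbol is assumed to be carried over unchanged.
CGN_TO_REDUCED: Dict[str, List[str]] = {
    "g": ["k"],           # goal
    "Z": ["z", "j"],      # ravage
    "J": ["n", "j"],      # oranje
    "2": ["@+"],          # deuk
    "E:": ["E"], "Y:": ["Y"], "O:": ["O"],                # loan vowels
    "E~": ["E"], "A~": ["A"], "O~": ["O"], "Y~": ["Y"],   # nasalised vowels
}


def map_phones(phones: Sequence[str]) -> List[str]:
    """Rewrite a CGN phone sequence using the reduced symbol set."""
    mapped: List[str] = []
    for phone in phones:
        # One CGN phone may expand to two symbols (e.g. J -> n + j).
        mapped.extend(CGN_TO_REDUCED.get(phone, [phone]))
    return mapped


# Illustrative input only, not taken from the corpus.
example = map_phones(["o", "r", "A", "J", "@"])
```

Because some CGN phones expand into two symbols, the mapped transcription can be longer than the original one, which matters if the two are later aligned phone by phone.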

About this article

Cite this article

Van Bael, C., van den Heuvel, H. & Strik, H. Validation of phonetic transcriptions in the context of automatic speech recognition. Lang Resources & Evaluation 41, 129–146 (2007). https://doi.org/10.1007/s10579-007-9033-9

Received: 22 February 2007
Accepted: 27 April 2007
Published: 17 July 2007
Issue Date: May 2007
DOI: https://doi.org/10.1007/s10579-007-9033-9
