Abstract
Forced alignment has been at the core of speech recognition technology since the 1970s, and it was first used in phonetics research in the 1990s. Progress in digital multimedia, networking, and mass storage has created enormous and growing volumes of transcribed speech, which forced alignment can turn into vast phonetic databases. However, speech science has so far taken relatively little advantage of this opportunity, because doing so requires tools and methods that most speech researchers currently find difficult to access. Moreover, these tools have not yet been fully developed and tested for many applications. Even so, these technologies are leading the study of human speech into a revolutionary new era: a movement from the study of small, private, and mostly artificial datasets to the analysis of published collections of natural speech that are thousands or even millions of times larger. In this chapter, we illustrate some of the ways that forced alignment can be used as a tool in speech science and discuss directions for improvement.
Copyright information
© 2023 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Yuan, J., Lai, W., Cieri, C., Liberman, M. (2023). Using Forced Alignment for Phonetics Research. In: Huang, CR., Hsieh, SK., Jin, P. (eds) Chinese Language Resources. Text, Speech and Language Technology, vol 49. Springer, Cham. https://doi.org/10.1007/978-3-031-38913-9_17
DOI: https://doi.org/10.1007/978-3-031-38913-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38912-2
Online ISBN: 978-3-031-38913-9
eBook Packages: Education (R0)