Using Forced Alignment for Phonetics Research

Yuan, Jiahong; Lai, Wei; Cieri, Christopher; Liberman, Mark

doi:10.1007/978-3-031-38913-9_17

Part of the book series: Text, Speech and Language Technology (TLTB, volume 49)

Abstract

Forced alignment has been at the core of speech recognition technology since the 1970s, and it was first used in phonetics research in the 1990s. Progress in digital multimedia, networking, and mass storage has created enormous and growing volumes of transcribed speech, which forced alignment can turn into vast phonetic databases. However, speech science has so far taken relatively little advantage of this opportunity, because doing so requires tools and methods that are currently difficult for most speech researchers to access. Moreover, these tools have not been fully developed and tested for many applications. Even so, these technologies are leading the study of human speech into a revolutionary new era: a shift from the study of small, private, and mostly artificial datasets to the analysis of published collections of natural speech that are thousands or even millions of times larger. In this chapter, we illustrate some of the ways that forced alignment can be used as a tool in speech science and discuss directions for improvement.

Author information

Authors and Affiliations

Interdisciplinary Research Center for Linguistic Sciences, School of Humanities and Social Sciences, University of Science and Technology of China, Hefei, China
Jiahong Yuan
Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, USA
Wei Lai
Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, USA
Christopher Cieri & Mark Liberman

Corresponding author

Correspondence to Jiahong Yuan.

Editor information

Editors and Affiliations

Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Kowloon, Hong Kong
Chu-Ren Huang
Graduate Institute of Linguistics, National Taiwan University, Taipei, Taiwan
Shu-Kai Hsieh
School of Electronic Information and Artificial Intelligence, Leshan Normal University, Leshan City, Sichuan, China
Peng Jin

About this chapter

Cite this chapter

Yuan, J., Lai, W., Cieri, C., Liberman, M. (2023). Using Forced Alignment for Phonetics Research. In: Huang, CR., Hsieh, SK., Jin, P. (eds) Chinese Language Resources. Text, Speech and Language Technology, vol 49. Springer, Cham. https://doi.org/10.1007/978-3-031-38913-9_17

DOI: https://doi.org/10.1007/978-3-031-38913-9_17
Published: 19 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-38912-2
Online ISBN: 978-3-031-38913-9
eBook Packages: Education, Education (R0)
