Using Forced Alignment for Phonetics Research

Chinese Language Resources

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 49))

Abstract

Forced alignment has been at the core of speech recognition technology since the 1970s, and it was first used in phonetics research in the 1990s. Progress in digital multimedia, networking, and mass storage has created enormous and growing volumes of transcribed speech, which forced alignment can turn into vast phonetic databases. However, speech science has so far taken relatively little advantage of this opportunity, because it requires tools and methods that are now difficult for most speech researchers to access. Moreover, these tools have not been completely developed and tested for many applications. These technologies are leading the study of human speech into a revolutionary new era—a movement from the study of small, private, and mostly artificial datasets to the analysis of published collections of natural speech that are thousands or even millions of times larger. In this chapter, we will illustrate some of the ways that forced alignment can be used as a tool in speech science and discuss directions for improvement.

Corresponding author

Correspondence to Jiahong Yuan.

Copyright information

© 2023 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Yuan, J., Lai, W., Cieri, C., Liberman, M. (2023). Using Forced Alignment for Phonetics Research. In: Huang, CR., Hsieh, SK., Jin, P. (eds) Chinese Language Resources. Text, Speech and Language Technology, vol 49. Springer, Cham. https://doi.org/10.1007/978-3-031-38913-9_17

  • DOI: https://doi.org/10.1007/978-3-031-38913-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-38912-2

  • Online ISBN: 978-3-031-38913-9

  • eBook Packages: Education, Education (R0)
