Development and application of an accurate and flexible automatic aligner

International Journal of Speech Technology

Abstract

Large databases are useful tools for speech technology research. Their usefulness is greatly enhanced if the data is annotated with time-aligned labels. Manual annotation, however, is expensive and time consuming, which has led to the investigation and development of automatic aligners. This paper reports on an automatic aligner developed initially to solve the problem of annotating a large database within a set period of time. While developing the aligner, we investigated the importance of the models, the use of manual labels to bootstrap the system, and the role of the dictionary in the effectiveness of the aligner, and found that each made a contribution. The aligner was tested on unseen data to gauge its accuracy before being applied to the annotation of a large amount of data. The aligner was developed in a way that facilitates its use in other applications.

Cite this article

Vonwiller, J., Cleirigh, C., Garsden, H. et al. Development and application of an accurate and flexible automatic aligner. Int J Speech Technol 1, 151–160 (1997). https://doi.org/10.1007/BF02277196

  • DOI: https://doi.org/10.1007/BF02277196
