Text-to-speech alignment is the task of aligning a textual transcript with an audio stream. The computed synchronization data is a mapping from each word in the transcript to a temporal interval in the audio. This alignment is a basic building block that supports many applications and is widely applicable.
Keywords: Multimedia Application, Alignment System, Edit Modeling, Audio Frame, Speech Audio
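The mapping from transcript words to temporal intervals described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions: the names `AlignedWord` and `word_at` are hypothetical, the timings are made up, and intervals are assumed to be sorted, non-overlapping, and expressed in seconds. It does not show how the alignment itself is computed.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    """One entry of the synchronization data (hypothetical layout)."""
    word: str     # token from the transcript
    start: float  # interval start in the audio, in seconds
    end: float    # interval end in the audio, in seconds


def word_at(alignment, t):
    """Return the entry whose interval [start, end) contains time t, or None.

    Assumes `alignment` is sorted by start time and intervals do not overlap.
    """
    starts = [w.start for w in alignment]
    i = bisect_right(starts, t) - 1  # last entry starting at or before t
    if i >= 0 and alignment[i].start <= t < alignment[i].end:
        return alignment[i]
    return None


# Made-up timings, for illustration only.
alignment = [
    AlignedWord("text", 0.00, 0.35),
    AlignedWord("to", 0.35, 0.50),
    AlignedWord("speech", 0.50, 1.10),
]
```

With data in this shape, an application such as audio search or subtitle highlighting could call `word_at(alignment, t)` to find the word being spoken at playback time `t`.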