Syllable Based Concatenative Synthesis for Text to Speech Conversion

Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 33)


Speech synthesis converts text into speech. Speech recognition and speech synthesis play a vital role in human-machine interaction. Synthesized speech is produced by concatenating segments of pre-recorded speech utterances drawn from a database. The proposed work converts written text into syllables (a syllable text representation) using a rule-based approach, and then converts the syllable representation into modified syllable waveform clips that are combined to produce sound. Syllabic transcription attempts to describe the individual variations that occur between speakers of a dialect or language. Syllable-based concatenative synthesis aims to record the syllables that a speaker uses rather than the actual spoken variants of those syllables that are produced when a speaker utters a word. Concatenative speech synthesis methods produce highly intelligible speech.
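The pipeline described above (text, then syllables, then concatenated waveform clips) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the syllabification rules, so the toy rule used here (optional consonants followed by a vowel group, with trailing consonants attached to the last syllable) is an assumption, as are the function names `syllabify`, `concatenate`, and `synthesize`, the dictionary standing in for the syllable database, and the optional linear crossfade. The waveform modification step mentioned in the abstract is not modelled.

```python
import re

def syllabify(word):
    """Split a word into rough syllables using a toy rule (an assumption,
    not the paper's rule set): consonants* + vowels+, with any trailing
    consonants attached to the final syllable."""
    word = word.lower()
    parts = re.findall(r"[^aeiou]*[aeiou]+", word)
    if not parts:
        # No vowel found: treat the whole word as one unit.
        return [word] if word else []
    consumed = sum(len(p) for p in parts)
    parts[-1] += word[consumed:]  # attach leftover consonants
    return parts

def concatenate(clips, overlap=0):
    """Join sample lists end to end. If overlap > 0, linearly crossfade
    that many samples at each joint (hypothetical smoothing step)."""
    out = []
    for clip in clips:
        clip = list(clip)
        n = min(overlap, len(out), len(clip))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming clip
            out[start + i] = (1 - w) * out[start + i] + w * clip[i]
        out.extend(clip[n:])
    return out

def synthesize(text, database, overlap=0):
    """Map text to syllables and concatenate their pre-recorded clips.
    `database` maps syllable strings to lists of samples; a missing
    syllable raises KeyError."""
    clips = []
    for word in re.findall(r"[a-z]+", text.lower()):  # crude normalization
        for syllable in syllabify(word):
            clips.append(database[syllable])
    return concatenate(clips, overlap)

# Tiny stand-in database: two "recorded" syllables of two samples each.
demo_db = {"ba": [0.1, 0.2], "na": [0.3, 0.4]}
demo_wave = synthesize("Banana", demo_db)
```

In this sketch `demo_wave` holds the samples for "ba", "na", "na" in order. A real system would store recorded audio per syllable and would need a far richer rule set and text normalization than the regular expressions used here.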


Keywords: Concatenative speech synthesis; Concatenate wave segments; Syllable; Syllable transcription; Speech processing; Speech synthesis (SS); Text normalization; Text to speech (TTS) conversion; Waveform concatenation



Copyright information

© Springer India 2015

Authors and Affiliations

  1. Department of Computer Science and Engineering, Annamalai University, Chidambaram, India
