Towards an unrestricted domain TTS system for African tone languages
In this paper we discuss the procedural problems, issues and challenges involved in developing a generic speech synthesizer for African tone languages. Our development methodology for Ibibio, a Lower Cross language of the (New) Benue-Congo family widely spoken in southeastern Nigeria, is based on the “MultiSyn” unit-selection approach supported by the Festival Text-To-Speech (TTS) toolkit. We present, in chronological order, the infrastructural and linguistic problems and challenges identified in the Local Language Speech Technology Initiative (LLSTI) during the development process, from corpus preparation and refinement to integration and synthesis. We provide solutions to most of these challenges and outline directions for further refinement. An evaluation of the initial prototype indicates that the synthesis system will be useful to non-literate communities and to a wide range of applications.
Keywords: TTS, HLT, Multi-unit selection, Concatenative synthesis, Terraced tone modeling
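The abstract names the MultiSyn unit-selection approach without describing how unit selection works. As a rough illustration only, the Python sketch below implements generic unit selection in the style of Hunt and Black: each target position has a list of candidate units from the recorded database, and a Viterbi (dynamic-programming) search picks the sequence that minimizes the sum of target costs (how well a unit matches its target) and join costs (how smoothly adjacent units concatenate). This is not Festival's MultiSyn code. The `Unit` and `Target` fields, the tone-mismatch target cost, the F0-discontinuity join cost and the weights are all hypothetical assumptions, chosen to show one way lexical tone could enter the cost function for a tone language.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """A candidate unit from the speech database (hypothetical fields)."""
    unit_id: str
    phone: str       # phone or diphone label
    tone: str        # lexical tone carried by the unit, e.g. "H" or "L"
    start_f0: float  # F0 in Hz at the unit's left edge
    end_f0: float    # F0 in Hz at the unit's right edge


@dataclass(frozen=True)
class Target:
    """A target specification from the text-analysis front end (hypothetical)."""
    phone: str
    tone: str


# Hypothetical weights; a real system would tune these on listening tests.
TONE_MISMATCH_WEIGHT = 10.0
F0_JOIN_WEIGHT = 0.1


def target_cost(target: Target, unit: Unit) -> float:
    """Cost of realizing `target` with `unit`; here only a tone mismatch is penalized."""
    return TONE_MISMATCH_WEIGHT if unit.tone != target.tone else 0.0


def join_cost(left: Unit, right: Unit) -> float:
    """Cost of concatenating `left` and `right`; penalizes the F0 jump at the join."""
    return F0_JOIN_WEIGHT * abs(left.end_f0 - right.start_f0)


def select_units(
    targets: Sequence[Target], candidates: Sequence[Sequence[Unit]]
) -> Tuple[List[Unit], float]:
    """Viterbi search for the unit sequence with minimal total target + join cost.

    Assumes candidates[i] is non-empty and already filtered to units whose
    phone label matches targets[i].
    """
    if not targets:
        return [], 0.0
    # best[i][j]: lowest cumulative cost of any path ending in candidates[i][j]
    best: List[List[float]] = [[target_cost(targets[0], u) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row: List[float] = []
        ptrs: List[int] = []
        for unit in candidates[i]:
            # cumulative cost of reaching `unit` from each possible predecessor
            costs = [best[i - 1][k] + join_cost(prev, unit)
                     for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + target_cost(targets[i], unit))
            ptrs.append(k_best)
        best.append(row)
        back.append(ptrs)
    # trace back from the cheapest unit at the final position
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path: List[Unit] = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total
```

With a high-tone target followed by a low-tone target, the search prefers units that carry the right tones and also join with a small F0 jump, even when another unit with the correct tone is available. Folding tone into the target cost, rather than only filtering candidates by tone, lets the search fall back on a tone-mismatched unit when the database has no good match, which matters for small corpora of under-resourced languages.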