Tone modelling in Ibibio speech synthesis

International Journal of Speech Technology

Abstract

In this paper, we investigate the contribution of tone to Hidden Markov Model (HMM)-based speech synthesis of Ibibio (ISO 639-3: ibb; Ethnologue: IBB), an under-resourced language. We review the speech characteristics of the language that are required for building the front-end components of the design, and propose a finite state transducer (FST) for modelling the language’s tonetactics. We also study the existing Ibibio speech database and examine the quality of synthetic speech through a spectral analysis of voices obtained from two synthesis experiments, one with and one without tone feature labels. A confusion matrix classifying the results of a controlled listening test for both experiments is constructed, and statistics comparing their performance are presented. The results show that synthesis systems with tone feature labels outperformed those without, as listeners perceived more tone confusions in the latter.
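The listening-test comparison described in the abstract can be illustrated with a minimal sketch. It assumes each listener response is recorded as an (intended tone, perceived tone) pair and that only two labels, H (high) and L (low), are used. The label set, the function names `confusion_matrix` and `confusion_rate`, and the response data are all hypothetical and serve only to show the computation; they are not taken from the paper.

```python
from collections import Counter

# Hypothetical tone label set (H = high, L = low); the paper's actual
# label inventory is not given in this section.
TONES = ("H", "L")

def confusion_matrix(pairs, labels=TONES):
    """Count (intended, perceived) tone pairs into a nested dict."""
    counts = Counter(pairs)
    return {i: {p: counts[(i, p)] for p in labels} for i in labels}

def confusion_rate(matrix):
    """Fraction of responses whose perceived tone differs from the intended one."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[t][t] for t in matrix)
    return (total - correct) / total if total else 0.0

# Invented listener responses, for illustration only (not the paper's data).
with_tone = [("H", "H"), ("H", "H"), ("L", "L"), ("L", "H")]
without_tone = [("H", "L"), ("H", "H"), ("L", "H"), ("L", "L")]

m_with = confusion_matrix(with_tone)
m_without = confusion_matrix(without_tone)
print(confusion_rate(m_with), confusion_rate(m_without))
```

Under these assumptions, a lower confusion rate for one system would correspond to listeners perceiving fewer tone confusions in it, which is the kind of comparison the abstract reports.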

Notes

  1. The [u] sound, however, occurs word-finally in a very few words as a result of dialectal (free) variation. Some dialects of Ibibio have the word for "rat" as ‘’ while others have it as ‘’. This is just one exception in the distribution of Ibibio central vowels.


Acknowledgements

We thank Dr. Okokon Akpan, a lecturer in the Department of Linguistics and Nigerian Languages, University of Uyo, for agreeing to let us record and use his voice for the Ibibio speech database. We also thank the anonymous reviewers for their excellent comments.

Author information

Correspondence to Moses E. Ekpenyong.

About this article

Cite this article

Ekpenyong, M.E., Udoh, E. Tone modelling in Ibibio speech synthesis. Int J Speech Technol 17, 145–159 (2014). https://doi.org/10.1007/s10772-013-9216-2

  • DOI: https://doi.org/10.1007/s10772-013-9216-2
