Feature Selection for Improved Phone Duration Modeling of Greek Emotional Speech

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6040)


In this work we address phone duration modeling for emotional speech synthesis. Using ten well-known machine learning techniques, we investigate the practical usefulness of two feature selection techniques, the Relief and the Correlation-based Feature Selection (CFS) algorithms, for improving the accuracy of phone duration modeling. Feature selection is performed over a large set of phonetic, morphological and syntactic features. In the experiments, we employed phone duration models based on decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms, trained on a Modern Greek database of emotional speech covering five categories: anger, fear, joy, neutral and sadness. The experimental results demonstrated that feature selection significantly improves the accuracy of phone duration modeling regardless of the type of machine learning algorithm used.
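To make the idea behind CFS concrete, the following is a minimal sketch of the commonly cited CFS merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-target correlation and r_ff the mean feature-feature correlation of a subset of k features. It is not the implementation used in the paper: the use of absolute Pearson correlation, the greedy forward search, the synthetic data, and all names (`pearson`, `cfs_merit`, `forward_select`, `f0` to `f3`, `duration`) are assumptions made for illustration only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(columns, target, subset):
    """CFS merit k*r_cf / sqrt(k + k*(k-1)*r_ff) for the feature indices in subset."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-target correlation.
    r_cf = sum(abs(pearson(columns[i], target)) for i in subset) / k
    # Mean absolute feature-feature correlation over all unordered pairs.
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(columns, target):
    """Greedy forward search (an assumed strategy): add features while merit improves."""
    selected, best = [], 0.0
    while True:
        candidates = [i for i in range(len(columns)) if i not in selected]
        scored = [(cfs_merit(columns, target, selected + [i]), i) for i in candidates]
        if not scored:
            break
        merit, idx = max(scored)
        if merit <= best:
            break
        selected.append(idx)
        best = merit
    return selected, best


# Synthetic stand-in for a duration data set (hypothetical, not the paper's data).
random.seed(0)
n = 200
f0 = [random.gauss(0, 1) for _ in range(n)]    # informative feature
f1 = [x + random.gauss(0, 0.05) for x in f0]   # near-duplicate of f0 (redundant)
f2 = [random.gauss(0, 1) for _ in range(n)]    # informative, independent of f0
f3 = [random.gauss(0, 1) for _ in range(n)]    # pure noise
duration = [80 + 10 * a + 10 * c + random.gauss(0, 1) for a, c in zip(f0, f2)]

columns = [f0, f1, f2, f3]
selected, merit = forward_select(columns, duration)
print("selected feature indices:", sorted(selected), "merit:", round(merit, 3))
```

The merit rewards subsets whose features correlate with the target while penalizing features that correlate with each other, so in this toy setup the search tends to keep only one of the two near-duplicate features and to skip the noise feature.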


Keywords: phone duration modeling, feature selection, emotional speech





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Artificial Intelligence Group, Wire Communications Laboratory, Department of Electrical and Computer Engineering, University of Patras, Rion-Patras, Greece
