Prosodic Word Prediction Using a Maximum Entropy Approach

  • Honghui Dong
  • Jianhua Tao
  • Bo Xu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4274)


As the basic prosodic unit, the prosodic word greatly influences naturalness and intelligibility. Although research shows that lexicon words differ considerably from prosodic words, lexicon words still provide important cues for prosodic word formation. The rhythm constraint is another important factor in prosodic word prediction: some lexicon word length patterns tend to be combined together. Based on the mapping relationship and the differences between lexicon words and prosodic words, the prosodic word prediction process is divided into two parts: grouping lexicon words into prosodic words and splitting lexicon words into prosodic words. This paper proposes a maximum entropy method to model these two parts separately. The experimental results show that the maximum entropy model is well suited to the prosodic word prediction task. In the word grouping model, a feature selection algorithm is used to induce more efficient features, which not only greatly reduces the number of features but also improves model performance. The splitting model can correctly detect prosodic word boundaries inside lexicon words. The f-score of prosodic word boundary prediction reaches 95.55%.
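
To make the modelling idea concrete, here is a minimal sketch (not the authors' implementation) of a maximum entropy classifier that decides, for each juncture between two adjacent lexicon words, whether it is a prosodic word boundary, in the spirit of the word grouping model. The feature names (prev_len, next_len, prev_pos, next_pos), the toy data, and the use of scikit-learn's LogisticRegression as the maximum entropy learner are assumptions for illustration only.

```python
# Hedged sketch: a maximum-entropy classifier for prosodic word boundary
# prediction at lexicon word junctures. Features and data are invented
# for illustration; they are not the paper's feature templates or corpus.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each sample describes one juncture between two adjacent lexicon words.
train_samples = [
    {"prev_len": 2, "next_len": 1, "prev_pos": "n", "next_pos": "u"},
    {"prev_len": 1, "next_len": 2, "prev_pos": "d", "next_pos": "v"},
    {"prev_len": 2, "next_len": 2, "prev_pos": "v", "next_pos": "n"},
    {"prev_len": 1, "next_len": 1, "prev_pos": "p", "next_pos": "r"},
]
# 1 = the juncture is a prosodic word boundary,
# 0 = the two lexicon words group into one prosodic word.
train_labels = [1, 0, 1, 0]

vectorizer = DictVectorizer()  # one-hot encodes the categorical features
X = vectorizer.fit_transform(train_samples)

# Logistic regression maximises the same conditional log-likelihood as a
# binary maximum entropy model over the same feature set.
model = LogisticRegression(max_iter=1000)
model.fit(X, train_labels)

test_sample = {"prev_len": 2, "next_len": 1, "prev_pos": "n", "next_pos": "v"}
prob = model.predict_proba(vectorizer.transform([test_sample]))[0][1]
print(f"P(prosodic word boundary) = {prob:.3f}")
```

A splitting model for detecting prosodic word boundaries inside long lexicon words could be built in the same way, with per-character or per-syllable features replacing the juncture features above.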


Keywords: Maximum Entropy, Statistical Machine Translation, Lexical Information, Word Grouping, Maximum Entropy Model




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Honghui Dong (1)
  • Jianhua Tao (1)
  • Bo Xu (2)
  1. National Laboratory of Pattern Recognition
  2. High Technology Innovation Center, Institute of Automation, Chinese Academy of Sciences
