Evaluating Language Models Within a Predictive Framework: An Analysis of Ranking Distributions

  • Pierre Alain
  • Olivier Boëffard
  • Nelly Barbot
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4188)


Perplexity is a widely used criterion for comparing language models without any task assumptions. Its main drawback, however, is that it presupposes probability distributions and therefore cannot compare heterogeneous models. As an evaluation framework, we propose in this article to abandon perplexity and to extend Shannon's entropy idea, which assesses a model's prediction performance using rank-based statistics. Our methodology predicts joint word sequences and is independent of task or model assumptions. Experiments are carried out on English with different kinds of language models. We show that long-term prediction language models are no more effective than standard n-gram models. Ranking distributions follow exponential laws, as already observed when predicting letter sequences. These distributions, however, exhibit a second mode not observed with letters, for which we propose an interpretation in this article.
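
As an illustration of such a rank-based evaluation, the sketch below (a minimal Python example; the helper names rank_of_next_word and rank_distribution, the toy unigram scorer, and the tiny corpus are all illustrative assumptions, not the authors' implementation) ranks the true next word among all vocabulary candidates using only the ordering induced by a model's scores, so no probability distribution is required, and collects the resulting rank histogram over a corpus.

    from collections import Counter

    def rank_of_next_word(score, context, target, vocabulary):
        # Rank (1 = best) of the true next word among all candidates,
        # using only the ordering induced by the model's scores; ties
        # are broken pessimistically.
        target_score = score(context, target)
        better = sum(1 for w in vocabulary if score(context, w) > target_score)
        return better + 1

    def rank_distribution(score, corpus, vocabulary, order=3):
        # Histogram of ranks of the true next word over a test corpus.
        histogram = Counter()
        for sentence in corpus:
            for i in range(1, len(sentence)):
                context = tuple(sentence[max(0, i - order + 1):i])
                rank = rank_of_next_word(score, context, sentence[i], vocabulary)
                histogram[rank] += 1
        return histogram

    # Toy example: a unigram "model" whose score is simply the training count.
    train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    counts = Counter(w for s in train for w in s)
    vocab = list(counts)
    hist = rank_distribution(lambda ctx, w: counts[w], train, vocab)
    average_rank = sum(r * n for r, n in hist.items()) / sum(hist.values())
    print(dict(hist), average_rank)

Because only the ordering of candidates matters, the same procedure can in principle be applied to heterogeneous models whose outputs are not directly comparable probabilities.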


Keywords: Average Rank · Maximum Rank · Good Path · Training Corpus · Word Sequence


References

  1. Shannon, C.: Prediction and entropy of printed English. Bell System Technical Journal 30, 50–64 (1951)
  2. Cover, T., King, R.: A convergent gambling estimate of the entropy of English. IEEE Transactions on Information Theory 24, 413–421 (1978)
  3. Bimbot, F., El-Beze, M., Igounet, S., Jardino, M., Smaili, K., Zitouni, I.: An alternative scheme for perplexity estimation and its assessment for the evaluation of language models. Computer Speech and Language 15, 1–13 (2001)
  4. Deligne, S., Bimbot, F.: Language modeling by variable length sequences: theoretical formulation and evaluation of multigrams. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 169–172 (1995)
  5. Chen, S., Goodman, J.: An empirical study of smoothing techniques for language modeling. Computer Speech and Language 13, 359–394 (1999)
  6. Garside, R., Leech, G., Sampson, G.: The Computational Analysis of English: A Corpus-Based Approach. Longman, London (1987)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Pierre Alain (1)
  • Olivier Boëffard (1)
  • Nelly Barbot (1)

  1. IRISA / Université de Rennes 1 – ENSSAT, Lannion
