Separating Precision and Mean in Dirichlet-Enhanced High-Order Markov Models
Robustly estimating the state-transition probabilities of high-order Markov processes is an essential task in many applications, such as natural language modeling and protein sequence modeling. We propose a novel estimation algorithm called Hierarchical Separated Dirichlet Smoothing (HSDS), in which Dirichlet distributions are hierarchically assumed as the prior distributions of the state-transition probabilities. The key idea in HSDS is to separate the parameters of a Dirichlet distribution into the precision and the mean, so that the precision depends on the context while the mean is given by the lower-order distribution. Kneser-Ney smoothing is currently regarded as the state-of-the-art technique for N-gram natural language models; HSDS is designed to outperform it, especially when the number of states is small. Our experiments in protein sequence modeling showed the superiority of HSDS in both perplexity evaluation and classification tasks.
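To make the precision/mean separation concrete, the following is a minimal sketch of a hierarchical Dirichlet-smoothed estimate, not the HSDS algorithm itself. It assumes the standard posterior-mean form (count + precision × mean) / (total + precision), with the mean taken recursively from the estimate for the context shortened by one symbol and a uniform distribution at the bottom. The names `count_ngrams`, `smoothed_prob`, and `precision` are hypothetical. The abstract does not say how HSDS estimates the context-dependent precision, so a constant placeholder is used here.

```python
from collections import Counter, defaultdict


def count_ngrams(seq, order):
    """Count next-symbol occurrences for every context of length `order`."""
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[tuple(seq[i - order:i])][seq[i]] += 1
    return counts


def smoothed_prob(symbol, context, counts_by_order, precision, alphabet):
    """Posterior mean under a Dirichlet prior split into precision and mean.

    The mean is the estimate for the shortened context (lower order);
    the precision is looked up per context via `precision(context)`.
    """
    if len(context) == 0:
        mean = 1.0 / len(alphabet)  # assumption: uniform mean below order 0
    else:
        mean = smoothed_prob(symbol, context[1:], counts_by_order,
                             precision, alphabet)
    ctx_counts = counts_by_order[len(context)].get(context, Counter())
    total = sum(ctx_counts.values())
    alpha = precision(context)  # must be positive
    return (ctx_counts[symbol] + alpha * mean) / (total + alpha)


# Toy usage on a short sequence over a small alphabet.
seq = "ACDACDAACD"
alphabet = sorted(set(seq))
counts_by_order = [count_ngrams(seq, k) for k in range(3)]


def precision(context):
    # Placeholder: a constant precision; HSDS would estimate this per context.
    return 1.0


probs = {s: smoothed_prob(s, ("A", "C"), counts_by_order, precision, alphabet)
         for s in alphabet}
```

Because the mean at every level sums to one over the alphabet, the smoothed estimates also sum to one, and a context that never occurred in the data simply falls back to its lower-order estimate.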