ICCPOL 2016, NLPCC 2016: Natural Language Understanding and Intelligent Applications, pp. 657–664
Learning from LDA Using Deep Neural Networks
Abstract
Bayesian models and neural models have each demonstrated advantages in topic modeling. Motivated by the dark knowledge transfer approach proposed by [3], we present a novel method that combines the strengths of the two model families. Specifically, we present a transfer learning method that uses LDA to supervise the training of a deep neural network (DNN), so that the DNN can approximate LDA inference with less computation. Our experimental results show that through transfer learning, a simple DNN can closely approximate the topic distributions produced by LDA, and delivers document classification performance competitive with LDA at much lower computational cost.
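The core idea can be sketched in a few lines: treat the per-document topic distributions inferred by LDA as soft targets, and train a small feed-forward network to reproduce them by minimizing cross-entropy. This is an illustrative sketch, not the paper's implementation; the corpus, network size, and hyperparameters below are all invented for demonstration.

```python
# Sketch of LDA -> DNN knowledge transfer: LDA acts as the "teacher",
# a one-hidden-layer softmax network as the "student".
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Toy bag-of-words corpus: 200 documents over a 50-word vocabulary.
X = rng.integers(0, 5, size=(200, 50)).astype(float)
Xn = X / X.sum(axis=1, keepdims=True)       # normalized term frequencies for the DNN

# 1. Teacher: fit LDA and take each document's topic posterior as a soft target.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
T = lda.fit_transform(X)                    # shape (200, 8), rows sum to 1

# 2. Student: one hidden layer + softmax, trained by cross-entropy against T.
H = 32
W1 = rng.normal(0, 0.1, (50, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 8));  b2 = np.zeros(8)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    z = h @ W2 + b2
    z -= z.max(axis=1, keepdims=True)       # numerically stable softmax
    p = np.exp(z); p /= p.sum(axis=1, keepdims=True)
    return h, p

def xent(p, t):
    return -(t * np.log(p + 1e-12)).sum(axis=1).mean()

_, p0 = forward(Xn)
ce_before = xent(p0, T)

lr = 0.01
for _ in range(500):
    h, p = forward(Xn)
    g = (p - T) / len(Xn)                   # softmax + cross-entropy gradient wrt logits
    gh = (g @ W2.T) * (1 - h ** 2)          # backprop through tanh
    W2 -= lr * h.T @ g;  b2 -= lr * g.sum(0)
    W1 -= lr * Xn.T @ gh; b1 -= lr * gh.sum(0)

_, p1 = forward(Xn)
ce_after = xent(p1, T)                      # should be lower than ce_before
```

At test time the student needs only a single forward pass per document, whereas LDA inference requires iterative sampling or variational updates, which is the computational advantage the abstract refers to.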
Keywords
Principal Component Analysis · Latent Dirichlet Allocation · Neural Model · Transfer Learning · Deep Neural Network
Acknowledgments
The authors thank Dr. Shujie Liu (MSRA) for fruitful discussions. This research was supported by the National Science Foundation of China (NSFC) under project No. 61371136 and by the MESTDC PhD Foundation Project No. 20130002 120011. It was also supported by Huilan Ltd.
References
- 1. Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I.J., Bergeron, A., Bouchard, N., Bengio, Y.: Theano: new features and speed improvements. In: Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop (2012)
- 2. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
- 3. Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network (2015). arXiv preprint arXiv:1503.02531
- 4. Hinton, G.E., Osindero, S., Teh, Y.W.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
- 5. Hinton, G.E., Salakhutdinov, R.R.: Replicated softmax: an undirected topic model. In: Advances in Neural Information Processing Systems, pp. 1607–1614 (2009)
- 6. Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Improving neural networks by preventing co-adaptation of feature detectors (2012). CoRR http://arxiv.org/abs/1207.0580
- 7. Hofmann, T.: Probabilistic latent semantic analysis. In: Proceedings of 15th Conference on Uncertainty in Artificial Intelligence, pp. 289–296. Morgan Kaufmann Publishers Inc. (1999)
- 8. Jolliffe, I.: Principal Component Analysis. Wiley Online Library, Hoboken (2002)
- 9. Porteous, I., Newman, D., Ihler, A., Asuncion, A., Smyth, P., Welling, M.: Fast collapsed Gibbs sampling for latent Dirichlet allocation. In: Knowledge Discovery and Data Mining (2008)
- 10. Srivastava, N., Salakhutdinov, R.R., Hinton, G.E.: Modeling documents with deep Boltzmann machines (2013). arXiv preprint arXiv:1309.6865
- 11. Tang, J., Meng, Z., Nguyen, X., Mei, Q., Zhang, M.: Understanding the limiting factors of topic modeling via posterior contraction analysis. In: Proceedings of 31st International Conference on Machine Learning, pp. 190–198 (2014)
- 12. Teh, Y.W., Jordan, M.I., Beal, M.J., Blei, D.M.: Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 101(476), 1566–1581 (2006)
- 13. Wang, D., Liu, C., Tang, Z., Zhang, Z., Zhao, M.: Recurrent neural network training with dark knowledge transfer (2015). arXiv preprint arXiv:1505.04630