Journal of Signal Processing Systems, Volume 74, Issue 3, pp 375–389

Bayesian Sparse Topic Model

Article

Abstract

This paper presents a new Bayesian sparse learning approach that selects salient lexical features for sparse topic modeling. Bayesian learning based on latent Dirichlet allocation (LDA) is performed by incorporating spike-and-slab priors. In this sparse LDA (sLDA), the spike distribution is used to select salient words, while the slab distribution builds the latent topic model over the selected relevant words. A variational inference procedure is developed to estimate the prior parameters of sLDA. In experiments on document modeling with LDA and sLDA, we find that the proposed sLDA not only reduces the model perplexity but also reduces the memory and computation costs. The Bayesian feature selection method effectively identifies the relevant topic words for building a sparse topic model.
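As a rough illustration of the generative idea summarized above, the following Python sketch draws per-word spike indicators to select the salient vocabulary and uses slab (Dirichlet) distributions over only the selected words to form the topic-word distributions. This is a minimal simulation under assumed settings, not the paper's variational inference procedure; the parameter names (pi_spike, alpha, beta) and all numerical values are hypothetical.

```python
# Illustrative generative sketch of a spike-and-slab sparse topic model.
# Parameter names and values here are assumptions for the example only;
# they are not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

V, K, D, N = 50, 3, 5, 40   # vocabulary size, topics, documents, words per doc
pi_spike = 0.3              # prior probability that a word is salient
alpha, beta = 0.5, 0.1      # Dirichlet hyperparameters (doc-topic, topic-word)

# Spike: binary indicators that select the salient (relevant) words.
b = rng.binomial(1, pi_spike, size=V).astype(bool)
salient = np.flatnonzero(b)

# Slab: topic-word distributions defined only over the selected words.
phi = np.zeros((K, V))
phi[:, salient] = rng.dirichlet(beta * np.ones(salient.size), size=K)

# Generate documents from the sparse topic model.
docs = []
for _ in range(D):
    theta = rng.dirichlet(alpha * np.ones(K))   # document-topic mixture
    z = rng.choice(K, size=N, p=theta)          # topic assignment per word
    w = np.array([rng.choice(V, p=phi[k]) for k in z])
    docs.append(w)

print(f"{salient.size}/{V} words selected as salient topic words")
print("first document:", docs[0][:10])
```

In the actual sLDA model the spike-and-slab prior parameters are not fixed as above but are estimated from data by the variational inference procedure described in the paper.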

Keywords

Bayesian sparse learning · Feature selection · Topic model


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

Department of Electrical and Computer Engineering, National Chiao Tung University, Hsinchu, Republic of China
