In this paper, we propose the Fully Sparse Topic Model (FSTM) for modeling large collections of documents. The model has three key properties: (1) its inference algorithm converges in linear time, (2) learning of topics reduces to a multiplication of two sparse matrices, and (3) it provides a principled way to directly trade off sparsity of solutions against inference quality and running time. These properties enable us to learn sparse topics quickly, to infer sparse latent representations of documents, and to significantly reduce the memory required for storage. We show that inference in FSTM is in fact MAP inference with an implicit prior. Extensive experiments show that FSTM can perform substantially better than various existing topic models under different performance measures. Finally, our parallel implementation can readily learn thousands of topics from large corpora with millions of terms.
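The combination of linear-time inference and sparse document representations can be pictured with a Frank-Wolfe-style greedy scheme over the convex hull of the topic vectors (the Frank-Wolfe algorithm appears among the works the paper builds on). The sketch below is an illustration under our own assumptions, not the authors' exact procedure: each iteration adds at most one topic to a document's mixture, so after `t` iterations the representation has at most `t + 1` nonzero entries, which is the source of sparsity.

```python
import numpy as np

def sparse_infer(d, topics, iters=20):
    """Frank-Wolfe-style sketch: approximately maximize
    sum_w d[w] * log((topics @ theta)[w]) over the simplex of topic
    proportions theta. Each iteration moves toward a single vertex
    (topic), so theta gains at most one nonzero entry per iteration.

    d      : (V,) word-count vector of one document
    topics : (V, K) matrix whose columns are topic distributions
    """
    V, K = topics.shape
    theta = np.zeros(K)
    # Start at the single best vertex (topic) for this document.
    k0 = np.argmax(d @ np.log(topics + 1e-12))
    theta[k0] = 1.0
    x = topics[:, k0].copy()  # current point x = topics @ theta
    for t in range(1, iters + 1):
        # Gradient of the log-likelihood with respect to theta.
        grad = topics.T @ (d / (x + 1e-12))
        k = np.argmax(grad)            # best vertex this iteration
        alpha = 2.0 / (t + 2)          # standard Frank-Wolfe step size
        theta = (1 - alpha) * theta
        theta[k] += alpha
        x = (1 - alpha) * x + alpha * topics[:, k]
    return theta  # at most iters + 1 nonzero entries
```

Stopping after a small, fixed number of iterations is one way to trade off sparsity of the solution against inference quality and running time, in the spirit of the trade-off the abstract describes.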


Keywords: Topic Model · Latent Dirichlet Allocation · Inference Algorithm · Sparse Solution · Sparse Approximation



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Khoat Than (1)
  • Tu Bao Ho (1, 2)
  1. Japan Advanced Institute of Science and Technology, Nomi, Japan
  2. John von Neumann Institute, Vietnam National University, HCM, Vietnam
