Weak Supervision for Semi-supervised Topic Modeling via Word Embeddings

  • Gerald Conheady
  • Derek Greene
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10318)


Semi-supervised algorithms have been shown to improve the results of topic modeling when applied to unstructured text corpora. However, sufficient supervision is not always available. This paper proposes a new process, Weak+, suitable for use in semi-supervised topic modeling via matrix factorization when only limited supervision is available. This process uses word embeddings to provide additional weakly-labeled data, which can result in improved topic modeling performance.
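The general idea of using word embeddings to generate weak labels can be illustrated with a minimal sketch. This is not the authors' Weak+ process, whose details are not given in the abstract. It only assumes that a few seed words per topic are supplied as supervision and that nearby words in the embedding space can serve as extra weakly-labeled terms. The function name `expand_seed_words`, the `top_n` parameter, and the toy three-dimensional vectors are all hypothetical and chosen for illustration; in practice the vectors would come from a trained embedding model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_seed_words(seeds, embeddings, top_n=2):
    """Return the top_n non-seed words most similar to the mean seed vector.

    Hypothetical helper: the returned words act as weak labels for the
    topic that the seed words describe.
    """
    dim = len(next(iter(embeddings.values())))
    # Mean of the raw seed vectors, one value per dimension.
    centroid = [
        sum(embeddings[s][d] for s in seeds) / len(seeds) for d in range(dim)
    ]
    candidates = [w for w in embeddings if w not in seeds]
    # Rank the remaining vocabulary by similarity to the seed centroid.
    candidates.sort(key=lambda w: cosine(embeddings[w], centroid), reverse=True)
    return candidates[:top_n]


# Toy vectors, made up for illustration only (not from a trained model).
toy_embeddings = {
    "football": [0.90, 0.10, 0.00],
    "soccer": [0.85, 0.15, 0.05],
    "goal": [0.80, 0.20, 0.10],
    "election": [0.10, 0.90, 0.00],
    "vote": [0.05, 0.95, 0.10],
    "senate": [0.10, 0.85, 0.20],
}

weak_sport_words = expand_seed_words(["football"], toy_embeddings)
weak_politics_words = expand_seed_words(["election"], toy_embeddings)
print(weak_sport_words)
print(weak_politics_words)
```

In a semi-supervised matrix factorization setting, such expanded word lists could be used alongside the original seed words as additional, lower-confidence supervision; how they are weighted or incorporated is a design choice this sketch does not address.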



This research was partly supported by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Computer Science, University College Dublin, Dublin, Ireland
