Knowledge-Based Dataless Text Categorization

  • Rima Türker
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11762)


Text categorization is an important task due to the rapid growth of text data available online in domains such as web search snippets and news documents. Traditional supervised methods require a significant amount of training data, and manually labeling such data can be very time-consuming and costly. Moreover, when the text to be labeled belongs to a specific domain, only domain experts, who are expensive to employ, can carry out the manual labeling. This thesis addresses the lack of labeled data and aims to develop a novel, generic model that categorizes text without any labeled training data. Instead, the model exploits the semantic similarity between documents and the predefined categories by leveraging graph embedding techniques.
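The idea of assigning a category by semantic similarity, rather than by a trained classifier, can be illustrated with a minimal sketch. It is not the model developed in the thesis: it assumes that entities mentioned in a document and the predefined categories already share one embedding space, that a document is represented by the average of its entity vectors, and that similarity is measured by cosine. The vectors, the entity and category names, and the helpers `cosine`, `document_vector`, and `categorize` are all hypothetical.

```python
import math

# Hypothetical toy embeddings. In practice these would come from a graph
# embedding model trained on a knowledge base; the values here are made up.
ENTITY_EMBEDDINGS = {
    "Lionel_Messi": [0.9, 0.1, 0.0],
    "FC_Barcelona": [0.8, 0.2, 0.1],
    "Stock_market": [0.1, 0.9, 0.1],
    "Inflation": [0.0, 0.8, 0.2],
}

# Hypothetical category vectors living in the same space as the entities.
CATEGORY_EMBEDDINGS = {
    "Sports": [1.0, 0.0, 0.0],
    "Business": [0.0, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def document_vector(entities):
    """Average the embeddings of the known entities; None if none are known."""
    vectors = [ENTITY_EMBEDDINGS[e] for e in entities if e in ENTITY_EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def categorize(entities):
    """Return the category most similar to the document; no labeled data used."""
    doc = document_vector(entities)
    if doc is None:
        return None
    return max(CATEGORY_EMBEDDINGS, key=lambda c: cosine(doc, CATEGORY_EMBEDDINGS[c]))


if __name__ == "__main__":
    print(categorize(["Lionel_Messi", "FC_Barcelona"]))
    print(categorize(["Stock_market", "Inflation"]))
```

No labeled training documents appear anywhere in the sketch: the only supervision is the set of category names and their positions in the embedding space.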


Keywords: Text categorization, Dataless classification, Network embeddings



This thesis is supervised by Prof. Harald Sack and Dr. Lei Zhang.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. FIZ Karlsruhe, Leibniz Institute for Information Infrastructure, Eggenstein-Leopoldshafen, Germany
  2. AIFB, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
