KIDER: Knowledge-Infused Document Embedding Representation for Text Categorization

  • Yu-Ting Chen
  • Zheng-Wen Lin
  • Yung-Chun Chang (corresponding author)
  • Wen-Lian Hsu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12144)


Advances in deep learning have improved performance on a wide variety of tasks. However, language reasoning and understanding remain difficult problems in Natural Language Processing (NLP). In this work, we address this problem and propose a novel Knowledge-Infused Document Embedding Representation (KIDER) for text categorization. We use knowledge patterns to generate high-quality document representations. These patterns preserve category-distinctive semantic information, provide interpretability, and achieve superior performance at the same time. Experiments show that the KIDER model outperforms state-of-the-art methods on two important NLP tasks, i.e., emotion analysis and news topic detection, by 7% and 20%, respectively. In addition, we demonstrate the potential of these patterns for highlighting important information in each category and news article. These results show the value of knowledge-infused patterns in terms of both interpretability and performance enhancement.
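The abstract describes representing documents through category-distinctive knowledge patterns. As a minimal, hypothetical sketch of that idea (the pattern sets, category names, and feature scheme below are illustrative assumptions, not the paper's actual mined patterns), one can score a document by how often each category's patterns fire, yielding one interpretable feature per category:

```python
import re

# Hypothetical category-specific knowledge patterns (hand-written regexes
# for illustration only; KIDER's patterns are mined, not hand-crafted).
PATTERNS = {
    "sports": [r"\bmatch\b", r"\bscore[sd]?\b", r"\bteam\b"],
    "finance": [r"\bstocks?\b", r"\bmarkets?\b", r"\bshares?\b"],
}

def pattern_features(doc):
    """Count how often each category's patterns occur in the document,
    producing one category-distinctive feature per category."""
    text = doc.lower()
    return [
        float(sum(len(re.findall(p, text)) for p in pats))
        for pats in PATTERNS.values()
    ]

doc = "The team won the match after the final score was announced."
feats = pattern_features(doc)  # [sports-pattern hits, finance-pattern hits]
```

In a KIDER-style pipeline, such pattern features would be combined with a learned document embedding rather than used alone; the matched pattern spans are also what enables highlighting category-relevant text.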


Keywords: Text categorization · Natural Language Processing · Knowledge representation



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Yu-Ting Chen¹
  • Zheng-Wen Lin²
  • Yung-Chun Chang³ (corresponding author)
  • Wen-Lian Hsu⁴
  1. Department of Statistics, National Taiwan University, Taipei, Taiwan
  2. Department of Information Science and Applications, National Tsing Hua University, Hsinchu, Taiwan
  3. Department of Data Science, Taipei Medical University, Taipei, Taiwan
  4. Institute of Information Science, Academia Sinica, Taipei, Taiwan
