Learning concept embeddings for dataless classification via efficient bag-of-concepts densification

  • Walid Shalaby
  • Wlodek Zadrozny
Regular Paper


Explicit concept space models have proven effective for text representation in many natural language and text mining applications. The idea is to embed textual structures into a semantic space of concepts that captures the main ideas, objects, and characteristics of these structures. The resulting bag-of-concepts (BoC) representation suffers from data sparsity: because similar texts often share few concepts, their similarity scores are low. To address this problem, we propose two neural embedding models that learn continuous concept vectors. Once these vectors are learned, we apply an efficient vector aggregation method to generate fully continuous BoC representations. We evaluate our concept embedding models on three tasks: (1) measuring entity semantic relatedness and ranking, where we achieve a 1.6% improvement in correlation scores; (2) dataless concept categorization, where we achieve state-of-the-art performance and reduce the categorization error rate by more than 5% compared to five prior word and entity embedding models; and (3) dataless document classification, where our models outperform the sparse BoC representations. In addition, by exploiting our linear-time vector aggregation method, we achieve better accuracy scores with far fewer concept dimensions than previous BoC densification methods, which operate in polynomial time and require hundreds of dimensions in the BoC representation.
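To make the densification idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the aggregation is a weighted average of concept vectors and that classification picks the label with the highest cosine similarity; the abstract does not spell out either detail. The names `densify`, `cosine`, and `classify`, the toy embeddings, and the concept weights are all hypothetical.

```python
import math

def densify(boc, embeddings):
    """Aggregate a sparse bag-of-concepts {concept: weight} into one dense vector.

    Assumption: weighted average of concept vectors. This takes a single
    pass over the concepts, so the cost is linear in the BoC size.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for concept, weight in boc.items():
        vec = embeddings.get(concept)
        if vec is None:  # skip concepts that have no learned vector
            continue
        for i, x in enumerate(vec):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0:
        return total
    return [x / weight_sum for x in total]

def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def classify(doc_boc, label_bocs, embeddings):
    """Dataless classification: return the label whose dense BoC is closest."""
    doc_vec = densify(doc_boc, embeddings)
    return max(
        label_bocs,
        key=lambda label: cosine(doc_vec, densify(label_bocs[label], embeddings)),
    )

# Hypothetical 2-dimensional concept embeddings, for illustration only.
EMB = {
    "Ice hockey": [0.9, 0.1],
    "Goaltender": [0.8, 0.2],
    "Graphics card": [0.1, 0.9],
    "Rendering": [0.2, 0.8],
}
doc = {"Goaltender": 1.0}
labels = {
    "sports": {"Ice hockey": 1.0},
    "computers": {"Graphics card": 0.7, "Rendering": 0.3},
}
predicted = classify(doc, labels, EMB)
```

In this toy setup the document and the "sports" label share no concept, so a sparse BoC overlap would be zero, while the aggregated dense vectors still lie close together.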


Dataless classification, Concept categorization, Entity relatedness, Concept space models, Concept embeddings, Bag-of-concepts



This work was supported by the National Science Foundation (Grant No. 1624035). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Computer Science Department, University of North Carolina at Charlotte, Charlotte, USA
