Learning concept embeddings for dataless classification via efficient bag-of-concepts densification

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Explicit concept space models have proven effective for text representation in many natural language and text mining applications. The idea is to embed textual structures into a semantic space of concepts that captures the main ideas, objects, and characteristics of these structures. The so-called bag-of-concepts (BoC) representation suffers from data sparsity, which causes low similarity scores between similar texts due to low concept overlap. To address this problem, we propose two neural embedding models that learn continuous concept vectors. Once these vectors are learned, we propose an efficient vector aggregation method to generate fully continuous BoC representations. We evaluate our concept embedding models on three tasks: (1) measuring entity semantic relatedness and ranking, where we achieve a 1.6% improvement in correlation scores; (2) dataless concept categorization, where we achieve state-of-the-art performance and reduce the categorization error rate by more than 5% compared to five prior word and entity embedding models; and (3) dataless document classification, where our models outperform the sparse BoC representations. In addition, by exploiting our efficient linear-time vector aggregation method, we achieve better accuracy scores with far fewer concept dimensions than previous BoC densification methods, which operate in polynomial time and require hundreds of dimensions in the BoC representation.
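
To make the sparsity problem and the densification idea concrete, the following is a minimal sketch, not the method from the full paper. It assumes that densification can be approximated by a weighted average of concept vectors, computed in a single pass over the concepts, and that a document is assigned to the label whose dense vector is closest by cosine similarity. The function names (`densify`, `cosine`, `classify`), the two-dimensional embeddings, and all weights are hypothetical values chosen for illustration only.

```python
import math


def densify(boc, embeddings):
    """Collapse a sparse bag-of-concepts {concept: weight} into one dense vector.

    Assumption: a weighted average of the concept vectors. Each concept is
    visited once, so the cost is linear in the number of concepts.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for concept, weight in boc.items():
        vec = embeddings.get(concept)
        if vec is None:  # skip concepts that have no learned vector
            continue
        for i, x in enumerate(vec):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(doc_boc, label_bocs, embeddings):
    """Dataless classification: pick the label whose dense vector is nearest."""
    doc_vec = densify(doc_boc, embeddings)
    return max(
        label_bocs,
        key=lambda name: cosine(doc_vec, densify(label_bocs[name], embeddings)),
    )


# Toy concept embeddings (made-up numbers, not learned vectors).
embeddings = {
    "Espresso": [0.9, 0.1],
    "Arabic coffee": [0.8, 0.2],
    "Basketball": [0.1, 0.9],
}

doc = {"Espresso": 2.0}  # sparse BoC of a document
labels = {
    "drinks": {"Arabic coffee": 1.0},  # sparse BoC of each label description
    "sports": {"Basketball": 1.0},
}

# The sparse representations share no concept, so their overlap is empty.
sparse_overlap = set(doc) & set(labels["drinks"])

# After densification the related pair is closer than the unrelated pair.
sim_drinks = cosine(densify(doc, embeddings), densify(labels["drinks"], embeddings))
sim_sports = cosine(densify(doc, embeddings), densify(labels["sports"], embeddings))
predicted = classify(doc, labels, embeddings)
```

In this toy setting the document and the "drinks" label have no concept in common, yet their dense vectors are close because the underlying concept vectors are close. No labeled training documents are used; only the label descriptions are needed.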

Notes

  1. A concept is an expression that denotes an idea, event, or an object.

  2. Unless the term is very common (e.g., a, the, some, etc.) and carries no relevant information.

  3. We use the terms continuous, dense, and distributed vectors interchangeably to refer to real-valued vectors.

  4. https://concept.research.microsoft.com.

  5. https://www.bing.com/.

  6. \(p(\text{Arabic coffee} \mid \text{beverage}) = 0\).

  7. In this paper, we use the terms “concept” and “entity” interchangeably.

  8. This is an illustrative example and doesn’t imply the two concepts will have totally dissimilar vectors.

  9. http://dumps.wikimedia.org/enwiki/.

  10. The weights are the TF-IDF scores from searching Wikipedia.

  11. In this paper, we use concept learning and concept categorization interchangeably.

  12. From a multi-class classification perspective, the accuracy scores would be equivalent to the clustering purity score as reported in Li et al. [16].

  13. Wikification is the process of identifying mentions of concepts and entities in a given free text and linking them to Wikipedia.

  14. https://en.wikipedia.org/wiki/William_Stryker.

References

  1. Baroni M, Lenci A (2010) Distributional memory: a general framework for corpus-based semantics. Comput Linguist 36(4):673–721

  2. Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. In: Advances in neural information processing systems, pp 2787–2795

  3. Ceccarelli D, Lucchese C, Orlando S, Perego R, Trani S (2013) Learning relatedness measures for entity linking. In: Proceedings of the 22nd ACM international conference on information & knowledge management. ACM, pp 139–148

  4. Chang M-W, Ratinov L-A, Roth D, Srikumar V (2008) Importance of semantic representation: dataless classification. AAAI 2:830–835

  5. Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12(Aug):2493–2537

  6. Gabrilovich E, Markovitch S (2007) Computing semantic relatedness using Wikipedia-based explicit semantic analysis. IJCAI 7:1606–1611

  7. Harris ZS (1954) Distributional structure. Word 10(2–3):146–162

  8. Hoffart J, Seufert S, Nguyen DB, Theobald M, Weikum G (2012) Kore: keyphrase overlap relatedness for entity disambiguation. In: Proceedings of the 21st ACM international conference on information and knowledge management. ACM, pp 545–554

  9. Hu Z, Huang P, Deng Y, Gao Y, Xing EP (2015) Entity hierarchy embedding. In: Proceedings of the 53rd annual meeting of the association for computational linguistics

  10. Hua W, Wang Z, Wang H, Zheng K, Zhou X (2015) Short text understanding through lexical-semantic analysis. In: 2015 IEEE 31st international conference on data engineering (ICDE). IEEE, pp 495–506

  11. Huang H, Heck L, Ji H (2015) Leveraging deep neural networks and knowledge graphs for entity disambiguation. arXiv:1504.07678

  12. Hulpus I, Prangnawarat N, Hayes C (2015) Path-based semantic relatedness on linked data and its use to word and entity disambiguation. In: International semantic web conference. Springer, pp 442–457

  13. Kim D, Wang H, Oh AH (2013) Context-dependent conceptualization. In: IJCAI, pp 2654–2661

  14. Lang K (1995) Newsweeder: learning to filter netnews. In: Proceedings of the 12th international conference on machine learning, pp 331–339

  15. Li P, Wang H, Zhu KQ, Wang Z, Wu X (2013) Computing term similarity by large probabilistic isa knowledge. In: Proceedings of the 22nd ACM international conference on information & knowledge management. ACM, pp 1401–1410

  16. Li Y, Zheng R, Tian T, Hu Z, Iyer R, Sycara K (2016) Joint embedding of hierarchical categories and entities for concept categorization and dataless classification. arXiv:1607.07956

  17. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781

  18. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119

  19. Milne D, Witten IH (2008) Learning to link with Wikipedia. In: Proceedings of the 17th ACM conference on information and knowledge management. ACM, pp 509–518

  20. Papadimitriou CH, Steiglitz K (1982) Combinatorial optimization: algorithms and complexity. Courier Corporation, North Chelmsford

  21. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12(Oct):2825–2830

  22. Peng H, Song Y, Roth D (2016) Event detection and co-reference with minimal supervision. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 392–402

  23. Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the empirical methods in natural language processing (EMNLP 2014), vol 12, pp 1532–1543

  24. Rocchio JJ (1971) Relevance feedback in information retrieval. In: Salton G (ed) SMART retrieval system-experiments in automatic document processing. Prentice Hall, Englewood Cliffs, NJ, pp 313–323

  25. Schuhmacher M, Ponzetto SP (2014) Knowledge-based graph document modeling. In: Proceedings of the 7th ACM international conference on web search and data mining. ACM, pp 543–552

  26. Shalaby W, Zadrozny W (2015) Measuring semantic relatedness using mined semantic analysis. arXiv:1512.03465

  27. Song Y, Roth D (2014) On dataless hierarchical text classification. In: AAAI, pp 1579–1585

  28. Song Y, Roth D (2015) Unsupervised sparse vector densification for short text similarity. In: Proceedings of NAACL

  29. Song Y, Wang H, Wang Z, Li H, Chen W (2011) Short text conceptualization using a probabilistic knowledgebase. In: Proceedings of the twenty-second international joint conference on artificial intelligence, volume three. AAAI Press, pp 2330–2336

  30. Song Y, Wang S, Wang H (2015) Open domain short text conceptualization: a generative + descriptive modeling approach. In: IJCAI, pp 3820–3826

  31. Wang Z, Wang H (2016) Understanding short texts. In: The association for computational linguistics (ACL) (Tutorial). https://www.microsoft.com/en-us/research/publication/understanding-short-texts/

  32. Wang Z, Wang H, Hu Z (2014) Head, modifier, and constraint detection in short texts. In: 2014 IEEE 30th international conference on data engineering (ICDE). IEEE, pp 280–291

  33. Wang Z, Zhao K, Wang H, Meng X, Wen J-R (2015) Query understanding through knowledge-based conceptualization. In: IJCAI, pp 3264–3270

  34. Witten I, Milne D (2008) An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In: Proceedings of the AAAI workshop on Wikipedia and artificial intelligence: an evolving synergy. AAAI Press, Chicago, USA, pp 25–30

  35. Wu W, Li H, Wang H, Zhu KQ (2012) Probase: a probabilistic taxonomy for text understanding. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data. ACM, pp 481–492

  36. Yamada I, Shindo H, Takeda H, Takefuji Y (2016) Joint learning of the embedding of words and entities for named entity disambiguation. arXiv:1601.01343

  37. Kokoska S, Zwillinger D (1999) CRC standard probability and statistics tables and formulae. CRC Press

Acknowledgements

This work was supported by the National Science Foundation (Grant No. 1624035). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Corresponding author

Correspondence to Walid Shalaby.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shalaby, W., Zadrozny, W. Learning concept embeddings for dataless classification via efficient bag-of-concepts densification. Knowl Inf Syst 61, 1047–1070 (2019). https://doi.org/10.1007/s10115-018-1321-8

  • DOI: https://doi.org/10.1007/s10115-018-1321-8
