Abstract
Explicit concept-space models have proven effective for text representation in many natural language and text mining applications. The idea is to embed textual structures into a semantic space of concepts that captures the main ideas, objects, and characteristics of those structures. The resulting so-called bag-of-concepts (BoC) representation, however, suffers from data sparsity: similar texts receive low similarity scores because their concept overlap is low. To address this problem, we propose two neural embedding models that learn continuous concept vectors. Once these vectors are learned, we propose an efficient vector aggregation method to generate fully continuous BoC representations. We evaluate our concept embedding models on three tasks: (1) measuring entity semantic relatedness and ranking, where we achieve a 1.6% improvement in correlation scores; (2) dataless concept categorization, where we achieve state-of-the-art performance and reduce the categorization error rate by more than 5% compared to five prior word and entity embedding models; and (3) dataless document classification, where our models outperform the sparse BoC representations. In addition, by exploiting our linear-time vector aggregation method, we achieve better accuracy scores with far fewer concept dimensions than previous BoC densification methods, which operate in polynomial time and require hundreds of dimensions in the BoC representation.
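The abstract does not spell out the aggregation step itself; one natural linear-time reading is a weighted average of the learned concept vectors, using the TF-IDF weights mentioned in the notes below. A minimal sketch under that assumption (the function name dense_boc and the toy vectors are illustrative, not the paper's API):

```python
import numpy as np

def dense_boc(concept_weights, concept_vectors):
    """Aggregate a sparse bag-of-concepts into one dense vector.

    concept_weights: dict {concept name: TF-IDF weight} (the sparse BoC).
    concept_vectors: dict {concept name: learned embedding (np.ndarray)}.
    A single pass over the concepts, i.e., linear time, unlike pairwise
    densification schemes that compare concepts across two BoC vectors.
    """
    dim = len(next(iter(concept_vectors.values())))
    acc, total = np.zeros(dim), 0.0
    for concept, weight in concept_weights.items():
        vec = concept_vectors.get(concept)
        if vec is not None:          # skip concepts without an embedding
            acc += weight * vec
            total += weight
    return acc / total if total > 0 else acc

# Toy 3-d vectors; real embeddings would come from the trained models.
vectors = {"Espresso":      np.array([0.9, 0.1, 0.0]),
           "Arabic coffee": np.array([0.8, 0.2, 0.1])}
doc_boc = {"Espresso": 2.3, "Arabic coffee": 1.1}   # TF-IDF weights
print(dense_boc(doc_boc, vectors))
```

Because each concept contributes its full vector, two documents with related but non-identical concepts still end up with similar dense representations, which is exactly the sparsity problem the abstract describes.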
Notes
A concept is an expression that denotes an idea, an event, or an object.
Unless the term is very common (e.g., a, the, some, etc.) and carries no relevant information.
We use the terms continuous, dense, and distributed interchangeably to refer to real-valued vectors.
\(p(\text{Arabic coffee} \mid \text{beverage}) = 0\).
In this paper, we use the terms “concept” and “entity” interchangeably.
This is an illustrative example and does not imply that the two concepts will have totally dissimilar vectors.
The weights are the TF-IDF scores from searching Wikipedia.
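For concreteness, the standard TF-IDF weighting this refers to is \(\text{tf-idf}(t,d) = \text{tf}(t,d) \cdot \log \frac{N}{\text{df}(t)}\), where \(\text{tf}(t,d)\) counts occurrences of term \(t\) in document \(d\), \(N\) is the number of Wikipedia articles, and \(\text{df}(t)\) is the number of articles containing \(t\); the paper's retrieval engine may use a variant of this formula.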
In this paper, we use concept learning and concept categorization interchangeably.
From a multi-class classification perspective, the accuracy scores are equivalent to the clustering purity scores reported in Li et al. [16].
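For reference, purity for predicted clusters \(\Omega = \{\omega_1, \ldots, \omega_K\}\) against gold classes \(\mathcal{C} = \{c_1, \ldots, c_J\}\) over \(N\) items is standardly defined as \(\text{purity}(\Omega, \mathcal{C}) = \frac{1}{N} \sum_{k} \max_{j} |\omega_k \cap c_j|\); when each item is assigned directly to a predicted category, each \(\omega_k\) corresponds to one class and purity reduces to multi-class accuracy.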
Wikification is the process of identifying mentions of concepts and entities in a given free-text and linking them to Wikipedia.
References
Baroni M, Lenci A (2010) Distributional memory: a general framework for corpus-based semantics. Comput Linguist 36(4):673–721
Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. In: Advances in neural information processing systems, pp 2787–2795
Ceccarelli D, Lucchese C, Orlando S, Perego R, Trani S (2013) Learning relatedness measures for entity linking. In: Proceedings of the 22nd ACM international conference on information & knowledge management. ACM, pp 139–148
Chang M-W, Ratinov L-A, Roth D, Srikumar V (2008) Importance of semantic representation: dataless classification. AAAI 2:830–835
Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12(Aug):2493–2537
Gabrilovich E, Markovitch S (2007) Computing semantic relatedness using Wikipedia-based explicit semantic analysis. IJCAI 7:1606–1611
Harris ZS (1954) Distributional structure. Word 10(2–3):146–162
Hoffart J, Seufert S, Nguyen DB, Theobald M, Weikum G (2012) KORE: keyphrase overlap relatedness for entity disambiguation. In: Proceedings of the 21st ACM international conference on information and knowledge management. ACM, pp 545–554
Hu Z, Huang P, Deng Y, Gao Y, Xing EP (2015) Entity hierarchy embedding. In: Proceedings of the 53rd annual meeting of the association for computational linguistics
Hua W, Wang Z, Wang H, Zheng K, Zhou X (2015) Short text understanding through lexical-semantic analysis. In: 2015 IEEE 31st international conference on data engineering (ICDE). IEEE, pp 495–506
Huang H, Heck L, Ji H (2015) Leveraging deep neural networks and knowledge graphs for entity disambiguation. arXiv:1504.07678
Hulpus I, Prangnawarat N, Hayes C (2015) Path-based semantic relatedness on linked data and its use to word and entity disambiguation. In: International semantic web conference. Springer, pp 442–457
Kim D, Wang H, Oh AH (2013) Context-dependent conceptualization. In: IJCAI, pp 2654–2661
Lang K (1995) Newsweeder: learning to filter netnews. In: Proceedings of the 12th international conference on machine learning, pp 331–339
Li P, Wang H, Zhu KQ, Wang Z, Wu X (2013) Computing term similarity by large probabilistic isA knowledge. In: Proceedings of the 22nd ACM international conference on information & knowledge management. ACM, pp 1401–1410
Li Y, Zheng R, Tian T, Hu Z, Iyer R, Sycara K (2016) Joint embedding of hierarchical categories and entities for concept categorization and dataless classification. arXiv:1607.07956
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119
Milne D, Witten IH (2008) Learning to link with Wikipedia. In: Proceedings of the 17th ACM conference on information and knowledge management. ACM, pp 509–518
Papadimitriou CH, Steiglitz K (1982) Combinatorial optimization: algorithms and complexity. Courier Corporation, North Chelmsford
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12(Oct):2825–2830
Peng H, Song Y, Roth D (2016) Event detection and co-reference with minimal supervision. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 392–402
Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the empirical methods in natural language processing (EMNLP 2014), vol 12, pp 1532–1543
Rocchio JJ (1971) Relevance feedback in information retrieval. In: Salton G (ed) SMART retrieval system-experiments in automatic document processing. Prentice Hall, Englewood Cliffs, NJ, pp 313–323
Schuhmacher M, Ponzetto SP (2014) Knowledge-based graph document modeling. In: Proceedings of the 7th ACM international conference on web search and data mining. ACM, pp 543–552
Shalaby W, Zadrozny W (2015) Measuring semantic relatedness using mined semantic analysis. arXiv:1512.03465
Song Y, Roth D (2014) On dataless hierarchical text classification. In: AAAI, pp 1579–1585
Song Y, Roth D (2015) Unsupervised sparse vector densification for short text similarity. In: Proceedings of NAACL
Song Y, Wang H, Wang Z, Li H, Chen W (2011) Short text conceptualization using a probabilistic knowledgebase. In: Proceedings of the twenty-second international joint conference on artificial intelligence, volume three. AAAI Press, pp 2330–2336
Song Y, Wang S, Wang H (2015) Open domain short text conceptualization: a generative + descriptive modeling approach. In: IJCAI, pp 3820–3826
Wang Z, Wang H (2016) Understanding short texts. In: The association for computational linguistics (ACL) (Tutorial). https://www.microsoft.com/en-us/research/publication/understanding-short-texts/
Wang Z, Wang H, Hu Z (2014) Head, modifier, and constraint detection in short texts. In: 2014 IEEE 30th international conference on data engineering (ICDE). IEEE, pp 280–291
Wang Z, Zhao K, Wang H, Meng X, Wen J-R (2015) Query understanding through knowledge-based conceptualization. In: IJCAI, pp 3264–3270
Witten I, Milne D (2008) An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In: Proceedings of the AAAI workshop on Wikipedia and artificial intelligence: an evolving synergy. AAAI Press, Chicago, USA, pp 25–30
Wu W, Li H, Wang H, Zhu KQ (2012) Probase: a probabilistic taxonomy for text understanding. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data. ACM, pp 481–492
Yamada I, Shindo H, Takeda H, Takefuji Y (2016) Joint learning of the embedding of words and entities for named entity disambiguation. arXiv:1601.01343
Kokoska S, Zwillinger D (1999) CRC standard probability and statistics tables and formulae. CRC Press
Acknowledgements
This work was supported by the National Science Foundation (Grant No. 1624035). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Shalaby, W., Zadrozny, W. Learning concept embeddings for dataless classification via efficient bag-of-concepts densification. Knowl Inf Syst 61, 1047–1070 (2019). https://doi.org/10.1007/s10115-018-1321-8