Learning concept embeddings for dataless classification via efficient bag-of-concepts densification

Shalaby, Walid; Zadrozny, Wlodek

doi:10.1007/s10115-018-1321-8

Regular Paper
Published: 17 January 2019

Knowledge and Information Systems, volume 61, pages 1047–1070 (2019)

Abstract

Explicit concept space models have proven effective for text representation in many natural language and text mining applications. The idea is to embed textual structures into a semantic space of concepts that captures the main ideas, objects, and characteristics of these structures. The so-called bag-of-concepts (BoC) representation suffers from data sparsity, which causes low similarity scores between similar texts due to low concept overlap. To address this problem, we propose two neural embedding models to learn continuous concept vectors. Once they are learned, we propose an efficient vector aggregation method to generate fully continuous BoC representations. We evaluate our concept embedding models on three tasks: (1) measuring entity semantic relatedness and ranking, where we achieve a 1.6% improvement in correlation scores; (2) dataless concept categorization, where we achieve state-of-the-art performance and reduce the categorization error rate by more than 5% compared to five prior word and entity embedding models; and (3) dataless document classification, where our models outperform the sparse BoC representations. In addition, by exploiting our efficient linear-time vector aggregation method, we achieve better accuracy scores with far fewer concept dimensions than previous BoC densification methods, which operate in polynomial time and require hundreds of dimensions in the BoC representation.
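
The sparsity problem and a linear-time aggregation can be illustrated with a minimal Python sketch. The abstract does not spell out the aggregation itself, so the weighted average used here is an assumption, as are the function names, the hand-made 2-dimensional concept vectors, and the toy weights. It is meant only to show why a dense BoC can score two texts as similar even when they share no concepts.

```python
import math

# Hypothetical, hand-made concept embeddings (not learned by any model).
CONCEPT_VECTORS = {
    "Espresso": [0.9, 0.1],
    "Arabic coffee": [0.8, 0.2],
    "Beverage": [0.85, 0.15],
    "Ice hockey": [0.1, 0.9],
    "Sport": [0.05, 0.95],
}

def densify(boc, concept_vectors):
    """Assumed aggregation: weighted average of concept vectors.

    One pass over the bag, so the cost is linear in the number of concepts.
    """
    dim = len(next(iter(concept_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for concept, weight in boc.items():
        vec = concept_vectors.get(concept)
        if vec is None:  # concepts without an embedding are skipped
            continue
        for i, x in enumerate(vec):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sparse_cosine(boc_a, boc_b):
    """Cosine similarity between two sparse BoC vectors (concept -> weight)."""
    dot = sum(w * boc_b[c] for c, w in boc_a.items() if c in boc_b)
    norm = math.sqrt(sum(w * w for w in boc_a.values())) * math.sqrt(
        sum(w * w for w in boc_b.values())
    )
    return dot / norm if norm else 0.0

def classify(doc_boc, label_bocs, concept_vectors):
    """Dataless classification: pick the label whose dense BoC is closest."""
    doc_vec = densify(doc_boc, concept_vectors)
    return max(
        label_bocs,
        key=lambda label: cosine(doc_vec, densify(label_bocs[label], concept_vectors)),
    )

doc = {"Espresso": 0.7, "Arabic coffee": 0.3}
labels = {"drinks": {"Beverage": 1.0}, "sports": {"Sport": 1.0, "Ice hockey": 0.5}}

sparse_score = sparse_cosine(doc, labels["drinks"])  # no shared concepts
dense_score = cosine(densify(doc, CONCEPT_VECTORS), densify(labels["drinks"], CONCEPT_VECTORS))
predicted = classify(doc, labels, CONCEPT_VECTORS)
```

In this toy setup the document and the "drinks" label have no concept in common, so the sparse score is zero, while the dense vectors still point in nearly the same direction. No labeled training documents are used; the labels are described only by their own concepts.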

Notes

A concept is an expression that denotes an idea, event, or an object.
Unless the term is very common (e.g., a, the, some, etc.) and carries no relevant information.
We use the terms continuous, dense, distributed vectors interchangeably to refer to real-valued vectors.
https://concept.research.microsoft.com.
https://www.bing.com/.
\(p(\text{Arabic coffee} \mid \text{beverage}) = 0\).
In this paper, we use the terms “concept” and “entity” interchangeably.
This is an illustrative example and doesn’t imply the two concepts will have totally dissimilar vectors.
http://dumps.wikimedia.org/enwiki/.
The weights are the TF-IDF scores from searching Wikipedia.
In this paper, we use concept learning and concept categorization interchangeably.
From a multi-class classification perspective, the accuracy scores would be equivalent to the clustering purity score as reported in Li et al. [16].
Wikification is the process of identifying mentions of concepts and entities in a given free-text and linking them to Wikipedia.
https://en.wikipedia.org/wiki/William_Stryker.

References

Baroni M, Lenci A (2010) Distributional memory: a general framework for corpus-based semantics. Comput Linguist 36(4):673–721
Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. In: Advances in neural information processing systems, pp 2787–2795
Ceccarelli D, Lucchese C, Orlando S, Perego R, Trani S (2013) Learning relatedness measures for entity linking. In: Proceedings of the 22nd ACM international conference on information & knowledge management. ACM, pp 139–148
Chang M-W, Ratinov L-A, Roth D, Srikumar V (2008) Importance of semantic representation: dataless classification. AAAI 2:830–835
Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12(Aug):2493–2537
Gabrilovich E, Markovitch S (2007) Computing semantic relatedness using wikipedia-based explicit semantic analysis. IJCAI 7:1606–1611
Harris ZS (1954) Distributional structure. Word 10(2–3):146–162
Hoffart J, Seufert S, Nguyen DB, Theobald M, Weikum G (2012) Kore: keyphrase overlap relatedness for entity disambiguation. In: Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, pp 545–554
Hu Z, Huang P, Deng Y, Gao Y, Xing EP (2015) Entity hierarchy embedding. In: Proceedings of the 53rd annual meeting of the association for computational linguistics
Hua W, Wang Z, Wang H, Zheng K, Zhou X (2015) Short text understanding through lexical-semantic analysis. In: 2015 IEEE 31st international conference on data engineering (ICDE). IEEE, pp 495–506
Huang H, Heck L, Ji H (2015) Leveraging deep neural networks and knowledge graphs for entity disambiguation. arXiv:1504.07678
Hulpus I, Prangnawarat N, Hayes C (2015) Path-based semantic relatedness on linked data and its use to word and entity disambiguation. In: International semantic web conference. Springer, pp 442–457
Kim D, Wang H, Oh AH (2013) Context-dependent conceptualization. In: IJCAI, pp 2654–2661
Lang K (1995) Newsweeder: learning to filter netnews. In: Proceedings of the 12th international conference on machine learning, pp 331–339
Li P, Wang H, Zhu KQ, Wang Z, Wu X (2013) Computing term similarity by large probabilistic isa knowledge. In: Proceedings of the 22nd ACM international conference on information & knowledge management. ACM, pp 1401–1410
Li Y, Zheng R, Tian T, Hu Z, Iyer R, Sycara K (2016) Joint embedding of hierarchical categories and entities for concept categorization and dataless classification. arXiv:1607.07956
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119
Milne D, Witten IH (2008) Learning to link with wikipedia. In: Proceedings of the 17th ACM conference on information and knowledge management. ACM, pp 509–518
Papadimitriou CH, Steiglitz K (1982) Combinatorial optimization: algorithms and complexity. Courier Corporation, North Chelmsford
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12(Oct):2825–2830
Peng H, Song Y, Roth D (2016) Event detection and co-reference with minimal supervision. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 392–402
Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the empirical methods in natural language processing (EMNLP 2014), vol 12, pp 1532–1543
Rocchio JJ (1971) Relevance feedback in information retrieval. In: Salton G (ed) SMART retrieval system-experiments in automatic document processing. Prentice Hall, Englewood Cliffs, NJ, pp 313–323
Schuhmacher M, Ponzetto SP (2014) Knowledge-based graph document modeling. In: Proceedings of the 7th ACM international conference on web search and data mining. ACM, pp 543–552
Shalaby W, Zadrozny W (2015) Measuring semantic relatedness using mined semantic analysis. arXiv:1512.03465
Song Y, Roth D (2014) On dataless hierarchical text classification. In: AAAI, pp 1579–1585
Song Y, Roth D (2015) Unsupervised sparse vector densification for short text similarity. In: Proceedings of NAACL
Song Y, Wang H, Wang Z, Li H, Chen W (2011) Short text conceptualization using a probabilistic knowledgebase. In: Proceedings of the twenty-second international joint conference on artificial intelligence-volume volume three. AAAI Press, pp 2330–2336
Song Y, Wang S, Wang H (2015) Open domain short text conceptualization: a generative + descriptive modeling approach. In: IJCAI, pp 3820–3826
Wang Z, Wang H (2016) Understanding short texts. In: The association for computational linguistics (ACL) (Tutorial). https://www.microsoft.com/en-us/research/publication/understanding-short-texts/
Wang Z, Wang H, Hu Z (2014) Head, modifier, and constraint detection in short texts. In: 2014 IEEE 30th international conference on data engineering (ICDE). IEEE, pp 280–291
Wang Z, Zhao K, Wang H, Meng X, Wen J-R (2015) Query understanding through knowledge-based conceptualization. In: IJCAI, pp 3264–3270
Witten I, Milne D (2008) An effective, low-cost measure of semantic relatedness obtained from wikipedia links. In: Proceeding of AAAI workshop on wikipedia and artificial intelligence: an evolving synergy, AAAI Press, Chicago, USA, pp 25–30
Wu W, Li H, Wang H, Zhu KQ (2012) Probase: A probabilistic taxonomy for text understanding. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data. ACM, pp 481–492
Yamada I, Shindo H, Takeda H, Takefuji Y (2016) Joint learning of the embedding of words and entities for named entity disambiguation. arXiv:1601.01343
Kokoska S, Zwillinger D (1999) CRC standard probability and statistics tables and formulae. CRC Press

Acknowledgements

This work was supported by the National Science Foundation (Grant No. 1624035). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author information

Authors and Affiliations

Computer Science Department, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
Walid Shalaby & Wlodek Zadrozny

Corresponding author

Correspondence to Walid Shalaby.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shalaby, W., Zadrozny, W. Learning concept embeddings for dataless classification via efficient bag-of-concepts densification. Knowl Inf Syst 61, 1047–1070 (2019). https://doi.org/10.1007/s10115-018-1321-8

Received: 08 December 2017
Revised: 10 December 2018
Accepted: 17 December 2018
Published: 17 January 2019
Issue Date: 01 November 2019
DOI: https://doi.org/10.1007/s10115-018-1321-8
