Should Term-Relatedness Be Used in Text Representation?

Sani, Sadiq; Wiratunga, Nirmalie; Massie, Stewart; Lothian, Robert

doi:10.1007/978-3-642-39056-2_21

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7969)

Abstract

The variation in natural language vocabulary remains a challenge for text representation, as the same idea can be expressed in many different ways. Document representations therefore often rely on generalisation, mapping low-level lexical expressions to higher-level concepts in order to capture the inherent semantics of the documents. Term-relatedness measures are often used for this purpose, generalising document representations by capturing semantic relationships between terms. In this work we conduct a comparative study of common term-relatedness metrics on 43 datasets and find that generalisation is not always beneficial. Given the computational overhead of generalisation, the ability to predict whether or not to generalise the indexing vocabulary of a dataset is therefore important. Accordingly, we present a case-based approach that predicts, for a given text dataset, whether using generalisation will improve text retrieval performance. The evaluation shows that our approach correctly identifies datasets that are likely to benefit from generalisation with over 90% accuracy.
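
As a minimal sketch of what generalising a document representation with term-relatedness can look like, the Python below uses one common formulation from generalised vector space models: build a term-by-term relatedness matrix T and map each document vector d to d' = dT, so that weight on one term spills over onto related terms. The relatedness measure used here (cosine similarity between the terms' document-occurrence columns) is an illustrative assumption, not one of the specific metrics compared in the paper, and the function names `term_relatedness` and `generalise` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def term_relatedness(docs: List[Vector]) -> List[Vector]:
    """Hypothetical relatedness matrix T: T[i][j] is the cosine between the
    document-occurrence columns of terms i and j (an illustrative choice)."""
    n_terms = len(docs[0])
    columns = [[doc[t] for doc in docs] for t in range(n_terms)]
    return [[cosine(columns[i], columns[j]) for j in range(n_terms)]
            for i in range(n_terms)]


def generalise(doc: Vector, rel: List[Vector]) -> Vector:
    """Map d to d' = d T, spreading each term's weight onto related terms."""
    n = len(doc)
    return [sum(doc[i] * rel[i][j] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    # Toy vocabulary: car, automobile, engine, banana (term-frequency vectors).
    docs = [
        [1, 0, 1, 0],  # "car engine"
        [0, 1, 1, 0],  # "automobile engine"
        [0, 0, 0, 1],  # "banana"
    ]
    rel = term_relatedness(docs)
    g = [generalise(d, rel) for d in docs]
    print("raw similarity:        ", round(cosine(docs[0], docs[1]), 3))
    print("generalised similarity:", round(cosine(g[0], g[1]), 3))
    print("unrelated document:    ", round(cosine(g[0], g[2]), 3))
```

In this toy corpus the two documents that share only "engine" have a raw cosine similarity of 0.5; after generalisation it rises to about 0.84, because "car" and "automobile" become related through their shared co-occurrence with "engine", while the unrelated "banana" document stays at 0. Note that T has one entry per pair of vocabulary terms, so it grows quadratically with vocabulary size, which is one source of the computational overhead mentioned above. The same smoothing that links synonyms can also blur terms that distinguish documents, which is one way generalisation might fail to help.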

Author information

Authors and Affiliations

School of Computing, Robert Gordon University, Aberdeen, AB25 1HG, Scotland, UK
Sadiq Sani, Nirmalie Wiratunga, Stewart Massie & Robert Lothian

Editor information

Editors and Affiliations

School of Computing, Dublin Institute of Technology, Kevin Street, 8, Dublin, Ireland
Sarah Jane Delany
Department of Computer Science, Drexel University, 3141 Chestnut Street, 19104, Philadelphia, PA, USA
Santiago Ontañón

About this paper

Cite this paper

Sani, S., Wiratunga, N., Massie, S., Lothian, R. (2013). Should Term-Relatedness Be Used in Text Representation?. In: Delany, S.J., Ontañón, S. (eds) Case-Based Reasoning Research and Development. ICCBR 2013. Lecture Notes in Computer Science(), vol 7969. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39056-2_21

DOI: https://doi.org/10.1007/978-3-642-39056-2_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39055-5
Online ISBN: 978-3-642-39056-2
eBook Packages: Computer Science, Computer Science (R0)