Supervised Semantic Indexing Using Sub-spacing

Sani, Sadiq; Wiratunga, Nirmalie; Massie, Stewart; Lothian, Robert

doi:10.1007/978-3-319-11209-1_30

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8765)

Abstract

Indexing of textual cases is often hampered by variation in vocabulary. Semantic indexing is commonly used to address this problem: it discovers semantic or conceptual relatedness between individual terms and uses it to improve textual case representation. However, the resulting representations are not optimal for supervised tasks, because standard semantic indexing approaches do not take the class membership of textual cases into account. Supervised semantic indexing approaches, e.g. sprinkled Latent Semantic Indexing (SpLSI) and supervised Latent Dirichlet Allocation (sLDA), have been proposed to address this limitation. However, both SpLSI and sLDA are computationally expensive and require parameter tuning. In this work, we present an approach called Supervised Sub-Spacing (S3) for supervised semantic indexing of documents. S3 creates a separate sub-space for each class, within which class-specific term relations and term weights are extracted. The power of S3 lies in its ability to modify document representations so that documents belonging to the same class become more similar to one another while their similarity to documents of other classes is reduced. In addition, S3 is flexible enough to work with a variety of semantic relatedness metrics, yet powerful enough to deliver significant improvements in text classification accuracy. We evaluate our approach on a number of supervised datasets; the results show that classification on S3-based representations significantly outperforms both Sprinkled LSI, a supervised version of Latent Semantic Indexing (LSI), and supervised LDA.
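The sub-spacing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract does not specify the relatedness metric or the weighting scheme, so the sketch assumes cosine similarity between term occurrence vectors within a class as the term relation, and the share of a term's total frequency that falls in a class as the term weight. All function names and the toy corpus are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def term_relatedness(class_docs):
    """Assumed metric: cosine between term columns, using one class's documents only."""
    n_terms = len(class_docs[0])
    cols = [[d[t] for d in class_docs] for t in range(n_terms)]
    return [[cosine(cols[i], cols[j]) for j in range(n_terms)]
            for i in range(n_terms)]

def term_weights(docs, labels, cls):
    """Assumed weighting: fraction of each term's total frequency found in class `cls`."""
    n_terms = len(docs[0])
    in_cls = [sum(d[t] for d, y in zip(docs, labels) if y == cls)
              for t in range(n_terms)]
    total = [sum(d[t] for d in docs) for t in range(n_terms)]
    return [i / tot if tot else 0.0 for i, tot in zip(in_cls, total)]

def sub_space_transform(docs, labels):
    """Re-represent each labelled document inside the sub-space of its own class."""
    relations, weights = {}, {}
    for cls in set(labels):
        members = [d for d, y in zip(docs, labels) if y == cls]
        relations[cls] = term_relatedness(members)
        weights[cls] = term_weights(docs, labels, cls)
    transformed = []
    for d, y in zip(docs, labels):
        T, w = relations[y], weights[y]
        n = len(d)
        # Spread each term's frequency onto its class-specific related terms ...
        smoothed = [sum(d[i] * T[i][j] for i in range(n)) for j in range(n)]
        # ... then scale every term by its relevance to the class.
        transformed.append([s * w[j] for j, s in enumerate(smoothed)])
    return transformed

# Toy corpus (hypothetical); columns: goal, match, vote, election, win.
docs = [
    [1, 0, 0, 0, 1],  # sport
    [0, 1, 0, 0, 1],  # sport
    [0, 0, 1, 0, 1],  # politics
    [0, 0, 0, 1, 1],  # politics
]
labels = ["sport", "sport", "politics", "politics"]
new_docs = sub_space_transform(docs, labels)

same_before = cosine(docs[0], docs[1])
cross_before = cosine(docs[0], docs[2])
same_after = cosine(new_docs[0], new_docs[1])
cross_after = cosine(new_docs[0], new_docs[2])
print(round(same_before, 2), round(same_after, 2))
print(round(cross_before, 2), round(cross_after, 2))
```

On this toy corpus every pair of documents starts with a cosine similarity of 0.5, because they share only the class-neutral term "win". After the transform the same-class pair rises to about 0.76 and the cross-class pair falls to about 0.18, which is the behaviour the abstract attributes to S3. The sketch covers labelled documents only.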

Author information

Authors and Affiliations

IDEAS Research Institute, Robert Gordon University, Aberdeen, AB10 7GJ, Scotland, UK
Sadiq Sani, Nirmalie Wiratunga, Stewart Massie & Robert Lothian

Editor information

Editors and Affiliations

Department of Computer Science and Software Engineering, Université Laval, G1K 7P4, Québec, Canada
Luc Lamontagne
IIIA, Artificial Intelligence Research Institute CSIC, Spanish Council for Scientific Research Campus UAB, 08193, Bellaterra, Catalonia, Spain
Enric Plaza

About this paper

Cite this paper

Sani, S., Wiratunga, N., Massie, S., Lothian, R. (2014). Supervised Semantic Indexing Using Sub-spacing. In: Lamontagne, L., Plaza, E. (eds) Case-Based Reasoning Research and Development. ICCBR 2014. Lecture Notes in Computer Science, vol 8765. Springer, Cham. https://doi.org/10.1007/978-3-319-11209-1_30

DOI: https://doi.org/10.1007/978-3-319-11209-1_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11208-4
Online ISBN: 978-3-319-11209-1
eBook Packages: Computer Science, Computer Science (R0)
