Supervised Semantic Indexing Using Sub-spacing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8765)

Abstract

Indexing of textual cases is commonly affected by variation in vocabulary. Semantic indexing addresses this problem by discovering semantic or conceptual relatedness between individual terms and using it to improve textual case representation. However, the resulting representations are not optimal for supervised tasks, because standard semantic indexing approaches do not take the class membership of textual cases into account. Supervised semantic indexing approaches, e.g. sprinkled Latent Semantic Indexing (SpLSI) and supervised Latent Dirichlet Allocation (sLDA), have been proposed to address this limitation. However, both SpLSI and sLDA are computationally expensive and require parameter tuning. In this work, we present an approach called Supervised Sub-Spacing (S3) for supervised semantic indexing of documents. S3 works by creating a separate sub-space for each class, within which class-specific term relations and term weights are extracted. The power of S3 lies in its ability to modify document representations so that documents belonging to the same class become more similar to one another while their similarity to documents of other classes is reduced. In addition, S3 is flexible enough to work with a variety of semantic relatedness metrics, yet powerful enough to yield significant improvements in text classification accuracy. We evaluate our approach on a number of supervised datasets, and the results show that classification on S3-based representations significantly outperforms both Sprinkled LSI, a supervised version of Latent Semantic Indexing (LSI), and supervised LDA.
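The per-class sub-space idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the relatedness metric (cosine similarity between term columns), the term weight (the share of a term's total frequency that falls inside the class), and the names `s3_transform` and `cosine` are all assumptions chosen for illustration. The sketch covers labelled documents only.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def s3_transform(X, y):
    """Re-represent labelled documents inside per-class sub-spaces.

    X: (n_docs, n_terms) non-negative term-frequency matrix.
    y: (n_docs,) class labels.
    The metric and the weighting below are illustrative assumptions,
    not formulas taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_new = np.zeros_like(X)
    total = X.sum(axis=0) + 1e-12          # corpus-wide frequency of each term
    for c in np.unique(y):
        rows = y == c
        Xc = X[rows]                       # sub-space: documents of class c only
        # Assumed relatedness metric: cosine between term columns within the class.
        norms = np.linalg.norm(Xc, axis=0)
        norms[norms == 0] = 1.0            # avoid division by zero for absent terms
        Xn = Xc / norms
        Tc = Xn.T @ Xn                     # (n_terms, n_terms) class-specific relations
        np.fill_diagonal(Tc, 1.0)          # each term is fully related to itself
        # Assumed term weight: fraction of the term's occurrences inside class c.
        wc = Xc.sum(axis=0) / total
        # Spread each document over related terms, then apply class weights.
        X_new[rows] = (Xc @ Tc) * wc
    return X_new
```

On a toy matrix with two classes, calling `s3_transform(X, y)` and comparing `cosine` values before and after shows the intended effect: documents of the same class move closer together, because terms that co-occur within the class reinforce each other, while terms that occur mostly in other classes are down-weighted.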

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Sani, S., Wiratunga, N., Massie, S., Lothian, R. (2014). Supervised Semantic Indexing Using Sub-spacing. In: Lamontagne, L., Plaza, E. (eds) Case-Based Reasoning Research and Development. ICCBR 2014. Lecture Notes in Computer Science, vol 8765. Springer, Cham. https://doi.org/10.1007/978-3-319-11209-1_30

  • DOI: https://doi.org/10.1007/978-3-319-11209-1_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11208-4

  • Online ISBN: 978-3-319-11209-1

  • eBook Packages: Computer Science, Computer Science (R0)