Sprinkling: Supervised Latent Semantic Indexing

  • Sutanu Chakraborti
  • Robert Lothian
  • Nirmalie Wiratunga
  • Stuart Watt
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3936)

Abstract

Latent Semantic Indexing (LSI) is an established dimensionality reduction technique for Information Retrieval applications. However, LSI-generated dimensions are not optimal in a classification setting, since LSI fails to exploit the class labels of training documents. We propose an approach that uses class information to influence LSI dimensions: class labels of training documents are encoded as new terms, which are appended to the documents. When LSI is carried out on the augmented term-document matrix, terms pertaining to the same class are pulled closer to each other. Evaluation over experimental data reveals significant improvement in classification accuracy over LSI. The results also compare favourably with naive Support Vector Machines.
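
The augmentation step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`sprinkle`, `lsi_term_vectors`, `cosine`), the `copies` parameter, the one-indicator-row-per-class encoding, and the toy matrix are all hypothetical choices made here for clarity. The abstract does not specify how many label terms are appended or how they are weighted.

```python
import numpy as np

def sprinkle(term_doc, labels, n_classes, copies=1):
    """Append artificial class-label terms to a term-document matrix.

    term_doc: (n_terms, n_docs) array; labels: class index per document.
    Each class contributes `copies` identical indicator rows (an assumption
    of this sketch, not a detail taken from the abstract).
    """
    labels = np.asarray(labels)
    n_docs = term_doc.shape[1]
    indicator = np.zeros((n_classes, n_docs))
    indicator[labels, np.arange(n_docs)] = 1.0   # 1 where doc belongs to class
    return np.vstack([term_doc, np.repeat(indicator, copies, axis=0)])

def lsi_term_vectors(matrix, k):
    """Rank-k LSI term representations via truncated SVD (rows of U_k * S_k)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy data: 5 terms x 4 documents; documents 0-1 are class 0,
# documents 2-3 are class 1. Term 4 occurs in every document.
X = np.array([[2, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 3, 0],
              [0, 0, 0, 1],
              [1, 1, 1, 1]], dtype=float)
y = [0, 0, 1, 1]

plain = lsi_term_vectors(X, k=2)
augmented = lsi_term_vectors(sprinkle(X, y, n_classes=2, copies=2), k=2)

# Terms 0 and 1 occur only in class-0 documents; compare their similarity
# in the plain and the label-augmented LSI spaces.
print("plain LSI     :", cosine(plain[0], plain[1]))
print("with sprinkling:", cosine(augmented[0], augmented[1]))
```

The rows appended by `sprinkle` exist only to shape the SVD; in this sketch the original terms are rows `0..n_terms-1` of the returned term vectors, so the label rows can simply be ignored afterwards.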

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sutanu Chakraborti (1)
  • Robert Lothian (1)
  • Nirmalie Wiratunga (1)
  • Stuart Watt (1)
  1. School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK