BIT Numerical Mathematics, Volume 43, Issue 2, pp 427–448

Lower Dimensional Representation of Text Data Based on Centroids and Least Squares

  • Haesun Park
  • Moongu Jeon
  • J. Ben Rosen

DOI: 10.1023/A:1026039313770

Cite this article as:
Park, H., Jeon, M. & Rosen, J.B. BIT Numerical Mathematics (2003) 43: 427. doi:10.1023/A:1026039313770

Abstract

Dimension reduction in today's vector space based information retrieval systems is essential for improving computational efficiency in handling massive amounts of data. A mathematical framework for lower dimensional representation of text data in vector space based information retrieval is proposed using minimization and a matrix rank reduction formula. We illustrate how the commonly used Latent Semantic Indexing based on the Singular Value Decomposition (LSI/SVD) can be derived as a method for dimension reduction from our mathematical framework. Then two new methods for dimension reduction based on the centroids of data clusters are proposed and shown to be more efficient and effective than LSI/SVD when we have a priori information on the cluster structure of the data. Several advantages of the new methods in terms of computational efficiency and data representation in the reduced space, as well as their mathematical properties, are discussed.

Experimental results are presented to illustrate the effectiveness of our methods on certain classification problems in a reduced dimensional space. The results indicate that for a successful lower dimensional representation of the data, it is important to incorporate a priori knowledge in the dimension reduction algorithms.
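The abstract names centroids and least squares but does not spell out the algorithms, so the following is only a minimal sketch of one plausible reading, not the authors' stated method. It assumes a term-by-document matrix `A`, known cluster labels, and a reduced representation `Y` chosen to minimize the Frobenius norm of `C Y - A`, where the columns of `C` are the cluster centroids. The function name `centroid_reduce`, the toy data, and the use of NumPy are all illustrative assumptions.

```python
import numpy as np

def centroid_reduce(A, labels):
    """Map each column of A (terms x documents) to k dimensions.

    Assumption (not taken from the abstract): the reduced matrix Y
    solves min_Y ||C Y - A||_F, with one centroid column in C per cluster.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # m x k centroid matrix: the mean document vector of each cluster
    C = np.column_stack([A[:, labels == c].mean(axis=1) for c in clusters])
    # Least squares solve for the k x n reduced representation
    Y, *_ = np.linalg.lstsq(C, A, rcond=None)
    return C, Y

# Hypothetical toy data: 4 terms, 6 documents, two known clusters
A = np.array([[2., 3., 2., 0., 0., 1.],
              [1., 2., 3., 0., 1., 0.],
              [0., 0., 1., 3., 2., 2.],
              [0., 1., 0., 2., 3., 3.]])
labels = [0, 0, 0, 1, 1, 1]

C, Y = centroid_reduce(A, labels)
print(C.shape, Y.shape)  # (4, 2) (2, 6)
```

In this sketch the reduced dimension equals the number of clusters, which is where the a priori cluster information enters. A truncated SVD would instead pick the dimension independently of any cluster structure.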

Dimension reduction, centroids, least squares, rank reducing decomposition, classification, feature extraction

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Haesun Park (1)
  • Moongu Jeon (2)
  • J. Ben Rosen (3, 4)

  1. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, U.S.A.
  2. Department of Computer Science and Engineering, Univ. of California, Santa Barbara, Santa Barbara, U.S.A.
  3. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, U.S.A.
  4. University of California, San Diego, La Jolla, U.S.A.