Using Term Clustering and Supervised Term Affinity Construction to Boost Text Classification

  • Chong Wang
  • Wenyuan Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3518)

Abstract

Measuring similarity is a crucial step in many machine learning problems. The traditional cosine similarity cannot represent semantic relationships between terms. This paper explores a kernel-based similarity measure built on term clustering. An affinity matrix of terms is constructed from term co-occurrence in both unsupervised and supervised ways. Normalized cut is used to cluster the terms, which cuts off noisy edges. A diffusion kernel is then adopted to measure the kernel-like similarity of terms in the same cluster. Experiments demonstrate that our methods give satisfactory results, even when the training set is small.
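
The pipeline named in the abstract (co-occurrence affinity, normalized cut, per-cluster diffusion kernel) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the document-level co-occurrence count, the two-way spectral relaxation of normalized cut, the value of `beta`, the toy `doc_term` matrix, and the document similarity `x^T K y` are all assumptions made for illustration. All function names are hypothetical. The supervised affinity construction is not sketched because the abstract does not describe it.

```python
import numpy as np


def cooccurrence_affinity(doc_term):
    """Assumed unsupervised affinity: number of documents where two terms co-occur."""
    present = (np.asarray(doc_term) > 0).astype(float)  # docs x terms, binary
    affinity = present.T @ present
    np.fill_diagonal(affinity, 0.0)  # drop self-loops
    return affinity


def normalized_cut_bipartition(affinity):
    """Two-way normalized cut via the spectral relaxation (sign of the
    second-smallest generalized eigenvector of the graph Laplacian)."""
    degree = affinity.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    n = affinity.shape[0]
    laplacian_sym = np.eye(n) - inv_sqrt[:, None] * affinity * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(laplacian_sym)  # eigenvalues in ascending order
    fiedler = inv_sqrt * vecs[:, 1]  # map back to the generalized eigenvector
    return (fiedler > 0).astype(int)


def diffusion_kernel(affinity, beta):
    """Graph diffusion kernel K = exp(-beta * L) with L = D - A."""
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


def clustered_term_kernel(affinity, labels, beta=0.5):
    """Block-diagonal term kernel: diffusion only among terms of one cluster."""
    n = affinity.shape[0]
    kernel = np.zeros((n, n))
    for cluster in np.unique(labels):
        idx = np.flatnonzero(labels == cluster)
        block = affinity[np.ix_(idx, idx)]
        kernel[np.ix_(idx, idx)] = diffusion_kernel(block, beta)
    return kernel


# Toy corpus (assumed): terms 0 and 1 co-occur, terms 2 and 3 co-occur,
# and the last document adds one noisy link between terms 1 and 2.
doc_term = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
])

affinity = cooccurrence_affinity(doc_term)
labels = normalized_cut_bipartition(affinity)
term_kernel = clustered_term_kernel(affinity, labels)

# Assumed document similarity: x^T K y instead of the plain dot product x^T y.
doc_similarity = doc_term @ term_kernel @ doc_term.T
print(labels)
print(np.round(doc_similarity, 3))
```

In this toy example the cut removes the single noisy edge between terms 1 and 2, so the resulting kernel assigns zero similarity across the two clusters while terms inside a cluster share similarity even when they do not appear in the same document vector.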

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Chong Wang (1)
  • Wenyuan Wang (1)
  1. Department of Automation, Tsinghua University, Beijing, P.R. China
