PAKDD 2005: Advances in Knowledge Discovery and Data Mining, pp. 813–819
Using Term Clustering and Supervised Term Affinity Construction to Boost Text Classification
Abstract
Choosing a similarity measure is a crucial step in many machine learning problems. The traditional cosine similarity treats distinct terms as unrelated, so it cannot represent semantic relationships between them. This paper explores a kernel-based similarity measure built on term clustering. An affinity matrix over terms is constructed from term co-occurrence, in both an unsupervised and a supervised way. Normalized cut is then used to cluster the terms, which cuts off noisy edges between clusters. A diffusion kernel is adopted to measure kernel-based similarity between terms within the same cluster. Experiments demonstrate that our methods give satisfactory results, even when the training set is small.
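The pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Term affinity is taken to be raw document co-occurrence counts. The supervised variant (`supervised_affinity`, which treats two terms as co-occurring whenever they appear in documents of the same class) is a hypothetical stand-in, because the abstract does not specify that construction. Normalized cut is the two-way spectral relaxation of Shi and Malik, splitting terms by the sign of the generalized eigenvector. The diffusion kernel follows Kondor and Lafferty, K = exp(-beta * L) with graph Laplacian L = D - W. All function names, the toy corpus, and the parameter `beta` are illustrative.

```python
import numpy as np


def cooccurrence_affinity(X):
    """Unsupervised term affinity: W[i, j] = number of documents containing both terms.

    X is a (documents x terms) count matrix; counts are binarized first.
    """
    B = (X > 0).astype(float)
    W = B.T @ B
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def supervised_affinity(X, y, n_classes):
    """HYPOTHETICAL supervised variant: terms count as co-occurring whenever they
    appear in documents of the same class (y holds integer class labels)."""
    B = (X > 0).astype(float)
    F = B.T @ np.eye(n_classes)[y]  # F[t, c] = number of class-c documents containing term t
    W = F @ F.T
    np.fill_diagonal(W, 0.0)
    return W


def normalized_cut_bipartition(W):
    """Two-way normalized cut via the spectral relaxation of Shi and Malik.

    Solves (D - W) y = lambda * D y for the second-smallest eigenvalue and splits
    the terms by the sign of y. Assumes every term has nonzero degree.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]      # map back to the generalized eigenvector
    return y >= 0                    # boolean cluster membership per term


def diffusion_kernel(W, beta):
    """Diffusion kernel K = exp(-beta * L) with graph Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    lam, V = np.linalg.eigh(L)
    return (V * np.exp(-beta * lam)) @ V.T  # V diag(exp(-beta * lam)) V^T


def build_term_kernel(W, beta=1.0):
    """Cluster the terms, drop edges that cross the cut, then diffuse within clusters."""
    labels = normalized_cut_bipartition(W)
    same = labels[:, None] == labels[None, :]
    K = np.where(same, diffusion_kernel(W * same, beta), 0.0)  # zero cross-cluster entries
    return K, labels


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def kernel_similarity(u, v, K):
    """Cosine similarity in the feature space induced by the term kernel K."""
    return float(u @ K @ v / np.sqrt((u @ K @ u) * (v @ K @ v)))


# Toy corpus. Columns: car, auto, engine, apple, fruit, banana.
# The last document is a "noisy" link between the two topics.
X = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
])
K, labels = build_term_kernel(cooccurrence_affinity(X), beta=1.0)

car_doc = np.array([1.0, 0, 0, 0, 0, 0])
auto_doc = np.array([0.0, 1, 0, 0, 0, 0])
apple_doc = np.array([0.0, 0, 0, 1, 0, 0])
print(cosine(car_doc, auto_doc))                 # 0.0: no shared terms
print(kernel_similarity(car_doc, auto_doc, K))   # about 0.86: same term cluster
print(kernel_similarity(car_doc, apple_doc, K))  # 0.0: different clusters
```

On this toy corpus, normalized cut separates {car, auto, engine} from {apple, fruit, banana} and discards the single engine–fruit edge. A document containing only "car" and one containing only "auto" share no term, so their cosine similarity is 0, while their kernel similarity is positive. For more than two clusters, the bipartition can be applied recursively to each side, as Shi and Malik propose; `beta` controls how far similarity spreads along the term graph.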