Document clustering is a fundamental task of text mining, by which efficient organization, navigation, summarization, and retrieval of documents can be achieved. The clustering of documents presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of natural language. Subspace clustering is an extension of traditional clustering that is designed to capture local feature relevance, and to group documents with respect to the features (or words) that matter the most.
Copyright information
© 2008 Springer-Verlag London Limited
About this chapter
Cite this chapter
AlSumait, L., Domeniconi, C. (2008). Text Clustering with Local Semantic Kernels. In: Berry, M.W., Castellanos, M. (eds) Survey of Text Mining II. Springer, London. https://doi.org/10.1007/978-1-84800-046-9_5
DOI: https://doi.org/10.1007/978-1-84800-046-9_5
Publisher Name: Springer, London
Print ISBN: 978-1-84800-045-2
Online ISBN: 978-1-84800-046-9
eBook Packages: Computer Science, Computer Science (R0)