Text Clustering with Local Semantic Kernels

AlSumait, Loulwah; Domeniconi, Carlotta

doi:10.1007/978-1-84800-046-9_5

Document clustering is a fundamental task of text mining, by which efficient organization, navigation, summarization, and retrieval of documents can be achieved. The clustering of documents presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of natural language. Subspace clustering is an extension of traditional clustering that is designed to capture local feature relevance, and to group documents with respect to the features (or words) that matter the most.
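To make the idea of local feature relevance concrete, the following is a minimal sketch and not the chapter's algorithm. It assumes that each cluster keeps its own weight per feature, and that a feature along which the cluster's documents agree closely receives a larger weight. The exponential weighting scheme, the bandwidth parameter `h`, the function names, and the toy data are all illustrative assumptions.

```python
import math


def local_feature_weights(cluster_docs, centroid, h=1.0):
    """Return one weight per feature for a single cluster (hypothetical helper).

    Features along which the cluster's documents lie close to the centroid
    get larger weights. The weights are normalized to sum to 1.
    """
    n_features = len(centroid)
    # Average squared deviation from the centroid along each feature.
    spread = [
        sum((doc[j] - centroid[j]) ** 2 for doc in cluster_docs) / len(cluster_docs)
        for j in range(n_features)
    ]
    # Assumed weighting scheme: exponentially penalize large spread.
    raw = [math.exp(-s / h) for s in spread]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_sq_distance(doc, centroid, weights):
    """Squared Euclidean distance in which each feature is scaled by its local weight."""
    return sum(w * (x - c) ** 2 for w, x, c in zip(weights, doc, centroid))


# Toy cluster: three documents over three term features (made-up values).
docs = [[1.0, 0.0, 5.0], [1.0, 4.0, 5.2], [1.0, 8.0, 4.8]]
centroid = [sum(col) / len(docs) for col in zip(*docs)]
weights = local_feature_weights(docs, centroid)
```

In this toy cluster the documents agree exactly on the first feature and vary widely on the second, so the first feature receives the largest weight and the second the smallest. Assigning a document to the cluster with the smallest weighted distance then groups documents by the words that matter most for that cluster.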

Author information

Authors and Affiliations

Department of Computer Science, George Mason University, 4400 University Drive MSN 4A4, Fairfax, VA 22030
Loulwah AlSumait & Carlotta Domeniconi

Editor information

Editors and Affiliations

Department of Computer Science, University of Tennessee, USA
Michael W. Berry
Hewlett-Packard Laboratories, Palo Alto, California, USA
Malu Castellanos

About this chapter

Cite this chapter

AlSumait, L., Domeniconi, C. (2008). Text Clustering with Local Semantic Kernels. In: Berry, M.W., Castellanos, M. (eds) Survey of Text Mining II. Springer, London. https://doi.org/10.1007/978-1-84800-046-9_5

DOI: https://doi.org/10.1007/978-1-84800-046-9_5
Publisher Name: Springer, London
Print ISBN: 978-1-84800-045-2
Online ISBN: 978-1-84800-046-9
eBook Packages: Computer Science, Computer Science (R0)
