Text Clustering with Local Semantic Kernels

  • Loulwah AlSumait
  • Carlotta Domeniconi
Chapter

Document clustering is a fundamental task of text mining, by which efficient organization, navigation, summarization, and retrieval of documents can be achieved. The clustering of documents presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of natural language. Subspace clustering is an extension of traditional clustering that is designed to capture local feature relevance, and to group documents with respect to the features (or words) that matter the most.
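The idea of local feature relevance can be illustrated with a minimal sketch. It is not the chapter's algorithm: the exponential weighting scheme, the bandwidth parameter `h`, and the function names `local_feature_weights` and `weighted_distance` are assumptions chosen for illustration. Within one cluster, a feature (word) whose values vary little around the centroid receives a large weight, and distances are then measured with those cluster-specific weights.

```python
import math

def local_feature_weights(points, h=1.0):
    """Return (centroid, weights) for a single cluster.

    Assumed scheme: weight_j is proportional to exp(-spread_j / h), where
    spread_j is the mean squared deviation of feature j from the centroid.
    Features that are tightly concentrated in the cluster get high weight.
    """
    n = len(points)
    dim = len(points[0])
    centroid = [sum(p[j] for p in points) / n for j in range(dim)]
    spread = [sum((p[j] - centroid[j]) ** 2 for p in points) / n
              for j in range(dim)]
    raw = [math.exp(-s / h) for s in spread]
    total = sum(raw)
    return centroid, [r / total for r in raw]  # weights sum to 1

def weighted_distance(x, centroid, weights):
    """Euclidean distance with per-feature weights local to one cluster."""
    return math.sqrt(sum(w * (a - c) ** 2
                         for w, a, c in zip(weights, x, centroid)))

# Toy term-count vectors: word 0 is stable in this cluster, word 1 is not.
cluster = [[5, 0, 1], [5, 9, 1], [5, 4, 0]]
centroid, weights = local_feature_weights(cluster)
```

In this toy cluster the first word has zero spread and so dominates the weights, while the second word, whose counts range from 0 to 9, contributes almost nothing to the distance.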

Keywords

  • Kernel Method
  • Frequent Itemsets
  • Support Level
  • Semantic Distance
  • Subspace Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  • Loulwah AlSumait (1)
  • Carlotta Domeniconi (1)
  1. Department of Computer Science, George Mason University, Fairfax
