Text Clustering with Local Semantic Kernels

  • Chapter
Survey of Text Mining II

Document clustering is a fundamental task of text mining, by which efficient organization, navigation, summarization, and retrieval of documents can be achieved. The clustering of documents presents difficult challenges due to the sparsity and the high dimensionality of text data, and to the complex semantics of natural language. Subspace clustering is an extension of traditional clustering that is designed to capture local feature relevance and to group documents with respect to the features (or words) that matter the most.

Copyright information

© 2008 Springer-Verlag London Limited

About this chapter

Cite this chapter

AlSumait, L., Domeniconi, C. (2008). Text Clustering with Local Semantic Kernels. In: Berry, M.W., Castellanos, M. (eds) Survey of Text Mining II. Springer, London. https://doi.org/10.1007/978-1-84800-046-9_5

  • DOI: https://doi.org/10.1007/978-1-84800-046-9_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-045-2

  • Online ISBN: 978-1-84800-046-9

  • eBook Packages: Computer Science, Computer Science (R0)
