Text Clustering and Text Summarization on the Use of Side Information

  • Conference paper
Innovations in Computer Science and Engineering

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 413)

Abstract

Clustering algorithms group data points by their similarity so that useful information can be extracted from them. The text collections to be clustered usually contain a very large amount of information. In many text mining applications, side information is available alongside the text documents. This side information may be of several kinds, for instance document provenance information, the links in the document, user access behavior from web logs, or other non-textual attributes embedded in the text document. Such attributes may contain a large amount of information that is useful for clustering. However, it is difficult to estimate the relative importance of this information when its quality is unclear. In such cases, it can be risky to incorporate side information into the mining process, since it can either improve the quality of the representation used for mining or add noise to the process. The proposed system combines clustering with summarization methods. With the COATES algorithm, clusters are obtained on the basis of both content and auxiliary attributes, and a summarization step then merges duplicate clusters and produces the final summary. The aim of this work is to design an effective clustering algorithm, and two clustering algorithms are used. The COATES algorithm, which combines classical partitioning algorithms with probabilistic models, is used, and the proposed system implements a hierarchical algorithm that is compared with COATES. The system also implements a merging and summary generation algorithm that produces a summary (the "pure data") for the user's convenience.
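The idea of clustering on both content and auxiliary attributes can be illustrated with a minimal sketch. This is not the COATES algorithm from the paper: it replaces the probabilistic model with a simple additive weight, and the names `cluster`, `side_score`, and `side_weight`, as well as the sample documents and attribute labels, are assumptions made for illustration only.

```python
import math
from collections import Counter

def tf(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def side_score(attrs, members_attrs):
    """Fraction of cluster members sharing at least one side attribute."""
    if not members_attrs:
        return 0.0
    return sum(1 for m in members_attrs if attrs & m) / len(members_attrs)

def cluster(docs, seeds, side_weight=0.5, iterations=5):
    """Partition docs, given as (text, set_of_side_attributes) pairs.

    seeds holds the indices of the documents used as initial centroids.
    Each document goes to the cluster that maximizes
    text similarity + side_weight * side-attribute agreement.
    """
    vectors = [tf(text) for text, _ in docs]
    attrs = [a for _, a in docs]
    k = len(seeds)
    centroids = [Counter(vectors[s]) for s in seeds]
    members = [[s] for s in seeds]
    assignment = [None] * len(docs)
    for _ in range(iterations):
        new_assignment = []
        for i, vec in enumerate(vectors):
            scores = [
                cosine(vec, centroids[c])
                + side_weight
                * side_score(attrs[i], [attrs[j] for j in members[c] if j != i])
                for c in range(k)
            ]
            new_assignment.append(max(range(k), key=scores.__getitem__))
        if new_assignment == assignment:
            break  # assignments are stable, so stop early
        assignment = new_assignment
        members = [[i for i, c in enumerate(assignment) if c == n] for n in range(k)]
        centroids = [sum((vectors[i] for i in m), Counter()) for m in members]
    return assignment

# Hypothetical documents: (text, side attributes such as a source label).
docs = [
    ("stock market shares trading", {"finance"}),
    ("market shares fall stock", {"finance"}),
    ("football match goal team", {"sports"}),
    ("team wins football cup", {"sports"}),
    ("shares market team", {"sports"}),
]

print(cluster(docs, seeds=[0, 2]))                   # content plus side information
print(cluster(docs, seeds=[0, 2], side_weight=0.0))  # content only
```

In this toy data the last document is textually closer to the finance cluster but carries a sports attribute, so its assignment depends on `side_weight`. This mirrors the trade-off described in the abstract: reliable side information can improve the clustering, while noisy side information would pull documents into the wrong cluster.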

Author information

Corresponding author

Correspondence to Shilpa S. Raut.

Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper

Raut, S.S., Maral, V.B. (2016). Text Clustering and Text Summarization on the Use of Side Information. In: Saini, H., Sayal, R., Rawat, S. (eds) Innovations in Computer Science and Engineering. Advances in Intelligent Systems and Computing, vol 413. Springer, Singapore. https://doi.org/10.1007/978-981-10-0419-3_16

  • DOI: https://doi.org/10.1007/978-981-10-0419-3_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0417-9

  • Online ISBN: 978-981-10-0419-3

  • eBook Packages: Engineering, Engineering (R0)
