Text Clustering and Text Summarization on the Use of Side Information

Raut, Shilpa S.; Maral, V. B.

doi:10.1007/978-981-10-0419-3_16


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 413)

Abstract

Clustering algorithms group data points by similarity in order to extract useful information from them. The text collections to be clustered often contain a huge amount of information, and it is difficult to estimate the relative importance of that information when its quality is unclear. In such cases, incorporating side information into the mining process can be risky, since it can either improve the quality of the representation used for mining or add noise to the process. In many text mining applications, side information is available along with the text documents. Such side information may be of several kinds, for instance document provenance information, the links in the document, user access behavior from web logs, or other non-textual attributes embedded in the text document. Such attributes may contain a massive amount of information for clustering purposes. The proposed system combines clustering with summarization methods. While implementing the COATES algorithm, we used a summarization step that merges duplicated clusters and produces a final summary. With the COATES clustering algorithm, clusters are obtained on the basis of both content and auxiliary attributes. Two clustering algorithms are used in this work. The COATES algorithm, which combines classical partitioning algorithms with probabilistic models, is used, and the proposed system implements a hierarchical algorithm that is compared with COATES. The system also implements a merging and summary generation algorithm that produces a summary, or "pure data", for the user's convenience.
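To make the idea of clustering on both content and auxiliary attributes concrete, the following is a minimal sketch and not the COATES algorithm itself, whose details are not given here. It assumes a simple alternation: a content step that compares documents to cluster centroids by cosine similarity, and an auxiliary step that estimates a smoothed probability of each cluster given a document's side attribute and uses it to reweight the content score. The function name `cluster_with_side_info`, the multiplicative scoring rule, the smoothing constant, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Sum of term counts; scale does not matter for cosine similarity."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def cluster_with_side_info(texts, side_attrs, seeds, iterations=5,
                           smoothing=1.0, use_side_info=True):
    """Partition texts into len(seeds) clusters (hypothetical sketch).

    Each document's score for a cluster is its cosine similarity to the
    cluster centroid, multiplied by a smoothed estimate of
    P(cluster | side attribute) taken from the current assignment.
    """
    vecs = [Counter(t.lower().split()) for t in texts]
    k = len(seeds)
    centroids = [vecs[i] for i in seeds]
    # Initial assignment uses content only; ties go to the lower cluster index.
    labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vecs]

    for _ in range(iterations):
        # Auxiliary step: count how often each attribute value lands in each cluster.
        counts = defaultdict(lambda: [0.0] * k)
        for lab, attr in zip(labels, side_attrs):
            counts[attr][lab] += 1

        def prior(attr, c):
            if not use_side_info:
                return 1.0
            row = counts[attr]
            return (row[c] + smoothing) / (sum(row) + smoothing * k)

        # Content step: recompute centroids, keeping the old one if a cluster is empty.
        centroids = [
            centroid([v for v, lab in zip(vecs, labels) if lab == c]) or centroids[c]
            for c in range(k)
        ]
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c]) * prior(a, c))
            for v, a in zip(vecs, side_attrs)
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy data (invented for illustration). The last document is ambiguous by content.
TEXTS = [
    "stock market prices fall",
    "market shares and stock trading",
    "football match final score",
    "team wins football final",
    "football cup tonight",
    "football league score",
    "final prices",
]
SIDE = ["finance", "finance", "sports", "sports", "sports", "sports", "sports"]

if __name__ == "__main__":
    print(cluster_with_side_info(TEXTS, SIDE, seeds=[0, 2]))
    print(cluster_with_side_info(TEXTS, SIDE, seeds=[0, 2], use_side_info=False))
```

On this toy data the ambiguous last document stays with the finance cluster when only content is used, and moves to the sports cluster once its side attribute is taken into account. The same mechanism can hurt when the side attribute is unreliable, which is the risk of added noise that the abstract describes.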



Author information

Authors and Affiliations

K. J. College of Engineering, Kondhwa, 411043, Pune, India
Shilpa S. Raut & V. B. Maral


Corresponding author

Correspondence to Shilpa S. Raut.

Editor information

Editors and Affiliations

Guru Nanak Institutions, Professor & Managing Director, Ibrahimpatnam, Andhra Pradesh, India
H. S. Saini
Guru Nanak Institutions, Professor & Associate Director, Ibrahimpatnam, Andhra Pradesh, India
Rishi Sayal
Guru Nanak Institutions, Professor and Head – CSE and IT, Ibrahimpatnam, India
Sandeep Singh Rawat


Cite this paper

Raut, S.S., Maral, V.B. (2016). Text Clustering and Text Summarization on the Use of Side Information. In: Saini, H., Sayal, R., Rawat, S. (eds) Innovations in Computer Science and Engineering. Advances in Intelligent Systems and Computing, vol 413. Springer, Singapore. https://doi.org/10.1007/978-981-10-0419-3_16


DOI: https://doi.org/10.1007/978-981-10-0419-3_16
Published: 20 February 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0417-9
Online ISBN: 978-981-10-0419-3
eBook Packages: Engineering, Engineering (R0)
