Time and Space Efficient Web Document Clustering Using Rayleigh Distribution

Srikanth, D.; Sakthivel, S.

doi:10.1007/s11277-018-5366-5

Published: 31 January 2018

Volume 102, pages 3255–3268, (2018)

Wireless Personal Communications

D. Srikanth¹ & S. Sakthivel²

Abstract

Web document clustering identifies relevant and useful information in tasks such as comparing shopping service providers on flipkart.com and retrieving information from web search engines. Choosing the best representation and enhancing knowledge discovery for a given task in very large textual data stores is the most critical step in web document clustering. This work considers the problem of discovering the most predominant word under a similar semantic model and measuring the relative strength of the predominant word in a web document. The paper presents an efficient technique called Rayleigh Clustering with Self Organizing Map (RC-SOM) for the web document domain, which combines the generation of self-organizing patterns, clustering of predominant words, and the Rayleigh distribution. Self-organizing patterns are generated to identify the most predominant word in each web document. Predominant words with similar semantics are then clustered across all the web documents. Finally, the efficiency of web document clustering is improved by applying the Rayleigh distribution, which lists the relative strength of the predominant word for each web document. Experiments with the RC-SOM technique are conducted on the Anonymous Microsoft Web Data dataset, evaluating factors such as cluster accuracy, execution time, and computation space for clustering.
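The abstract does not state how the Rayleigh distribution is turned into a relative strength, so the following is only a minimal sketch under stated assumptions. The density and the maximum-likelihood scale estimate are the standard Rayleigh formulas; the function `relative_strengths`, its use of the cumulative distribution as a score, and the example words and counts are hypothetical and are not taken from the paper.

```python
import math


def rayleigh_pdf(x, sigma):
    """Standard Rayleigh density: (x / sigma^2) * exp(-x^2 / (2 sigma^2)) for x >= 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x < 0:
        return 0.0
    return (x / sigma**2) * math.exp(-(x**2) / (2 * sigma**2))


def rayleigh_cdf(x, sigma):
    """Standard Rayleigh cumulative distribution: 1 - exp(-x^2 / (2 sigma^2))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-(x**2) / (2 * sigma**2))


def rayleigh_scale_mle(samples):
    """Maximum-likelihood estimate of sigma: sqrt(sum(x_i^2) / (2 n))."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(x * x for x in samples) / (2 * len(samples)))


def relative_strengths(word_counts):
    """ASSUMPTION (not from the paper): score each word by the Rayleigh CDF
    of its count, with sigma fitted to all counts in the document."""
    sigma = rayleigh_scale_mle(list(word_counts.values()))
    return {word: rayleigh_cdf(count, sigma) for word, count in word_counts.items()}


if __name__ == "__main__":
    # Hypothetical word counts for one web document.
    counts = {"shop": 9, "cart": 3, "help": 1}
    for word, strength in relative_strengths(counts).items():
        print(f"{word}: {strength:.3f}")
```

Using the cumulative distribution rather than the density keeps the score increasing with the word count, which is one plausible reading of "relative strength"; the paper itself may define it differently.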

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Vidyaa Vikas College of Engineering and Technology, Tiruchengode, Namakkal, Tamilnadu, India
D. Srikanth
Department of Computer Science and Engineering, Sona College of Technology, Salem, India
S. Sakthivel

Corresponding author

Correspondence to D. Srikanth.

About this article

Cite this article

Srikanth, D., Sakthivel, S. Time and Space Efficient Web Document Clustering Using Rayleigh Distribution. Wireless Pers Commun 102, 3255–3268 (2018). https://doi.org/10.1007/s11277-018-5366-5

Published: 31 January 2018
Issue Date: October 2018
DOI: https://doi.org/10.1007/s11277-018-5366-5
